Solution 3

Contents

Solution 3#

About#

With exercise #1 and exercise #2, you compared two dust forecast models as well as near real-time monitoring with satellite data. From the results of the assignment, you see that it is difficult to know which forecast can be trusted more. Hence, even though model intercomparison is important, model outcomes also have to be compared with real-world observations. Just by comparing model outcomes with measurements from station data, we can better understand how the model performs.

Now, you will focus on how you can use station observations from AERONET to evaluate model forecasts.

Tasks#

1. Brainstorm

  • What observation datasets do you know about?

  • Which variables do they measure?

  • Which data can you use to evaluate model predictions?

2. Download and plot time-series of AERONET data for Santa Cruz, Tenerife

  • Download and visualise AERONET v3.0 measurements of the station Santa Cruz, Tenerife for 21 to 25 February 2020.

  • Some questions to reflect on:

    • Under which name is the station listed in AERONET?

    • What average level would you choose?

    • Which days do we have observations for?

  • Hint

3. Resample AERONET data to a 3-hourly resolution

  • Make use of AERONET index and time columns to create a DateTimeIndex in order to resample the observations to a 3-hourly temporal resolution

  • Hint

    • you have to combine the two columns index and time as one string variable

    • you can use the pandas function to_datetime() to create a DateTimeIndex and the function resample().mean() to resample and average the time-series to a given temporal resolution

  • Question to reflect on

    • How many row entries does the resampled data frame have?

4. Load time-series of the forecasts from CAMS and the MONARCH model and compare it with the AERONET observations

  • Interpret the plotting result.

  • Can you make a statement of the performance of the two forecast models?

  • What is your conclusion regarding AERONET observation data?


Load required libraries

import wget
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.colors
from matplotlib.cm import get_cmap
from matplotlib import animation
from matplotlib.axes import Axes

import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy.feature as cfeature
from cartopy.mpl.geoaxes import GeoAxes
GeoAxes._pcolormesh_patched = Axes.pcolormesh

import warnings
warnings.simplefilter(action = "ignore", category = RuntimeWarning)

Load helper functions

%run ../../functions.ipynb

1. Brainstorm#

Three different ground-based observations are introduced during the course:

AERONET offers ground-based observations of Aerosol Optical Depth and the Angstrom Exponent and can be used to evaluate dust forecasts.

2. Download and plot time-series of AERONET data#

Define the latitude / longitude values for Santa Cruz, Tenerife. You can see an overview of all available AERONET Site Names here.

lat = 28.473
lon = -16.247

As a first step, let us create a Python dictionary in which we store all the parameters we would like to use for the request as dictionary keys. You can initiate a dictionary with curled brackets {}. Below, we specify the following parameters:

  • endpoint: Endpoint of the AERONET web service

  • station: Name of the AERONET station

  • year: year 1 of interest

  • month: month 1 of interest

  • day: day 1 of interest

  • year2: year 2 of interest

  • month2: month 2 of interest

  • day2: day 2 of interest

  • AOD15: data type, other options include AOD10, AOD20, etc.

  • AVG: data format, AVG=10 - all points, AVG=20 - daily averages

The keywords below are those we will need for requesting all data points (AVG=10) of Aerosol Optical Depth Level 1.5 data for the station Santa Cruz, Tenerife for 21 to 25 February 2020.

data_dict = {
    'endpoint': 'https://aeronet.gsfc.nasa.gov/cgi-bin/print_web_data_v3',
    'station':'Santa_Cruz_Tenerife',
    'year': 2020,
    'month': 2,
    'day': 21,
    'year2': 2020,
    'month2': 2,
    'day2': 25,
    'AOD15': 1,
    'AVG': 10
}

In a next step, we construct the final string for the wget request with the format function. You construct a string by adding the dictionary keys in curled brackets. At the end of the string, you provide the dictionary key informatoin to the string with the format() function. A print of the resulting url shows, that the format function replaced the information in the curled brackets with the data in the dictionary.

url = '{endpoint}?site={station}&year={year}&month={month}&day={day}&year2={year2}&month2={month2}&day2={day2}&AOD15={AOD15}&AVG={AVG}'.format(**data_dict)
url
'https://aeronet.gsfc.nasa.gov/cgi-bin/print_web_data_v3?site=Santa_Cruz_Tenerife&year=2020&month=2&day=21&year2=2020&month2=2&day2=25&AOD15=1&AVG=10'

Now we are ready to request the data with the function download() from the wget Python library. You have to pass to the function the constructed url above together with a file path of where the downloaded that shall be stored. Let us store the data as txt file in the folder ../data/2_observations/aeronet/.

wget.download(url, '../../../../eodata/50_modules/01_dust/04_assignment/aeronet/20200221-25_santa_cruz_tenerife_10.txt')
'../data/aeronet/20200221-25_santa_cruz_tenerife_10 (1).txt'

After we downloaded the station observations as txt file, we can open it with the pandas function read_table(). We additonally set specific keyword arguments that allow us to specify the columns and rows of interest:

  • delimiter: specify the delimiter in the text file, e.g. comma

  • header: specify the index of the row that shall be set as header.

  • index_col: specify the index of the column that shall be set as index

You see below that the resulting dataframe has 292 rows and 113 columns.

df = pd.read_table('../../../../eodata/50_modules/01_dust/04_assignment/aeronet/20200221-25_santa_cruz_tenerife_10.txt', delimiter=',', header=[7], index_col=1)
df
AERONET_Site Time(hh:mm:ss) Day_of_Year Day_of_Year(Fraction) AOD_1640nm AOD_1020nm AOD_870nm AOD_865nm AOD_779nm AOD_675nm ... Exact_Wavelengths_of_AOD(um)_380nm Exact_Wavelengths_of_AOD(um)_340nm Exact_Wavelengths_of_PW(um)_935nm Exact_Wavelengths_of_AOD(um)_681nm Exact_Wavelengths_of_AOD(um)_709nm Exact_Wavelengths_of_AOD(um)_Empty Exact_Wavelengths_of_AOD(um)_Empty.1 Exact_Wavelengths_of_AOD(um)_Empty.2 Exact_Wavelengths_of_AOD(um)_Empty.3 Exact_Wavelengths_of_AOD(um)_Empty<br>
Date(dd:mm:yyyy)
21:02:2020 Santa_Cruz_Tenerife 08:22:46 52.0 52.349144 0.021395 0.031387 0.034429 -999.0 -999.0 0.040873 ... 0.3797 0.3398 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:25:22 52.0 52.350949 0.019840 0.028373 0.030877 -999.0 -999.0 0.035930 ... 0.3797 0.3398 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:28:46 52.0 52.353310 0.019732 0.028353 0.030869 -999.0 -999.0 0.035813 ... 0.3797 0.3398 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:30:39 52.0 52.354618 0.019704 0.028210 0.030609 -999.0 -999.0 0.035490 ... 0.3797 0.3398 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:33:09 52.0 52.356354 0.020827 0.030963 0.030745 -999.0 -999.0 0.035596 ... 0.3797 0.3398 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
25:02:2020 Santa_Cruz_Tenerife 18:09:31 56.0 56.756609 0.266426 0.359000 0.381361 -999.0 -999.0 0.412116 ... 0.3797 0.3398 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
25:02:2020 Santa_Cruz_Tenerife 18:13:14 56.0 56.759190 0.279673 0.374916 0.398722 -999.0 -999.0 0.430402 ... 0.3797 0.3398 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
25:02:2020 Santa_Cruz_Tenerife 18:15:09 56.0 56.760521 0.286610 0.384756 0.408843 -999.0 -999.0 0.440872 ... 0.3797 -999.0000 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
25:02:2020 Santa_Cruz_Tenerife 18:17:13 56.0 56.761956 0.293157 0.393598 0.418218 -999.0 -999.0 0.451127 ... 0.3797 -999.0000 0.9371 -999.0 -999.0 -999.0 -999.0 -999.0 -999.0 -999.<br>
NaN </body></html> NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

292 rows × 113 columns


Now, we can inspect the entries in the loaded data frame a bit more. Above you see that the last entry is a NaN entry, which is best to drop with the function dropna().

The next step is then to replace the entries with -999.0 and set them as NaN. We can use the function replace() to do so.

df = df.dropna()
df = df.replace(-999.0, np.nan)
df
AERONET_Site Time(hh:mm:ss) Day_of_Year Day_of_Year(Fraction) AOD_1640nm AOD_1020nm AOD_870nm AOD_865nm AOD_779nm AOD_675nm ... Exact_Wavelengths_of_AOD(um)_380nm Exact_Wavelengths_of_AOD(um)_340nm Exact_Wavelengths_of_PW(um)_935nm Exact_Wavelengths_of_AOD(um)_681nm Exact_Wavelengths_of_AOD(um)_709nm Exact_Wavelengths_of_AOD(um)_Empty Exact_Wavelengths_of_AOD(um)_Empty.1 Exact_Wavelengths_of_AOD(um)_Empty.2 Exact_Wavelengths_of_AOD(um)_Empty.3 Exact_Wavelengths_of_AOD(um)_Empty<br>
Date(dd:mm:yyyy)
21:02:2020 Santa_Cruz_Tenerife 08:22:46 52.0 52.349144 0.021395 0.031387 0.034429 NaN NaN 0.040873 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:25:22 52.0 52.350949 0.019840 0.028373 0.030877 NaN NaN 0.035930 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:28:46 52.0 52.353310 0.019732 0.028353 0.030869 NaN NaN 0.035813 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:30:39 52.0 52.354618 0.019704 0.028210 0.030609 NaN NaN 0.035490 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
21:02:2020 Santa_Cruz_Tenerife 08:33:09 52.0 52.356354 0.020827 0.030963 0.030745 NaN NaN 0.035596 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
25:02:2020 Santa_Cruz_Tenerife 18:06:09 56.0 56.754271 0.262337 0.354162 0.376787 NaN NaN 0.407501 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
25:02:2020 Santa_Cruz_Tenerife 18:09:31 56.0 56.756609 0.266426 0.359000 0.381361 NaN NaN 0.412116 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
25:02:2020 Santa_Cruz_Tenerife 18:13:14 56.0 56.759190 0.279673 0.374916 0.398722 NaN NaN 0.430402 ... 0.3797 0.3398 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
25:02:2020 Santa_Cruz_Tenerife 18:15:09 56.0 56.760521 0.286610 0.384756 0.408843 NaN NaN 0.440872 ... 0.3797 NaN 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>
25:02:2020 Santa_Cruz_Tenerife 18:17:13 56.0 56.761956 0.293157 0.393598 0.418218 NaN NaN 0.451127 ... 0.3797 NaN 0.9371 NaN NaN NaN NaN NaN NaN -999.<br>

291 rows × 113 columns


We can now plot the column AOD_500nm as time-series. What is striking in the resulting plot?

# Initiate a matplotlib figure
fig = plt.figure(figsize=(20,10))
ax=plt.axes()

# Select pandas dataframe columns and define a line plot
df.filter(['AOD_500nm']).plot(ax=ax, style='o-', color='green' )


# Set title and axes lable information
plt.title('\nAerosol Optical Depth Level 1.5 data for 21-25 February 2021 - Santa Cruz, Tenerife', fontsize=20, pad=20)
plt.ylabel('Aerosol Optical Depth\n', fontsize=16)
plt.xlabel('\nDay', fontsize=16)

# Format the axes ticks
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)

# Add additionally a legend and grid to the plot
plt.legend(fontsize=16,loc=0)
plt.grid()
../_images/4ffde829a999c369bda8da477c20d58f8ee99f6d840e9f58dc3e680d34e4ea8f.png

3. Resample AERONET data to a 3-hourly resolution#

Above you see that it seems we do not have observations for 23rd and 24th February 2020 for the Santa Cruz station in Tenerife. A closer inspection of the index entry confirms that we only have measurements for 21st, 22nd and 25th.

df.index.values
array(['21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '21:02:2020', '21:02:2020', '21:02:2020',
       '21:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '22:02:2020', '22:02:2020',
       '22:02:2020', '22:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020', '25:02:2020',
       '25:02:2020', '25:02:2020', '25:02:2020'], dtype=object)

From the dataframe above, let us only select the columns of interest for us. This makes the handling of the dataframe much easier. The columns of interest are: Time(hh:mm:ss) and AOD_500nm and you can use the function filter() to select specific columns.

aeronet_ts = df.filter(['Time(hh:mm:ss)','AOD_500nm'])
aeronet_ts
Time(hh:mm:ss) AOD_500nm
Date(dd:mm:yyyy)
21:02:2020 08:22:46 0.053515
21:02:2020 08:25:22 0.046500
21:02:2020 08:28:46 0.045808
21:02:2020 08:30:39 0.045296
21:02:2020 08:33:09 0.045245
... ... ...
25:02:2020 18:06:09 0.436686
25:02:2020 18:09:31 0.441193
25:02:2020 18:13:14 0.460659
25:02:2020 18:15:09 0.471839
25:02:2020 18:17:13 0.482647

291 rows × 2 columns

The next step is to create an index as DateTimeIndex. For this, we want to first combine the index entries (date) and time stamp entries of the column Time(hh:mm:ss). We can do this simply by redefining the data frame’s index and combining both with adding the two columns together as a string.

In a second step, we then convert the newly created index entry to a DateTimeIndex format with the function to_datetime(). Important here, you have to specify the format of combined index string.

aeronet_ts.index = aeronet_ts.index + ' ' + aeronet_ts['Time(hh:mm:ss)']
aeronet_ts.index = pd.to_datetime(aeronet_ts.index, format = '%d:%m:%Y %H:%M:%S')
aeronet_ts
Time(hh:mm:ss) AOD_500nm
2020-02-21 08:22:46 08:22:46 0.053515
2020-02-21 08:25:22 08:25:22 0.046500
2020-02-21 08:28:46 08:28:46 0.045808
2020-02-21 08:30:39 08:30:39 0.045296
2020-02-21 08:33:09 08:33:09 0.045245
... ... ...
2020-02-25 18:06:09 18:06:09 0.436686
2020-02-25 18:09:31 18:09:31 0.441193
2020-02-25 18:13:14 18:13:14 0.460659
2020-02-25 18:15:09 18:15:09 0.471839
2020-02-25 18:17:13 18:17:13 0.482647

291 rows × 2 columns


The MONARCH model has a three hourly resolution. For this reason, we have to adjust the temporal resolution of the AERONET observations and resample the data frame to a three hourly temporal resolution. Below, we use the function resample() and create the average of every 3-hours. The resulting dataframe has 37 row entries.

aeronet_ts_resample = aeronet_ts.resample('3H').mean()
aeronet_ts_resample
AOD_500nm
2020-02-21 06:00:00 0.047250
2020-02-21 09:00:00 0.054734
2020-02-21 12:00:00 0.061003
2020-02-21 15:00:00 0.092615
2020-02-21 18:00:00 NaN
2020-02-21 21:00:00 NaN
2020-02-22 00:00:00 NaN
2020-02-22 03:00:00 NaN
2020-02-22 06:00:00 0.117495
2020-02-22 09:00:00 0.147488
2020-02-22 12:00:00 0.253696
2020-02-22 15:00:00 0.375537
2020-02-22 18:00:00 NaN
2020-02-22 21:00:00 NaN
2020-02-23 00:00:00 NaN
2020-02-23 03:00:00 NaN
2020-02-23 06:00:00 NaN
2020-02-23 09:00:00 NaN
2020-02-23 12:00:00 NaN
2020-02-23 15:00:00 NaN
2020-02-23 18:00:00 NaN
2020-02-23 21:00:00 NaN
2020-02-24 00:00:00 NaN
2020-02-24 03:00:00 NaN
2020-02-24 06:00:00 NaN
2020-02-24 09:00:00 NaN
2020-02-24 12:00:00 NaN
2020-02-24 15:00:00 NaN
2020-02-24 18:00:00 NaN
2020-02-24 21:00:00 NaN
2020-02-25 00:00:00 NaN
2020-02-25 03:00:00 NaN
2020-02-25 06:00:00 0.674464
2020-02-25 09:00:00 0.581988
2020-02-25 12:00:00 0.572472
2020-02-25 15:00:00 0.407499
2020-02-25 18:00:00 0.450671

4. Load time series of the forecasts from CAMS and the MONARCH model#

We can now load the time-series information of the CAMS and the MONARCH model forecasts for Santa Cruz, Tenerife. The time-series has been saved as csv file from yesterday’s assignment (see here). The time-series is available under ../data/sdswas_ts.csv and can be loaded as pandas.DataFrame with the function read_csv().

The next step is then to convert the index of the loaded dataframe into a DateTimeIndex format, as this allows us to combine the different time-series.

sdswas_ts = pd.read_csv('../../eodata/case_study/sdswas_ts.csv')
sdswas_ts.index = sdswas_ts['time']
sdswas_ts.index = pd.to_datetime(sdswas_ts.index, format = '%Y-%m-%d %H:%M:%S')
sdswas_ts
time lon lat od550_dust
time
2020-02-21 12:00:00 2020-02-21 12:00:00 -16.333335 28.33333 0.000058
2020-02-21 15:00:00 2020-02-21 15:00:00 -16.333335 28.33333 0.000019
2020-02-21 18:00:00 2020-02-21 18:00:00 -16.333335 28.33333 0.000027
2020-02-21 21:00:00 2020-02-21 21:00:00 -16.333335 28.33333 0.000207
2020-02-22 00:00:00 2020-02-22 00:00:00 -16.333335 28.33333 0.000890
2020-02-22 03:00:00 2020-02-22 03:00:00 -16.333335 28.33333 0.001751
2020-02-22 06:00:00 2020-02-22 06:00:00 -16.333335 28.33333 0.009111
2020-02-22 09:00:00 2020-02-22 09:00:00 -16.333335 28.33333 0.050932
2020-02-22 12:00:00 2020-02-22 12:00:00 -16.333335 28.33333 0.203418
2020-02-22 15:00:00 2020-02-22 15:00:00 -16.333335 28.33333 0.363704
2020-02-22 18:00:00 2020-02-22 18:00:00 -16.333335 28.33333 0.433835
2020-02-22 21:00:00 2020-02-22 21:00:00 -16.333335 28.33333 1.095499
2020-02-23 00:00:00 2020-02-23 00:00:00 -16.333335 28.33333 2.165373
2020-02-23 03:00:00 2020-02-23 03:00:00 -16.333335 28.33333 2.052836
2020-02-23 06:00:00 2020-02-23 06:00:00 -16.333335 28.33333 0.861120
2020-02-23 09:00:00 2020-02-23 09:00:00 -16.333335 28.33333 0.753394
2020-02-23 12:00:00 2020-02-23 12:00:00 -16.333335 28.33333 0.466931
2020-02-23 15:00:00 2020-02-23 15:00:00 -16.333335 28.33333 0.354274
2020-02-23 18:00:00 2020-02-23 18:00:00 -16.333335 28.33333 0.320627
2020-02-23 21:00:00 2020-02-23 21:00:00 -16.333335 28.33333 0.231225
2020-02-24 00:00:00 2020-02-24 00:00:00 -16.333335 28.33333 0.179565
2020-02-24 03:00:00 2020-02-24 03:00:00 -16.333335 28.33333 0.221447
2020-02-24 06:00:00 2020-02-24 06:00:00 -16.333335 28.33333 0.477572
2020-02-24 09:00:00 2020-02-24 09:00:00 -16.333335 28.33333 0.700524
2020-02-24 12:00:00 2020-02-24 12:00:00 -16.333335 28.33333 1.151539

We repeat the same process for the time-series of the CAMS forecasts. We first read the csv file under ../data/cams_ts.csv with the function read_csv(). Then, we convert the index into a DateTimeIndex format.

cams_ts = pd.read_csv('../../eodata/case_study/cams_ts.csv')
cams_ts.index = cams_ts['time']
cams_ts.index = pd.to_datetime(cams_ts.index, format='%Y-%m-%d %H:%M:%S')
cams_ts
time longitude latitude duaod550
time
2020-02-21 12:00:00 2020-02-21 12:00:00 -16.4 28.6 0.000568
2020-02-21 15:00:00 2020-02-21 15:00:00 -16.4 28.6 0.000568
2020-02-21 18:00:00 2020-02-21 18:00:00 -16.4 28.6 0.001460
2020-02-21 21:00:00 2020-02-21 21:00:00 -16.4 28.6 0.002422
2020-02-22 00:00:00 2020-02-22 00:00:00 -16.4 28.6 0.005306
2020-02-22 03:00:00 2020-02-22 03:00:00 -16.4 28.6 0.011280
2020-02-22 06:00:00 2020-02-22 06:00:00 -16.4 28.6 0.080772
2020-02-22 09:00:00 2020-02-22 09:00:00 -16.4 28.6 0.203139
2020-02-22 12:00:00 2020-02-22 12:00:00 -16.4 28.6 0.335806
2020-02-22 15:00:00 2020-02-22 15:00:00 -16.4 28.6 0.501777
2020-02-22 18:00:00 2020-02-22 18:00:00 -16.4 28.6 0.581158
2020-02-22 21:00:00 2020-02-22 21:00:00 -16.4 28.6 0.755507
2020-02-23 00:00:00 2020-02-23 00:00:00 -16.4 28.6 1.102213
2020-02-23 03:00:00 2020-02-23 03:00:00 -16.4 28.6 1.210434
2020-02-23 06:00:00 2020-02-23 06:00:00 -16.4 28.6 1.024824
2020-02-23 09:00:00 2020-02-23 09:00:00 -16.4 28.6 0.905959
2020-02-23 12:00:00 2020-02-23 12:00:00 -16.4 28.6 0.777000
2020-02-23 15:00:00 2020-02-23 15:00:00 -16.4 28.6 0.603338
2020-02-23 18:00:00 2020-02-23 18:00:00 -16.4 28.6 0.545107
2020-02-23 21:00:00 2020-02-23 21:00:00 -16.4 28.6 0.492370
2020-02-24 00:00:00 2020-02-24 00:00:00 -16.4 28.6 0.364715
2020-02-24 03:00:00 2020-02-24 03:00:00 -16.4 28.6 0.281077
2020-02-24 06:00:00 2020-02-24 06:00:00 -16.4 28.6 0.170247
2020-02-24 09:00:00 2020-02-24 09:00:00 -16.4 28.6 0.219139
2020-02-24 12:00:00 2020-02-24 12:00:00 -16.4 28.6 0.355102
2020-02-24 15:00:00 2020-02-24 15:00:00 -16.4 28.6 0.515648
2020-02-24 18:00:00 2020-02-24 18:00:00 -16.4 28.6 0.622153
2020-02-24 21:00:00 2020-02-24 21:00:00 -16.4 28.6 0.886938
2020-02-25 00:00:00 2020-02-25 00:00:00 -16.4 28.6 1.008824
2020-02-25 03:00:00 2020-02-25 03:00:00 -16.4 28.6 0.765258
2020-02-25 06:00:00 2020-02-25 06:00:00 -16.4 28.6 0.606290

5. Visually compare model forecasts with AERONET observations#

All the three time-series (aeronet_ts, sdswas_ts and cams_ts) have now the same temporal resolution. The DateTimeIndex format allows for an efficient handling of time-series information and also allows us to combine the three time-series into one pandas.DataFrame. We can use the function join() and combine the two data frames aeronet_ts and sdswas_ts. We can repeat the process to join the third column from the cams_ts.

The resulting dataframe has 37 row entries and three columns.

df_merged = aeronet_ts_resample.join(sdswas_ts['od550_dust'])
df_merged = df_merged.join(cams_ts['duaod550'])
df_merged
AOD_500nm od550_dust duaod550
2020-02-21 06:00:00 0.047250 NaN NaN
2020-02-21 09:00:00 0.054734 NaN NaN
2020-02-21 12:00:00 0.061003 0.000058 0.000568
2020-02-21 15:00:00 0.092615 0.000019 0.000568
2020-02-21 18:00:00 NaN 0.000027 0.001460
2020-02-21 21:00:00 NaN 0.000207 0.002422
2020-02-22 00:00:00 NaN 0.000890 0.005306
2020-02-22 03:00:00 NaN 0.001751 0.011280
2020-02-22 06:00:00 0.117495 0.009111 0.080772
2020-02-22 09:00:00 0.147488 0.050932 0.203139
2020-02-22 12:00:00 0.253696 0.203418 0.335806
2020-02-22 15:00:00 0.375537 0.363704 0.501777
2020-02-22 18:00:00 NaN 0.433835 0.581158
2020-02-22 21:00:00 NaN 1.095499 0.755507
2020-02-23 00:00:00 NaN 2.165373 1.102213
2020-02-23 03:00:00 NaN 2.052836 1.210434
2020-02-23 06:00:00 NaN 0.861120 1.024824
2020-02-23 09:00:00 NaN 0.753394 0.905959
2020-02-23 12:00:00 NaN 0.466931 0.777000
2020-02-23 15:00:00 NaN 0.354274 0.603338
2020-02-23 18:00:00 NaN 0.320627 0.545107
2020-02-23 21:00:00 NaN 0.231225 0.492370
2020-02-24 00:00:00 NaN 0.179565 0.364715
2020-02-24 03:00:00 NaN 0.221447 0.281077
2020-02-24 06:00:00 NaN 0.477572 0.170247
2020-02-24 09:00:00 NaN 0.700524 0.219139
2020-02-24 12:00:00 NaN 1.151539 0.355102
2020-02-24 15:00:00 NaN NaN 0.515648
2020-02-24 18:00:00 NaN NaN 0.622153
2020-02-24 21:00:00 NaN NaN 0.886938
2020-02-25 00:00:00 NaN NaN 1.008824
2020-02-25 03:00:00 NaN NaN 0.765258
2020-02-25 06:00:00 0.674464 NaN 0.606290
2020-02-25 09:00:00 0.581988 NaN NaN
2020-02-25 12:00:00 0.572472 NaN NaN
2020-02-25 15:00:00 0.407499 NaN NaN
2020-02-25 18:00:00 0.450671 NaN NaN

The last step is now to plot the three columns of the pandas.DataFrame df_merged as three individual line plots.

# Initiate a figure
fig = plt.figure(figsize=(15,8))
ax = plt.subplot()

# Define the plotting function
ax.plot(df_merged.od550_dust, 'o-', color='darkred',  label='SDS-WAS dust forecast - 550 nm')
ax.plot(df_merged.duaod550, 'o-', color='blue', label='CAMS dust forecast - 550 nm')
ax.plot(df_merged.AOD_500nm, 'o-', color='green', label='AERONET observations - 500 nm')

# Customize the title and axes lables
ax.set_title('\nAerosol Optical Depth at 500 / 550 nm - Santa Cruz, Tenerife\n', fontsize=20)
ax.set_ylabel('~', fontsize=14)
ax.set_xlabel('\nHour', fontsize=14)

# Customize the fontsize of the axes tickes
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

# Add a gridline to the plot
ax.grid(linestyle='--')

plt.legend(fontsize=14, loc=2)
<matplotlib.legend.Legend at 0x7f12a26f02b0>
../_images/bee76cf8d7f0c8fa2c95c64f93a8641efb91b9e70524e635a0dc7f6ebb7b05bd.png

The plot above shows you that AERONET station observations can be used to evaluate model predictions - however, station observations show often also quite a few gaps. The strong dust event over Tenerife on 23/24 February 2021 was not measured at all by the AERONET station. For such gaps, model data are helpful to bridge such gaps. But the plot above shows also that the two models predicted especially the intensity of the dust event differently.