CAMS European Air Quality Forecasts and Analyses#

This notebooks provides an introduction to the CAMS European Air Quality Forecasts dataset, which provides daily air quality analyses and forecasts over Europe.

CAMS produces specific daily air quality analyses and forecasts for the European domain at significantly higher spatial resolution (0.1 degrees, approx. 10km) than is available from the global analyses and forecasts. The production is based on an ensemble of nine air quality forecasting systems across Europe. A median ensemble is calculated from individual outputs, since ensemble products yield on average better performance than the individual model products. Air quality forecasts are produced once a day for the next four days. Both the analysis and the forecast are available at hourly time steps at seven height levels.

The notebook examines the Saharan Dust event which occured over parts of Europe in the first half of February 2021.

Basic facts

Spatial resolution: 0.1° x 0.1°
Spatial coverage: Europe
Temporal resolution: 1-hourly up to leadtime hour 120
Temporal coverage: three-year rolling archive
Data format: GRIB or zipped NetCDF

How to access the data

CAMS European air quality forecasts are available for download via the Copernicus Atmosphere Data Store (ADS). You will need to create an ADS account here.

Data from the ADS can be downloaded in two ways:


Load required libraries

import xarray as xr
import pandas as pd
from datetime import datetime

import matplotlib.pyplot as plt
import matplotlib.colors
from matplotlib.cm import get_cmap
from matplotlib import animation
from matplotlib.axes import Axes

import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy.feature as cfeature
from cartopy.mpl.geoaxes import GeoAxes
GeoAxes._pcolormesh_patched = Axes.pcolormesh

from IPython.display import HTML

import warnings
warnings.simplefilter(action = "ignore", category = RuntimeWarning)

Load helper functions

%run ../functions.ipynb

Optional: Retrieve CAMS European air quality forecasts programmatically#

The CDS Application Program Interface (CDS API) is a Python library which allows you to access data from the ADS programmatically. In order to use the CDS API, follow the steps below:

  • Self-register at the ADS registration page (if you do not have an account yet)

  • Login to the ADS portal and go to the api-how-to-page.

  • Copy the CDS API key displayed in the black terminal window and replace the ###### of the KEY variable below with your individual CDS API key

Note: You find your CDS API key displayed in the black terminal box under the section Install the CDS API key. If you do not see a URL or key appear in the black terminal box, please refresh your browser tab.

URL='https://ads.atmosphere.copernicus.eu/api/v2'
KEY='########################'

The next step is then to request the data with a so called API request. Via the ADS web interface, you can select the data and at the end of the web interface, you can open the ADS request via Show API request. Copy the request displayed there in the cell below. Once you execute the cell, the download of the data starts automatically.

In the web interface we specified the following characteristics:

  • Variable: Dust

  • Date: 2021-02-04 to 2021-02-04

  • time: 00:00 (Start of the model run)

  • Lead time hour: every 3 hours up to forecast hour 90

  • Level: 0 (metres above sea level)

  • Format: netcdf_zip

import cdsapi

c = cdsapi.Client(url=URL, key=KEY)

c.retrieve(
    'cams-europe-air-quality-forecasts',
    {
        'model': 'ensemble',
        'date': '2021-02-04/2021-02-04',
        'format': 'netcdf',
        'variable': 'dust',
        'type': 'forecast',
        'time': '00:00',
        'leadtime_hour': [
            '0', '12', '15',
            '18', '21', '24',
            '27', '3', '30',
            '33', '36', '39',
            '42', '45', '48',
            '51', '54', '57',
            '6', '60', '63',
            '66', '69', '72',
            '75', '78', '81',
            '84', '87', '9',
            '90',
        ],
        'level': '0',
    },
    '../20210204_dust_concentration.nc')

Load and browse CAMS European Air Quality forecast data#

CAMS European air quality forecasts can be downloaded in either GRIB or NetCDF. The data for the present example has been downloaded in the netCDF format. You can use xarray’s function xr.open_dataset() to open the NetCDF file as xarray.Dataset.

file = xr.open_dataset('../eodata/3_model/cams/european_forecast/20210204_dust_concentration.nc')
file
<xarray.Dataset>
Dimensions:    (longitude: 700, latitude: 420, level: 1, time: 31)
Coordinates:
  * longitude  (longitude) float32 335.0 335.1 335.2 335.4 ... 44.75 44.85 44.95
  * latitude   (latitude) float32 71.95 71.85 71.75 71.65 ... 30.25 30.15 30.05
  * level      (level) float32 0.0
  * time       (time) timedelta64[ns] 0 days 00:00:00 ... 3 days 18:00:00
Data variables:
    dust       (time, level, latitude, longitude) float32 ...
Attributes:
    title:        Dust Air Pollutant FORECAST at the Surface
    institution:  Data produced by Meteo France
    source:       Data from ENSEMBLE model
    history:      Model ENSEMBLE FORECAST
    FORECAST:     Europe, 20210204+[0H_90H]
    summary:      ENSEMBLE model hourly FORECAST of Dust concentration at the...
    project:      MACC-RAQ (http://macc-raq.gmes-atmosphere.eu)

The data above has four dimensions (latitude, longitude, level and time) and one data variable:

  • dust: Dust Air Pollutant

Let us inspect the coordinates of the file more in detail.

Below, you see that the data set consists of 31 time steps, starting on 04 February 2021 00 UTC and ranging up to 6 days ahead. However, the format of the time coordinates is in nanoseconds.

Let us convert the time information into a human-readable time format.

file.time
<xarray.DataArray 'time' (time: 31)>
array([              0,  10800000000000,  21600000000000,  32400000000000,
        43200000000000,  54000000000000,  64800000000000,  75600000000000,
        86400000000000,  97200000000000, 108000000000000, 118800000000000,
       129600000000000, 140400000000000, 151200000000000, 162000000000000,
       172800000000000, 183600000000000, 194400000000000, 205200000000000,
       216000000000000, 226800000000000, 237600000000000, 248400000000000,
       259200000000000, 270000000000000, 280800000000000, 291600000000000,
       302400000000000, 313200000000000, 324000000000000],
      dtype='timedelta64[ns]')
Coordinates:
  * time     (time) timedelta64[ns] 0 days 00:00:00 ... 3 days 18:00:00
Attributes:
    long_name:  FORECAST time from 20210204

First, from the long_name information of the time dimension, we can retrieve the initial timestamp. With the function strptime() from Python’s datetime library, we can convert it into a datetime.datetime object.

timestamp = file.time.long_name[19:27]

timestamp_init=datetime.strptime(timestamp,'%Y%m%d' )
timestamp_init
datetime.datetime(2021, 2, 4, 0, 0)

In a next step, we then build a DateTimeIndex object with the help of Panda’s date_range() function, making use of the length of the time dimension.

The result is a DateTimeIndex object, which can be used to newly assign the time coordinate information.

time_coords = pd.date_range(timestamp_init, periods=len(file.time), freq='3h').strftime("%Y-%m-%d %H:%M:%S").astype('datetime64[ns]')
time_coords
DatetimeIndex(['2021-02-04 00:00:00', '2021-02-04 03:00:00',
               '2021-02-04 06:00:00', '2021-02-04 09:00:00',
               '2021-02-04 12:00:00', '2021-02-04 15:00:00',
               '2021-02-04 18:00:00', '2021-02-04 21:00:00',
               '2021-02-05 00:00:00', '2021-02-05 03:00:00',
               '2021-02-05 06:00:00', '2021-02-05 09:00:00',
               '2021-02-05 12:00:00', '2021-02-05 15:00:00',
               '2021-02-05 18:00:00', '2021-02-05 21:00:00',
               '2021-02-06 00:00:00', '2021-02-06 03:00:00',
               '2021-02-06 06:00:00', '2021-02-06 09:00:00',
               '2021-02-06 12:00:00', '2021-02-06 15:00:00',
               '2021-02-06 18:00:00', '2021-02-06 21:00:00',
               '2021-02-07 00:00:00', '2021-02-07 03:00:00',
               '2021-02-07 06:00:00', '2021-02-07 09:00:00',
               '2021-02-07 12:00:00', '2021-02-07 15:00:00',
               '2021-02-07 18:00:00'],
              dtype='datetime64[ns]', freq=None)

And the last step is to assign the converted time information to the DataArray dust, with the function assign_coords().

dust = file.assign_coords(time=time_coords)
dust
<xarray.Dataset>
Dimensions:    (longitude: 700, latitude: 420, level: 1, time: 31)
Coordinates:
  * longitude  (longitude) float32 335.0 335.1 335.2 335.4 ... 44.75 44.85 44.95
  * latitude   (latitude) float32 71.95 71.85 71.75 71.65 ... 30.25 30.15 30.05
  * level      (level) float32 0.0
  * time       (time) datetime64[ns] 2021-02-04 ... 2021-02-07T18:00:00
Data variables:
    dust       (time, level, latitude, longitude) float32 ...
Attributes:
    title:        Dust Air Pollutant FORECAST at the Surface
    institution:  Data produced by Meteo France
    source:       Data from ENSEMBLE model
    history:      Model ENSEMBLE FORECAST
    FORECAST:     Europe, 20210204+[0H_90H]
    summary:      ENSEMBLE model hourly FORECAST of Dust concentration at the...
    project:      MACC-RAQ (http://macc-raq.gmes-atmosphere.eu)

The latitude values have a 0.1 degrees resolution and have a global N-S coverage.

file.latitude
<xarray.DataArray 'latitude' (latitude: 420)>
array([71.95, 71.85, 71.75, ..., 30.25, 30.15, 30.05], dtype=float32)
Coordinates:
  * latitude  (latitude) float32 71.95 71.85 71.75 71.65 ... 30.25 30.15 30.05
Attributes:
    long_name:  latitude
    units:      degrees_north

The longitude values have a 0.1 degrees resolution as well, but are on a [0,360] grid instead of a [-180,180] grid.

file.longitude
<xarray.DataArray 'longitude' (longitude: 700)>
array([335.05, 335.15, 335.25, ...,  44.75,  44.85,  44.95], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 335.0 335.1 335.2 335.4 ... 44.75 44.85 44.95
Attributes:
    long_name:  longitude
    units:      degrees_east

You can assign new values to coordinates in an xarray.Dataset. You can do this with the assign_coords() function, which you can apply onto a xarray.Dataset. With the code below, you shift your longitude grid from [0,360] to [-180,180]. At the end, you sort the longitude values in an ascending order.

file_assigned = dust.assign_coords(longitude=(((dust.longitude + 180) % 360) - 180)).sortby('longitude')
file_assigned
<xarray.Dataset>
Dimensions:    (longitude: 700, latitude: 420, level: 1, time: 31)
Coordinates:
  * longitude  (longitude) float32 -24.95 -24.85 -24.75 ... 44.75 44.85 44.95
  * latitude   (latitude) float32 71.95 71.85 71.75 71.65 ... 30.25 30.15 30.05
  * level      (level) float32 0.0
  * time       (time) datetime64[ns] 2021-02-04 ... 2021-02-07T18:00:00
Data variables:
    dust       (time, level, latitude, longitude) float32 ...
Attributes:
    title:        Dust Air Pollutant FORECAST at the Surface
    institution:  Data produced by Meteo France
    source:       Data from ENSEMBLE model
    history:      Model ENSEMBLE FORECAST
    FORECAST:     Europe, 20210204+[0H_90H]
    summary:      ENSEMBLE model hourly FORECAST of Dust concentration at the...
    project:      MACC-RAQ (http://macc-raq.gmes-atmosphere.eu)

A quick check of the longitude coordinates of the new xarray.Dataset shows you that the longitude values range now between [-180, 180].

file_assigned.longitude
<xarray.DataArray 'longitude' (longitude: 700)>
array([-24.950012, -24.849976, -24.75    , ...,  44.75    ,  44.850006,
        44.949997], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 -24.95 -24.85 -24.75 ... 44.75 44.85 44.95

Retrieve the data variable dust as data array#

Let us store the data variable dust concentration as xarray.DataArray with the name dust.

dust = file_assigned.dust
dust
<xarray.DataArray 'dust' (time: 31, level: 1, latitude: 420, longitude: 700)>
[9114000 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 -24.95 -24.85 -24.75 ... 44.75 44.85 44.95
  * latitude   (latitude) float32 71.95 71.85 71.75 71.65 ... 30.25 30.15 30.05
  * level      (level) float32 0.0
  * time       (time) datetime64[ns] 2021-02-04 ... 2021-02-07T18:00:00
Attributes:
    species:        Dust
    units:          µg/m3
    value:          hourly values
    standard_name:  mass_concentration_of_dust_in_air

Above, you see that the variable dust has four attributes, species, units, value and standard_name. Let us define variables for unit and standard_name attributes. The variables can be used for visualizing the data.

long_name = dust.standard_name
units = dust.units

Let us do the same for the coordinates longitude and latitude.

latitude = dust.latitude
longitude = dust.longitude

Visualize the dust concentration in Europe#

The next step is to visualize the dataset. You can use the function visualize_pcolormesh, which makes use of matploblib’s function pcolormesh and the Cartopy library.

With ?visualize_pcolormesh you can open the function’s docstring to see what keyword arguments are needed to prepare your plot.

?visualize_pcolormesh

You can make use of the variables we have defined above:

  • units

  • long_name

  • latitude

  • longitude

Additionally, you can specify the color scale and minimum and maxium data values.

time_index = 21
visualize_pcolormesh(data_array=dust[time_index,0,:,:],
                     longitude=longitude,
                     latitude=latitude,
                     projection=ccrs.PlateCarree(),
                     color_scale='hot_r',
                     unit=units,
                     long_name=long_name + ' ' + str(dust[time_index,0,:,:].time.data),
                     vmin=0, 
                     vmax=300,
                     lonmin=longitude.min().data,
                     lonmax=longitude.max().data,
                     latmin=latitude.min().data,
                     latmax=latitude.max().data,
                     set_global=False)
(<Figure size 1440x720 with 2 Axes>,
 <GeoAxesSubplot:title={'center':'mass_concentration_of_dust_in_air 2021-02-06T15:00:00.000000000'}>)
../_images/2aca0f38fc4ba6f8bf7ca382fe76b0c7aa2130e7ccb79bd658c2a75a1034c662.png

Animate dust concentration forecasts over Europe#

In the last step, you can animate the dust concentration in order to see how the trace gas develops over a period of 4 days, from 4th to 7th February 2021.

You can do animations with matplotlib’s function animation. Jupyter’s function HTML can then be used to display HTML and video content.

The animation function consists of 4 parts:

  • Setting the initial state:
    Here, you define the general plot your animation shall use to initialise the animation. You can also define the number of frames (time steps) your animation shall have.

  • Functions to animate:
    An animation consists of three functions: draw(), init() and animate(). draw() is the function where individual frames are passed on and the figure is returned as image. In this example, the function redraws the plot for each time step. init() returns the figure you defined for the initial state. animate() returns the draw() function and animates the function over the given number of frames (time steps).

  • Create a animate.FuncAnimation object:
    The functions defined before are now combined to build an animate.FuncAnimation object.

  • Play the animation as video:
    As a final step, you can integrate the animation into the notebook with the HTML class. You take the generate animation object and convert it to a HTML5 video with the to_html5_video function

# Setting the initial state:
# 1. Define figure for initial plot
fig, ax = visualize_pcolormesh(data_array=dust[0,0,:,:],
                               longitude=dust.longitude, 
                               latitude=dust.latitude,
                               projection=ccrs.PlateCarree(), 
                               color_scale='hot_r', 
                               unit=units,
                               long_name=long_name + ' '+ str(dust.time[0].data),
                               vmin=0,
                               vmax=250, 
                               lonmin=longitude.min().data, 
                               lonmax=longitude.max().data, 
                               latmin=latitude.min().data, 
                               latmax=latitude.max().data,
                               set_global=False)

frames = 24

def draw(i):
    img = plt.pcolormesh(dust.longitude, 
                         dust.latitude, 
                         dust[i,0,:,:], 
                         cmap='hot_r', 
                         transform=ccrs.PlateCarree(),
                         vmin=0,
                         vmax=250,
                         shading='auto')
    
    ax.set_title(long_name + ' '+ str(dust.time[i].data), fontsize=20, pad=20.0)
    return img

def init():
    return fig

def animate(i):
    return draw(i)

ani = animation.FuncAnimation(fig, animate, frames, interval=800, blit=False,
                              init_func=init, repeat=True)

HTML(ani.to_html5_video())
plt.close(fig)

Play the animation video as HTML5 video

HTML(ani.to_html5_video())