A satellite-based spatio-temporal machine learning model to reconstruct daily PM2.5 concentrations across Great Britain

2020 
Epidemiological studies on the health effects of air pollution usually rely on measurements from monitors, which provide limited spatio-temporal coverage. Data from satellites, reanalysis and chemical transport models offer additional information that can be used to reconstruct pollution concentrations at high spatio-temporal resolution. The aim of this study is to develop a multi-stage satellite-based machine learning model to estimate daily fine particulate matter (PM2.5) levels across Great Britain during 2003-2018. This high-resolution model consists of random forest (RF) algorithms applied in four stages. Stage-1 augmented the monitor PM2.5 series using co-located PM10 measures. Stage-2 imputed missing satellite aerosol optical depth observations using atmospheric reanalysis models. Stage-3 integrated the output from the previous stages with spatial and spatio-temporal variables to build a prediction model for PM2.5. Stage-4 applied the Stage-3 models to estimate daily PM2.5 concentrations over a 1-km grid. The RF architecture performed well in all stages, with results from Stage-3 showing an average cross-validated R² of 0.788 and minimal bias. Performance was also good on the spatial and temporal scales, with R² of 0.822 and 0.779, respectively. The high spatio-temporal resolution and relatively high precision allow this dataset (1.37 billion points) to be used in epidemiological analyses to assess health risks associated with both short- and long-term exposures to PM2.5.
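The abstract reports an overall, a spatial and a temporal cross-validated R² but does not state how they were computed. The following is a minimal sketch of one common way to obtain such a decomposition, not necessarily the authors' procedure. It assumes that R² is the squared Pearson correlation between observed and predicted values, that the spatial component compares monitor-level means, and that the temporal component compares daily deviations from those means. The function names and the record layout `(monitor_id, observed, predicted)` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R2 in this sketch)."""
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


def decompose_r2(records):
    """Split held-out performance into overall, spatial and temporal R2.

    records: iterable of (monitor_id, observed, predicted) tuples taken
    from cross-validation folds (hypothetical layout).
    """
    by_monitor = defaultdict(list)
    for monitor, obs, pred in records:
        by_monitor[monitor].append((obs, pred))

    obs_all, pred_all = [], []      # raw daily values
    obs_mean, pred_mean = [], []    # one mean per monitor (spatial)
    obs_dev, pred_dev = [], []      # daily deviations from the mean (temporal)

    for pairs in by_monitor.values():
        m_obs = mean(o for o, _ in pairs)
        m_pred = mean(p for _, p in pairs)
        obs_mean.append(m_obs)
        pred_mean.append(m_pred)
        for o, p in pairs:
            obs_all.append(o)
            pred_all.append(p)
            obs_dev.append(o - m_obs)
            pred_dev.append(p - m_pred)

    return {
        "overall": r_squared(obs_all, pred_all),
        "spatial": r_squared(obs_mean, pred_mean),
        "temporal": r_squared(obs_dev, pred_dev),
    }
```

Calling `decompose_r2` on the pooled held-out predictions returns a dictionary with the three components. The sketch needs at least two monitors and some day-to-day variation in both observed and predicted values, since a constant series leaves the correlation undefined.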