Robust prediction of hourly PM2.5 from meteorological data using LightGBM

2021 
Retrieving historical fine particulate matter (PM2.5) data is key to evaluating the long-term impacts of PM2.5 on the environment, human health and climate change. Satellite-based aerosol optical depth has been used to estimate PM2.5, but these estimates have been undermined by large numbers of missing values, low sampling frequency and weak predictive capability. Here, using a novel feature engineering approach that incorporates spatial effects from meteorological data, we developed a robust LightGBM model that predicts PM2.5 with unprecedented predictive capability on hourly (R2 = 0.75), daily (R2 = 0.84), monthly (R2 = 0.88) and annual (R2 = 0.87) timescales. By exploiting these spatial features, the model can also construct hourly gridded networks of PM2.5, a capability that would be further enhanced by incorporating meteorological observations from regional stations. Our results show that this model has great potential for reconstructing historical PM2.5 datasets and real-time gridded networks at high spatiotemporal resolution. The resulting datasets can be assimilated into models to produce long-term reanalyses that incorporate interactions between aerosols and physical processes.
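The abstract reports R2 separately for hourly, daily, monthly and annual timescales. The sketch below illustrates one common way such scores can be obtained: average hourly observed and predicted values per station within each time bin, then compute R2 across the resulting bins. It is not the authors' evaluation code. It assumes R2 is the coefficient of determination (1 - SS_res/SS_tot; some studies instead report the squared Pearson correlation), and the record layout `(station, timestamp, observed, predicted)`, the function names and the synthetic data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def aggregate(records, bin_fn):
    """Average hourly records into (station, time-bin) groups."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_obs, sum_pred, count]
    for station, ts, obs, pred in records:
        acc = sums[(station, bin_fn(ts))]
        acc[0] += obs
        acc[1] += pred
        acc[2] += 1
    keys = sorted(sums)
    obs_means = [sums[k][0] / sums[k][2] for k in keys]
    pred_means = [sums[k][1] / sums[k][2] for k in keys]
    return obs_means, pred_means


# Hypothetical binning rules: each maps a timestamp to its bin key.
TIMESCALES = {
    "hourly": lambda ts: (ts.year, ts.month, ts.day, ts.hour),
    "daily": lambda ts: (ts.year, ts.month, ts.day),
    "monthly": lambda ts: (ts.year, ts.month),
    "annual": lambda ts: (ts.year,),
}


def r2_by_timescale(records):
    """Return R2 computed on station-level means for each timescale."""
    return {name: r_squared(*aggregate(records, fn)) for name, fn in TIMESCALES.items()}


if __name__ == "__main__":
    # Synthetic example: hourly errors of +/-3 that cancel within each day.
    start = datetime(2021, 1, 1)
    demo = []
    for station in (0, 1):
        for h in range(96):
            ts = start + timedelta(hours=h)
            obs = 20.0 + 10.0 * (h // 24) + 5.0 * station + ts.hour
            demo.append((station, ts, obs, obs + (3.0 if h % 2 == 0 else -3.0)))
    for name, score in r2_by_timescale(demo).items():
        print(f"{name}: R2 = {score:.3f}")
```

In this synthetic case the hourly errors average out within each day, so R2 rises from hourly to daily. This illustrates why scores computed on averaged values are typically higher than hourly scores, although the ordering need not hold for real data.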