Estimating hourly surface PM2.5 concentrations across China from high-density meteorological observations by machine learning

2021 
Abstract The spatial-temporal variations of ground-based and satellite-derived PM2.5 are crucial for studying air quality, human health and climate change. However, the existing ground-based PM2.5 monitoring network has sparsely distributed sites, and satellites cannot provide 24-h PM2.5, which makes it difficult to capture the spatial and daily variation characteristics of PM2.5. This study aims to fill that gap by establishing a virtual network of hourly PM2.5 concentrations using the LightGBM model, based on high-density ground meteorological observations at ~2400 sites across China. In cross-validation, the virtual network performs well in estimating hourly PM2.5 across China, with an R2 of 0.86, a root-mean-square error of 14.99 μg/m3, and a mean absolute error of 9.48 μg/m3. It also shows high spatial-temporal consistency with the observed PM2.5. Spatially, the heaviest PM2.5 pollution is mainly distributed in eastern China (especially the Beijing-Tianjin-Hebei region, the Yangtze and Pearl river deltas, and the Sichuan-Chongqing area). Temporally, PM2.5 exhibits marked seasonal and diurnal changes, with higher concentrations in winter and at nighttime and lower concentrations in summer and in the daytime. We also found that visibility can be used as the primary predictor in the machine learning model to enhance the accuracy of the estimated PM2.5. The established virtual hourly PM2.5 network (~2400 stations) provides more intuitive and detailed PM2.5 data for understanding the diurnal variation of PM2.5 and monitoring the inter-regional transport of haze over China. It thus benefits the study of air pollution control and related diseases.
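The three scores quoted in the abstract (R2, root-mean-square error, mean absolute error) can be illustrated with a minimal sketch. It assumes R2 means the coefficient of determination, which the abstract does not state explicitly. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, estimated):
    """Return (r2, rmse, mae) for paired observed and estimated values.

    Assumption: r2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n

    # Residual and total sums of squares.
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - e) for o, e in zip(observed, estimated)) / n
    return r2, rmse, mae


# Made-up hourly PM2.5 values in μg/m3, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]

r2, rmse, mae = regression_metrics(observed, estimated)
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

In a cross-validation setting, these scores would be computed on held-out samples only, so that they describe the model's performance on data it was not trained on.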