A machine learning approach to estimate median income levels of sub-districts in Thailand using satellite and geospatial data

2019 
Collecting economic data such as household income through traditional survey methods is expensive and time-consuming, which makes such data scarce, especially in developing countries. Satellite nighttime light data and geospatial factors such as the distance from a major metropolitan area have been found to correlate with a wide range of economic indicators. However, no studies have incorporated such variables to estimate the median household income of a country's administrative units. We first performed a regression analysis with the sub-district median household incomes of Thailand as the dependent variable. The independent variables were nighttime light statistics, Euclidean distances from two major metropolitan provinces, population density estimates, and vehicle road density, all calculated from geospatial data. The regression model yielded an R² score of 0.57, showing that the independent variables explain a good portion of the variability in median income. Building on this result, we used K-Means clustering to discretize the median income into 3 ordinal levels, turning the task into a classification problem. Using these levels as the target, we propose a machine learning approach that incorporates the aforementioned independent variables to estimate the median income levels of sub-districts of Thailand. Our classifier achieved an F1 score of 0.82. Our study shows the robustness of satellite and geospatial data in classifying low- and high-income regions at a granularity useful for policy makers.
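The discretization step can be illustrated with a minimal sketch, assuming a plain one-dimensional K-Means (Lloyd's algorithm) applied to median income values. The income figures, the quantile-based initialisation, and the function names `kmeans_1d` and `income_levels` are hypothetical choices made for illustration; the abstract does not specify the implementation the authors used.

```python
def kmeans_1d(values, k=3, iters=100):
    """Run Lloyd's algorithm on scalar values and return the centroids, sorted ascending."""
    ordered = sorted(values)
    # Assumption: deterministic initialisation at evenly spaced quantiles.
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def income_levels(values, centroids):
    """Map each value to the ordinal index (0 = lowest) of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


# Hypothetical sub-district median incomes, not data from the study.
incomes = [9000, 9500, 10000, 10500, 18000, 19000, 20000, 21000,
           40000, 42000, 45000]
centroids = kmeans_1d(incomes, k=3)
levels = income_levels(incomes, centroids)
```

Because the centroids are sorted before labelling, the resulting levels are ordinal (0 for the lowest income group, 2 for the highest), which is what allows them to serve as classification targets.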