Ground Control to Major Tom: the importance of field surveys in remotely sensed data analysis

2017 
Author(s): Bolliger, I; Carleton, T; Hsiang, S; Kadish, J; Proctor, J; Recht, B; Rolf, E; Shankar, V
Abstract: In this project, we build a modular, scalable system that can collect, store, and process millions of satellite images. We test the relative importance of the two key limitations constraining the prevailing literature by applying this system to a data-rich environment. To overcome classic data availability concerns, and to quantify their implications in an economically meaningful context, we work with an outcome variable directly correlated with key indicators of socioeconomic well-being. We collect public records of sale prices of homes within the United States, and then gradually degrade our rich sample in a range of ways that mimic the sampling strategies employed in actual survey-based datasets. Pairing each house with a corresponding set of satellite images, we use image-based features to predict housing prices within each of these degraded samples. To generalize beyond any given featurization methodology, our system contains an independent featurization module, which can be interchanged with any preferred image classification tool. Our initial findings demonstrate that while satellite imagery can be used to predict housing prices with considerable accuracy, the size and nature of the ground-truth sample are fundamental determinants of the usefulness of imagery for this category of socioeconomic prediction. We quantify the returns to improving the distribution and size of observed data, and show that the image classification method is a second-order concern. Our results provide clear guidance for the development of adaptive sampling strategies in data-sparse locations where satellite-based metrics may be integrated with standard survey data, while also suggesting that advances in image classification techniques for satellite imagery could be further augmented by more robust sampling strategies.
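The sample-degradation experiment described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the features are synthetic random vectors standing in for image-based features, the "prices" are simulated, ridge regression is a stand-in predictor, and the helper names (`ridge_fit`, `r2_score`, `degrade_and_evaluate`) are hypothetical. It only varies sample size by uniform random subsampling; it does not reproduce the authors' system, data, or the survey-style sampling schemes they mimic.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression; the last weight is an unpenalized intercept."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def r2_score(X, y, w):
    """Coefficient of determination of the fitted model on (X, y)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    resid = y - Xb @ w
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def degrade_and_evaluate(X_train, y_train, X_test, y_test, sizes, rng, n_reps=20):
    """For each training-set size, subsample uniformly at random, refit,
    and report the mean held-out R^2 over n_reps draws."""
    results = {}
    for n in sizes:
        scores = []
        for _ in range(n_reps):
            idx = rng.choice(len(y_train), size=n, replace=False)
            w = ridge_fit(X_train[idx], y_train[idx])
            scores.append(r2_score(X_test, y_test, w))
        results[n] = float(np.mean(scores))
    return results

# Synthetic stand-in data (assumption): 50 "image features" per house
# and a noisy linear "price" signal.
rng = np.random.default_rng(0)
n_features = 50
X = rng.normal(size=(3000, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + rng.normal(scale=5.0, size=3000)

X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

results = degrade_and_evaluate(X_train, y_train, X_test, y_test,
                               sizes=[60, 200, 2000], rng=rng)
for n, score in results.items():
    print(f"train size {n:5d}: mean held-out R^2 = {score:.3f}")
```

In this toy setting, held-out accuracy falls as the ground-truth sample shrinks while the featurization stays fixed. Mimicking a specific survey design (for example, clustered or geographically biased sampling) would mean replacing the uniform `rng.choice` draw with the corresponding sampling rule.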