K-Means clustering of inputs to a geospatial model for optimizing acoustic data collection

Brooks A. Butler, Katrina Pedersen, Casie Maekawa, Kent L. Gee, Mark K. Transtrum, Michael M. James, Alexandria R. Salton

2018

Outdoor ambient acoustical environments may be predicted through machine learning using geospatial features as inputs. However, collecting sufficient training data is expensive, particularly when attempting to improve the accuracy of supervised learning models over large, geospatially diverse regions. Unsupervised machine learning methods, such as K-Means clustering analysis, enable a statistical comparison between the geospatial diversity represented in the current training dataset and that of the predictor locations. In this case, 117 geospatial features that represent the contiguous United States have been clustered using K-Means clustering. Results show that most geospatial clusters are organized around a relatively small number of prominent geospatial features. The available acoustic training dataset is shown to have relatively low geospatial diversity because most training data sites reside in only a few clusters. This analysis informs the selection of new data collection sites that improve the statistical similarity of the training and input datasets.
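The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which software or settings were used. The sketch assumes synthetic two-dimensional points in place of the 117 geospatial features, a small standard-library K-Means (Lloyd's algorithm) with a deterministic farthest-point initialization rather than the usual random or k-means++ start, and hypothetical helper names such as `cluster_shares` and `make_blob`. It clusters the full region, assigns the training sites to those clusters, and ranks clusters by how under-represented they are in the training set.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def farthest_point_init(points, k):
    """Deterministic start (an assumption made for reproducibility):
    take the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:  # assignments have stopped changing
            break
        centroids = updated
    return centroids


def cluster_shares(points, centroids):
    """Fraction of the points that falls into each cluster."""
    counts = Counter(nearest(p, centroids) for p in points)
    return [counts.get(j, 0) / len(points) for j in range(len(centroids))]


def make_blob(center, n, rng):
    """Synthetic stand-in data: n points within +/-1 of a center."""
    return [[center[0] + rng.uniform(-1, 1), center[1] + rng.uniform(-1, 1)]
            for _ in range(n)]


rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

# "Region": every location where predictions are wanted, 100 points per blob.
region = [p for c in centers for p in make_blob(c, 100, rng)]
# "Training sites": heavily concentrated in the first blob.
training = (make_blob(centers[0], 18, rng)
            + make_blob(centers[1], 1, rng)
            + make_blob(centers[2], 1, rng))

centroids = kmeans(region, k=3)
region_share = cluster_shares(region, centroids)
train_share = cluster_shares(training, centroids)

# A positive gap means the cluster covers more of the region than of the training set.
gaps = [r - t for r, t in zip(region_share, train_share)]
priority = sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)

for j in priority:
    print(f"cluster {j}: region {region_share[j]:.2f}, "
          f"training {train_share[j]:.2f}, gap {gaps[j]:+.2f}")
```

Clusters at the top of `priority` are the ones where new measurement sites would most reduce the mismatch between training and input data. With real features on different scales, standardizing each feature before clustering would usually be advisable; that step is omitted here because the synthetic data share one scale.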
