Clustering analysis of inputs to a geospatial model of outdoor ambient sound

2018 
Outdoor ambient acoustical environments may be predicted through supervised machine learning using geospatial features as inputs. However, collecting sufficient training data is an expensive process, particularly when attempting to improve the accuracy of models based on supervised learning methods over large, geospatially diverse regions. Unsupervised machine learning methods, such as K-Means clustering analysis, enable a statistical comparison between the geospatial diversity represented in the current training dataset and that of the predictor locations. In this case, the geospatial features that represent the regions of western North Carolina and Utah have been simultaneously clustered to examine the common clusters between the two locations. Initial results show that most geospatial clusters group themselves according to a relatively small number of prominent geospatial features, and that Utah requires appreciably more clusters to represent its geospace. Additionally, the training dataset has a relatively low geospatial diversity because most of the current training data sites reside in a small number of clusters. This analysis informs a choice of new site locations for data acquisition that maximize the statistical similarity of the training and input datasets. [Work funded by an Army SBIR.]
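
The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional feature vectors are synthetic, the names `region_a`, `region_b`, and `training` are hypothetical stand-ins for two predictor regions and the training sites, and the choice of four clusters is arbitrary. The sketch pools all points, runs a plain Lloyd-style K-Means, and then counts how many points from each group fall in each cluster, which is one simple way to see where training coverage is thin.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


def make_blob(rng, center, n, spread=0.3):
    """Synthetic feature vectors scattered around a center (illustration only)."""
    return [tuple(rng.gauss(c, spread) for c in center) for _ in range(n)]


rng = random.Random(42)
# Hypothetical data: region_b spans more of the feature space than region_a,
# and the training sites are concentrated in one corner of it.
region_a = make_blob(rng, (0, 0), 100) + make_blob(rng, (3, 0), 100)
region_b = (make_blob(rng, (0, 0), 50) + make_blob(rng, (3, 0), 50)
            + make_blob(rng, (0, 3), 100) + make_blob(rng, (3, 3), 100))
training = make_blob(rng, (0, 0), 30) + make_blob(rng, (3, 0), 5)

# Cluster all groups simultaneously so they share one set of clusters.
points = region_a + region_b + training
groups = (["region_a"] * len(region_a) + ["region_b"] * len(region_b)
          + ["training"] * len(training))
k = 4  # arbitrary choice for this sketch
labels, centroids = kmeans(points, k)

# Count how many points of each group land in each cluster.
occupancy = {g: Counter() for g in ("region_a", "region_b", "training")}
for g, lab in zip(groups, labels):
    occupancy[g][lab] += 1

# Clusters that contain predictor locations but no training sites.
uncovered = sorted(
    j for j in range(k)
    if occupancy["training"][j] == 0
    and occupancy["region_a"][j] + occupancy["region_b"][j] > 0
)
for g, counts in occupancy.items():
    print(g, [counts[j] for j in range(k)])
print("clusters without training sites:", uncovered)
```

In a sketch like this, clusters listed as lacking training sites would be the natural candidates for new data collection. With real geospatial features, the inputs would normally be put on comparable scales before clustering, since K-Means relies on Euclidean distance.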