Clustering geospatial data for machine learning modeling of ambient soundscapes

Outdoor ambient acoustical environments may be predicted through supervised machine learning using geospatial features as inputs. Previous work applied K-Means clustering to the geospatial features to identify regions with distinct geospatial characteristics. The clustering results provide physical insight into which features are likely to play the largest roles in the supervised learning model and which locations are influenced by different acoustic training data. However, these results may be sensitive to details of the geospatial data, such as how the data are scaled or the presence of redundant, highly similar features. This work builds on those results by removing redundant geospatial features to construct a reduced feature set and by applying a physically motivated scaling scheme. Clustering analysis applied to this new dataset indicates that the contiguous United States can be naturally clustered into eight human-interpretable geographic regions. Hierarchical clustering is then used to subdivide these eight clusters into more fine-grained regions. One finding of interest is that no geospatial layer in the present soundscape model uniquely identifies rivers. These results will guide further geospatial layer development and acoustical data collection for more accurate soundscape models. [Work supported by a U.S. Army SBIR.]
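The abstract does not give implementation details, so the following is only a rough Python sketch (assuming NumPy) of the general shape of such a pipeline. It drops redundant features using a correlation threshold, scales the remaining features, runs K-Means, and then subdivides each top-level cluster with Ward agglomerative clustering. The correlation threshold, the z-score scaling (a stand-in for the unspecified physically motivated scheme), the farthest-first initialization, and all function names are hypothetical, and the demo data are synthetic rather than real geospatial layers.

```python
import numpy as np


def drop_redundant_features(X, names, threshold=0.95):
    """Greedily keep a feature only if its absolute Pearson correlation with
    every already-kept feature is <= threshold (hypothetical redundancy rule)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]


def standardize(X):
    """Placeholder z-score scaling; NOT the physically motivated scheme."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing constant columns by zero
    return (X - X.mean(axis=0)) / std


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first (maximin) initialization."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen seed.
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    centers = X[idx].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def ward_subdivide(X, n_sub):
    """Naive Ward agglomerative clustering: repeatedly merge the pair of
    clusters whose merge least increases the within-cluster sum of squares."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_sub:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = X[clusters[a]], X[clusters[b]]
                na, nb = len(ca), len(cb)
                cost = na * nb / (na + nb) * np.sum((ca.mean(0) - cb.mean(0)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for per-location features (NOT real data).
    base = np.vstack([rng.normal(loc, 0.3, size=(40, 2))
                      for loc in ([0, 0], [5, 0], [0, 5])])
    dup = base[:, :1] * 2.0 + rng.normal(0, 0.01, size=(len(base), 1))
    X = np.hstack([base, dup])  # third column is a redundant copy of the first
    X_red, kept = drop_redundant_features(X, ["f0", "f1", "f0_dup"])
    Z = standardize(X_red)
    labels, _ = kmeans(Z, k=3)
    # Refine each top-level cluster into two finer sub-regions.
    sub = {c: ward_subdivide(Z[labels == c], 2) for c in range(3)}
    print(kept, np.bincount(labels))
```

For real data, scikit-learn's `KMeans` and SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be the more usual choices; the hand-written Ward loop here is O(n³) and is only meant to make the idea of hierarchical subdivision concrete on a small example.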