Learning Multi-Resolution Representations for Acoustic Scene Classification via Neural Networks

Zijiang Yang, Kun Qian, Zhao Ren, Alice Baird, Zixing Zhang, Björn Schuller

2020

This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used to evaluate the system. The model with the wavelet energy feature achieved an accuracy of 74.8 % and 60.2 % on the development and evaluation sets, respectively, which is better than the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is used. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed the baseline results of 74.8 % and 61.0 % (confirmed by a one-tailed z-test, \(p<0.01\) and \(p<0.05\), respectively).
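
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of posterior-based decision fusion: averaging the per-scene posterior probabilities of several models and picking the scene with the highest fused score. The function names, the scene labels, and the toy posterior values are all hypothetical and are not taken from the paper; the authors' actual fusion method may differ.

```python
from typing import List, Sequence


def fuse_posteriors(model_posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-scene posteriors across models (assumed fusion rule)."""
    n_models = len(model_posteriors)
    n_classes = len(model_posteriors[0])
    return [
        sum(posterior[c] for posterior in model_posteriors) / n_models
        for c in range(n_classes)
    ]


def predict_scene(model_posteriors: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> str:
    """Return the label whose fused posterior is largest."""
    fused = fuse_posteriors(model_posteriors)
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return labels[best_index]


# Toy example: invented posteriors for one audio segment and three scenes.
labels = ["beach", "bus", "park"]
wavelet_model = [0.5, 0.3, 0.2]    # hypothetical output of a wavelet-feature model
spectral_model = [0.1, 0.5, 0.4]   # hypothetical output of a spectral-feature model

fused = fuse_posteriors([wavelet_model, spectral_model])
prediction = predict_scene([wavelet_model, spectral_model], labels)
print(fused, prediction)
```

In this toy case the two models disagree on their own (the first favours "beach", the second "bus"), and the averaged posteriors settle on "bus". A weighted average or a product rule would be equally plausible variants under the same assumptions.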
