Deep Convolutional Neural Network Combined with Concatenated Spectrogram for Environmental Sound Classification

2019 
Environmental sound classification (ESC) is an important but challenging problem. In this paper, we propose a new deep convolutional neural network for the ESC task that takes a concatenated spectrogram as its input feature. Compared with a single spectrogram, the concatenated spectrogram increases the richness of the input features. It is generated by concatenating two regular spectrograms: the Log-Mel spectrogram and the Log-Gammatone spectrogram. The proposed network uses convolutional blocks to extract high-level feature maps from the concatenated spectrogram; each block consists of three convolutional layers followed by a pooling layer. To preserve the depth of the network while reducing the number of parameters, each convolutional layer uses filters with a small receptive field. In addition, we use average pooling to retain more information. Our method was evaluated on ESC-50 and UrbanSound8K, achieving classification accuracies of 83.8% and 80.3%, respectively. These experimental results show that the proposed method is suitable for the ESC task.
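The three design ideas in the abstract (concatenating two spectrograms, stacking small filters, and average pooling) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes the Log-Mel and Log-Gammatone spectrograms are already computed and share the same number of time frames, and it assumes concatenation along the frequency axis; the abstract does not state the axis. The 3×3 kernel size and 64-channel width are hypothetical values chosen only to make the parameter arithmetic concrete, and the helper names (`concat_spectrograms`, `conv_params`, `stacked_receptive_field`, `avg_pool2d`) are illustrative.

```python
"""Illustrative sketch (not the authors' code): spectrogram concatenation,
parameter cost of stacked small filters, and 2-D average pooling."""

from typing import List

Matrix = List[List[float]]  # rows = frequency bands, columns = time frames


def concat_spectrograms(log_mel: Matrix, log_gammatone: Matrix) -> Matrix:
    """Stack two spectrograms along the frequency axis (an assumed choice).

    Both inputs must have the same number of time frames (columns).
    """
    if log_mel and log_gammatone and len(log_mel[0]) != len(log_gammatone[0]):
        raise ValueError("spectrograms must have the same number of time frames")
    return [row[:] for row in log_mel] + [row[:] for row in log_gammatone]


def conv_params(kernel: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Number of weights (plus optional biases) in one square 2-D conv layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def stacked_receptive_field(kernel: int, layers: int) -> int:
    """Receptive field of `layers` stacked stride-1, undilated convolutions."""
    return 1 + layers * (kernel - 1)


def avg_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping average pooling; incomplete edge windows are dropped."""
    rows, cols = len(x) // size, len(x[0]) // size
    return [
        [
            sum(x[r * size + i][c * size + j]
                for i in range(size) for j in range(size)) / (size * size)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


if __name__ == "__main__":
    # Toy 2-band x 4-frame inputs with made-up values.
    mel = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    gam = [[0.0, 2.0, 0.0, 2.0], [4.0, 4.0, 8.0, 8.0]]
    feat = concat_spectrograms(mel, gam)  # 4 bands x 4 frames
    print(avg_pool2d(feat))  # [[3.5, 5.5], [2.5, 4.5]]

    # Three stacked 3x3 layers (one block) vs. a single 7x7 layer,
    # 64 -> 64 channels, biases ignored.
    print(stacked_receptive_field(3, 3))           # 7
    print(3 * conv_params(3, 64, 64, bias=False))  # 110592
    print(conv_params(7, 64, 64, bias=False))      # 200704
```

Under these assumptions, three stacked 3×3 layers cover the same 7×7 region as one 7×7 layer but need about 55% of the weights (110,592 vs. 200,704), while adding extra nonlinearities between layers. Average pooling lets every value in a window contribute to the output, whereas max pooling keeps only the largest one.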