Speech recognition based on concatenated acoustic features and LightGBM model

2021 
In this paper, we focus on the application of the LightGBM model to audio sound classification. Although convolutional neural networks (CNNs) generally achieve superior performance, the LightGBM model offers notable advantages, such as low computational cost, ease of parallel implementation, and comparable accuracy on many datasets. To improve the generalization ability of the model, data augmentation operations are performed on the audio clips, including pitch shifting, time stretching, dynamic range compression, and adding white noise. The accuracy of speech recognition depends heavily on the reliability of the representative features extracted from the audio signal. The raw audio signal is a one-dimensional time series in which frequency changes are difficult to visualize, so it is necessary to extract the discernible components of the signal. To improve the representational capacity of our proposed model, we use the Mel spectrum and MFCCs (Mel-Frequency Cepstral Coefficients) as two-dimensional inputs that accurately characterize the internal information of the signal. The techniques described in this paper are trained mainly on the Google Speech Commands dataset. The experimental results show that the proposed method, an optimized LightGBM model based on the Mel spectrum, achieves high word classification accuracy.
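To make the described pipeline concrete, the sketch below chains the steps from the abstract: augment each clip (pitch shifting, time stretching, a mu-law form of dynamic range compression, and additive white noise), extract log-Mel and MFCC features, and train a LightGBM classifier. This is a minimal sketch assuming the librosa and lightgbm packages; all parameter values (n_steps, rate, mu, feature sizes, model settings) are illustrative assumptions, not the paper's configuration. Likewise, where the paper uses two-dimensional feature inputs, the sketch reduces each feature map to a fixed-length vector by time averaging, since a LightGBM classifier consumes flat feature vectors (flattening the full 2-D array would be another option).

```python
import numpy as np
import librosa
import lightgbm as lgb

def augment(y, sr):
    """Yield augmented variants of a clip, mirroring the four operations
    named in the abstract. Parameter values are illustrative."""
    yield librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # pitch shifting
    yield librosa.effects.time_stretch(y, rate=0.9)          # time stretching
    mu = 255.0  # mu-law companding as one common form of dynamic range compression (assumption)
    yield np.sign(y) * np.log1p(mu * np.abs(y)) / np.log1p(mu)
    yield y + 0.005 * np.random.randn(len(y))                # additive white noise

def extract_features(y, sr, n_mels=40, n_mfcc=20):
    """Concatenate log-Mel spectrum and MFCC features, averaged over time
    so every clip maps to one fixed-length vector."""
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mel.mean(axis=1), mfcc.mean(axis=1)])

# Demonstrate end to end on synthetic one-second clips at 16 kHz
# (stand-ins for Google Speech Commands audio).
sr = 16000
rng = np.random.default_rng(0)
clips = [rng.normal(size=sr).astype(np.float32) for _ in range(20)]
labels = rng.integers(0, 2, size=len(clips))

X, Y = [], []
for clip, label in zip(clips, labels):
    for variant in (clip, *augment(clip, sr)):
        X.append(extract_features(variant, sr))
        Y.append(label)

clf = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.05)
clf.fit(np.array(X), np.array(Y))
print(clf.predict(np.array(X[:5])))
```

Note that time stretching changes the clip length; the time-averaged feature summary keeps the feature vector dimension constant regardless, which is why the augmented variants can share one feature matrix.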