Language Identification Research Based on Dual Attention Mechanism

2021 
Language identification (LID) is an important branch of speech technology. A key problem in language identification is how to extract an effective representation of a given speech segment and thereby improve model performance. In recent years, deep learning has made significant progress in language identification: neural networks can extract relevant features and effectively improve system performance. To address weak feature extraction ability and low recognition rates, this paper considers both features and models. By comparing features such as MFCC and Fbank, the spectrogram is determined to be the best input feature, and a language identification method based on a dual attention mechanism is proposed. The method first converts the speech signal into a spectrogram and then into a gray-scale spectrogram as input; uses a multi-level convolutional neural network to capture local features; applies the CBAM module to extract dual attention over the channel and spatial dimensions of the feature map; captures temporal characteristics with bidirectional gated recurrent units; and finally passes the local and temporal features jointly to a fully connected layer, which outputs the language class. Experiments on the Common Voice and AP17-OLR datasets demonstrate that the dual-attention language identification method achieves good results, strengthening feature extraction and improving language identification performance.
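The CBAM step described in the abstract reweights a CNN feature map along its channel and spatial dimensions. The sketch below is a minimal NumPy illustration of those two attention computations, not the paper's implementation: the layer sizes and weights are illustrative, and the 7x7 convolution CBAM applies in its spatial branch is simplified here to an elementwise sigmoid gate over the pooled maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    # fmap: (C, H, W) feature map from the CNN.
    # Global average- and max-pooling over the spatial dims (CBAM uses both).
    avg = fmap.mean(axis=(1, 2))          # (C,)
    mx = fmap.max(axis=(1, 2))            # (C,)
    # A shared two-layer MLP (ReLU in between) scores both pooled vectors.
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return fmap * gate[:, None, None]     # reweight each channel

def spatial_attention(fmap):
    # Average- and max-pool across the channel dim, then gate each location.
    # (CBAM applies a 7x7 conv to the stacked maps; a plain sigmoid of their
    # sum is used here as a simplification.)
    avg = fmap.mean(axis=0)               # (H, W)
    mx = fmap.max(axis=0)                 # (H, W)
    gate = sigmoid(avg + mx)
    return fmap * gate[None, :, :]        # reweight each spatial location

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                   # illustrative sizes; r = reduction ratio
fmap = rng.standard_normal((C, H, W))     # stand-in for a CNN feature map
w1 = rng.standard_normal((C // r, C))     # MLP weights (learned in practice)
w2 = rng.standard_normal((C, C // r))

out = spatial_attention(channel_attention(fmap, w1, w2))
print(out.shape)  # (8, 4, 4): same shape, channels and locations reweighted
```

In the full pipeline this attended feature map would then be fed, alongside BiGRU temporal features, into the fully connected classification layer.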