An End-to-End Speech Accent Recognition Method Based on Hybrid CTC/Attention Transformer ASR

2021 
This paper proposes a novel accent recognition system built within the framework of a transformer-based end-to-end speech recognition system. To incorporate pronunciation and linguistic knowledge into the network, we first pre-train an ASR model in a hybrid CTC/attention manner. Then, focusing on accent recognition, we extend the output token list by inserting accent labels into the transcripts and fine-tune the network parameters on an accented speech dataset. Our work is evaluated on the Interspeech 2020 Accented English Speech Recognition Challenge. Experiments show that our method achieves an accuracy of 72.39% on the test set and 80.98% on the development set, outperforming the baseline system by a large margin. Our submitted system ranked second in the accent recognition task of the challenge.
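The two ideas in the abstract, extending the output token list with accent labels and training with a hybrid CTC/attention objective, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `<US>`-style label format, the choice to prepend the accent label to the transcript, and the default CTC weight of 0.3 are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, List


def extend_vocab(vocab: Dict[str, int], accent_labels: List[str]) -> Dict[str, int]:
    """Return a copy of the ASR vocabulary with one new token per accent label.

    The "<LABEL>" token format is hypothetical.
    """
    extended = dict(vocab)  # leave the pre-trained vocabulary untouched
    next_id = max(extended.values(), default=-1) + 1
    for label in accent_labels:
        token = f"<{label}>"
        if token not in extended:
            extended[token] = next_id
            next_id += 1
    return extended


def tag_transcript(tokens: List[str], accent: str) -> List[str]:
    """Insert the accent token into a transcript.

    Assumption: the label is placed at the start of the token sequence.
    """
    return [f"<{accent}>"] + tokens


def hybrid_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Weighted interpolation of the CTC and attention losses.

    The weight value is an assumed hyperparameter, not taken from the paper.
    """
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


if __name__ == "__main__":
    vocab = {"<blank>": 0, "hello": 1, "world": 2}
    extended = extend_vocab(vocab, ["US", "UK"])
    print(extended)
    print(tag_transcript(["hello", "world"], "UK"))
    print(hybrid_loss(ctc_loss=2.0, att_loss=1.0))
```

Under these assumptions, a decoder fine-tuned on tagged transcripts emits the accent token as part of its output sequence, so the predicted accent can be read directly from the decoded tokens.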