Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System

2019 
Recently, attention-based end-to-end automatic speech recognition (ASR) systems have shown promising results. One limitation of an attention-based ASR system is that its language model (LM) component has to be learned implicitly from transcribed speech data, which prevents it from utilizing the abundant text corpora available for language modeling. In this work, the Component Fusion method is proposed to incorporate an externally trained neural network (NN) LM into an attention-based ASR system. During the training stage we equip the attention-based system with an additional LM component, which is then replaced by an externally trained NN LM at the decoding stage. Experimental results show that the proposed Component Fusion outperforms two prior LM fusion approaches, i.e., Shallow Fusion and Cold Fusion, in both out-of-domain and in-domain scenarios. Further improvements can be achieved by combining Component and Shallow Fusion.
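To make the train-then-swap idea concrete, the following is a minimal sketch of how a replaceable LM component might sit next to an attention decoder: during training an internal LM component is learned jointly with the recognizer, and at decoding an externally trained NN LM exposing the same interface is substituted in its place. All module, parameter, and checkpoint names here are illustrative assumptions, not the paper's implementation; the gating layer is only one plausible way to fuse the LM hidden state into the decoder.

```python
# Hypothetical sketch of the Component Fusion idea (names are assumptions).
import torch
import torch.nn as nn

class LMComponent(nn.Module):
    """Autoregressive LM over output tokens; replaceable at decode time."""
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, prev_tokens, state=None):
        out, state = self.rnn(self.embed(prev_tokens), state)
        return out, state  # hidden states consumed by the fusion layer

class FusionDecoderStep(nn.Module):
    """One decoder step that gates the LM hidden state into the ASR decoder."""
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, dec_hidden, lm_hidden):
        g = torch.sigmoid(self.gate(torch.cat([dec_hidden, lm_hidden], dim=-1)))
        fused = torch.cat([dec_hidden, g * lm_hidden], dim=-1)
        return self.out(fused)  # token logits

# Training stage: the internal LM component is learned jointly from
# transcribed speech, alongside the attention decoder.
vocab, hidden = 1000, 256
internal_lm = LMComponent(vocab, hidden)
fusion = FusionDecoderStep(vocab, hidden)

prev = torch.randint(0, vocab, (4, 1))       # previous output tokens
dec_hidden = torch.randn(4, 1, hidden)       # attention-decoder state (placeholder)
lm_hidden, _ = internal_lm(prev)
logits = fusion(dec_hidden, lm_hidden)

# Decoding stage: swap in an externally trained NN LM with the same
# interface (loaded from a hypothetical checkpoint).
external_lm = LMComponent(vocab, hidden)
# external_lm.load_state_dict(torch.load("external_lm.pt"))  # hypothetical path
lm_hidden, _ = external_lm(prev)
logits = fusion(dec_hidden, lm_hidden)
```

Because the external LM is trained on text only, this swap is what lets large text corpora influence recognition; combining it with Shallow Fusion would additionally interpolate the LM's token scores with the decoder's logits at search time.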