Phrase-based data selection for language model adaptation in spoken language translation

Shixiang Lu, Wei Wei, Xiaoyin Fu, Lichun Fan, Bo Xu

2012

In this paper, we propose an unsupervised phrase-based data selection model to address the problem of selecting language model (LM) training data from non-domain-specific corpora in order to build an adapted LM. In a spoken language translation (SLT) system, we aim to find the LM training sentences that are similar to the translation task. Compared with traditional bag-of-words models, the phrase-based data selection model is more effective because it captures contextual information by modeling the selection of a phrase as a whole, rather than the selection of single words in isolation. Large-scale experimental results demonstrate that our approach significantly outperforms state-of-the-art approaches in both LM perplexity and translation performance.
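The abstract does not give the model's equations, so the following is only a minimal sketch of the general contrast it draws, not the authors' method. It assumes a simple, hypothetical scoring rule: each candidate sentence from a general-domain pool is scored by the fraction of its n-grams (phrases of length 1 to `max_n`) that also occur in in-domain task text, and the top `k` sentences are kept. The helper names `phrases`, `build_task_model`, `score`, and `select` are illustrative inventions; with `max_n=1` the score reduces to a bag-of-words overlap.

```python
from collections import Counter

# Hypothetical sketch: phrase (n-gram) overlap scoring for data selection.
# This is NOT the paper's model; it only illustrates scoring whole phrases
# instead of isolated words. With max_n=1 it reduces to bag-of-words.


def phrases(tokens, max_n):
    """Count every n-gram of length 1..max_n in a token list."""
    grams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams[tuple(tokens[i:i + n])] += 1
    return grams


def build_task_model(task_sentences, max_n):
    """Pool n-gram counts from in-domain (translation task) sentences."""
    model = Counter()
    for sentence in task_sentences:
        model.update(phrases(sentence.split(), max_n))
    return model


def score(sentence, task_model, max_n):
    """Fraction of the sentence's n-grams that also occur in the task text."""
    grams = phrases(sentence.split(), max_n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    hits = sum(count for gram, count in grams.items() if gram in task_model)
    return hits / total


def select(pool, task_sentences, max_n, k):
    """Return the k pool sentences that score highest against the task text."""
    task_model = build_task_model(task_sentences, max_n)
    ranked = sorted(pool, key=lambda s: score(s, task_model, max_n), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    task = ["could you book a room for me", "i would like to book a table"]
    pool = ["me for room a book", "book a room for me", "the weather is nice today"]
    for n in (1, 2):
        model = build_task_model(task, n)
        print(n, [round(score(s, model, n), 3) for s in pool])
    # prints:
    # 1 [1.0, 1.0, 0.0]
    # 2 [0.556, 1.0, 0.0]
```

In this toy run, "me for room a book" and "book a room for me" contain exactly the same words, so a unigram (bag-of-words) score cannot tell them apart. Once bigrams are counted, only the sentence whose word order matches the task text keeps a perfect score, which is the kind of contextual signal the abstract attributes to phrase-level selection.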

Keywords:

Machine translation
Speech processing
Cache language model
Speech recognition
Perplexity
Computer science
Spoken language
Universal Networking Language
Natural language processing
Phrase
Language model
Artificial intelligence
Language translation
