Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods

Ben King, Steven P. Abney

2013

In this paper we consider the problem of labeling the languages of words in mixed-language documents. This problem is approached in a weakly supervised fashion, as a sequence labeling problem with monolingual text samples for training data. Among the approaches evaluated, a conditional random field model trained with generalized expectation criteria was the most accurate and performed consistently as the amount of training data was varied.
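The abstract frames the task as sequence labeling where the only supervision is monolingual text for each language. The sketch below illustrates that setup under stated assumptions. It is a simple baseline and not the authors' CRF trained with generalized expectation criteria. It trains one add-one-smoothed character-trigram model per language from monolingual samples, then assigns one language per word with Viterbi decoding. A fixed `switch_penalty` discourages language changes between adjacent words. The names `CharNgramModel` and `label_words`, the trigram order, the penalty value, and the toy training strings are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    # Pad with boundary markers so word prefixes and suffixes get their own n-grams
    padded = "<" + word.lower() + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one smoothed bag of character n-grams for one language (hypothetical helper)."""

    def __init__(self, text, n=3):
        self.n = n
        self.counts = Counter(g for w in text.split() for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 reserves probability mass for unseen n-grams

    def log_prob(self, word):
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom) for g in char_ngrams(word, self.n))


def label_words(words, models, switch_penalty=2.0):
    """Viterbi decoding: one language label per word, penalizing language switches."""
    if not words:
        return []
    langs = list(models)
    # Emission scores: log-likelihood of each word under each monolingual model
    emit = [{l: models[l].log_prob(w) for l in langs} for w in words]

    def trans(prev, cur):
        return 0.0 if prev == cur else -switch_penalty

    score = dict(emit[0])
    back = []
    for t in range(1, len(words)):
        new_score, ptr = {}, {}
        for cur in langs:
            # Best previous label, given the cost of switching into `cur`
            prev = max(langs, key=lambda p: score[p] + trans(p, cur))
            new_score[cur] = score[prev] + trans(prev, cur) + emit[t][cur]
            ptr[cur] = prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final label
    path = [max(langs, key=lambda l: score[l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy monolingual samples standing in for the training data (not from the paper)
english = "the cat sat on the mat and the dog ran in the park with the children"
spanish = "el gato se sentó en la alfombra y el perro corrió en el parque con los niños"
models = {"en": CharNgramModel(english), "es": CharNgramModel(spanish)}
print(label_words("the dog corrió en el parque".split(), models))
```

With `switch_penalty=0.0`, each word is labeled independently by its most likely language. Raising the penalty smooths the output toward longer single-language runs. A CRF trained with generalized expectation criteria would instead learn its feature weights and transition scores from expectations derived from the monolingual samples rather than relying on a hand-set penalty.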

Keywords:

Machine learning
Natural language processing
Training set
Conditional random field
Mixed language
Sequence labeling
Artificial intelligence
Pattern recognition
Computer science
