On the Strength of Character Language Models for Multilingual Named Entity Recognition

Xiaodong Yu, Stephen Mayhew, Mark Sammons, Dan Roth

2018

Character-level patterns have been widely used as features in English Named Entity Recognition (NER) systems. However, to date there has been no direct investigation of the inherent differences between name and non-name tokens in text, nor whether this property holds across multiple languages. This paper analyzes the capabilities of corpus-agnostic Character-level Language Models (CLMs) in the binary task of distinguishing name tokens from non-name tokens. We demonstrate that CLMs provide a simple and powerful model for capturing these differences, identifying named entity tokens in a diverse set of languages at close to the performance of full NER systems. Moreover, by adding very simple CLM-based features we can significantly improve the performance of an off-the-shelf NER system for multiple languages.
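The binary name versus non-name task described in the abstract can be illustrated with a minimal sketch. The abstract does not say which kind of CLM is used, so the add-alpha smoothed character trigram model below is an assumption chosen for brevity, and the class name `CharNgramLM`, the helper `is_name`, and the toy training tokens are all hypothetical. One model is trained on name tokens and one on non-name tokens; a token is labeled a name when the name model assigns it the higher average per-character log-probability.

```python
import math
from collections import Counter


class CharNgramLM:
    """Character n-gram language model with add-alpha smoothing (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.ngrams = Counter()    # counts of (context + next char)
        self.contexts = Counter()  # counts of context alone
        self.vocab = set()

    def _pad(self, token):
        # "^" marks the start of a token, "$" marks its end.
        return "^" * (self.n - 1) + token + "$"

    def fit(self, tokens):
        for tok in tokens:
            padded = self._pad(tok)
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                ctx = padded[i - self.n + 1:i]
                self.ngrams[ctx + padded[i]] += 1
                self.contexts[ctx] += 1
        return self

    def avg_logprob(self, token):
        """Average log-probability per predicted character (higher = better fit)."""
        padded = self._pad(token)
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        total, steps = 0.0, 0
        for i in range(self.n - 1, len(padded)):
            ctx = padded[i - self.n + 1:i]
            num = self.ngrams[ctx + padded[i]] + self.alpha
            den = self.contexts[ctx] + self.alpha * v
            total += math.log(num / den)
            steps += 1
        return total / steps


def is_name(token, name_lm, other_lm):
    # Label the token a name if the name model fits it better.
    return name_lm.avg_logprob(token) > other_lm.avg_logprob(token)


# Toy, hypothetical training data; real use would draw tokens from annotated text.
name_lm = CharNgramLM().fit(["Obama", "Merkel", "Nairobi", "Tokyo", "Putin", "Berlin"])
other_lm = CharNgramLM().fit(["the", "said", "running", "because", "of", "with"])

for tok in ["Tokyo", "because"]:
    print(tok, is_name(tok, name_lm, other_lm))
```

With so little training data the comparison is driven largely by capitalization and a handful of trigrams; the sketch shows only the scoring mechanism, not the accuracy reported in the paper.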

Keywords:

Named-entity recognition
Named entity
Natural language processing
Artificial intelligence
Computer science
Language model
