A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification.

Sosuke Nishikawa, Ikuya Yamada, Yoshimasa Tsuruoka, Isao Echizen

2021

We present a multilingual bag-of-entities model that effectively boosts the performance of zero-shot cross-lingual text classification by extending a multilingual pre-trained language model (e.g., M-BERT). It leverages the multilingual nature of Wikidata: entities in multiple languages representing the same concept are defined with a unique identifier. This enables entities described in multiple languages to be represented using shared embeddings. A model trained on entity features in a resource-rich language can thus be directly applied to other languages. Our experimental results on cross-lingual topic classification (using the MLDoc and TED-CLDC datasets) and entity typing (using the SHINRA2020-ML dataset) show that the proposed model consistently outperforms state-of-the-art models.
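The shared-identifier idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the mention dictionaries, the toy embedding values, and the function names `detect_entities` and `bag_of_entities` are hypothetical, plain averaging is an assumed stand-in for however the model actually combines entity features, and the extension of M-BERT is omitted entirely. It only shows why entity features keyed by Wikidata identifier transfer across languages.

```python
from typing import Dict, List

# Hypothetical per-language mention dictionaries: surface form -> Wikidata ID.
MENTIONS: Dict[str, Dict[str, str]] = {
    "en": {"Tokyo": "Q1490", "Japan": "Q17"},
    "ja": {"東京": "Q1490", "日本": "Q17"},
}

# Toy embedding table shared by all languages, keyed by Wikidata ID.
# The vectors are made up for illustration.
ENTITY_EMBEDDINGS: Dict[str, List[float]] = {
    "Q1490": [1.0, 0.0],
    "Q17": [0.0, 1.0],
}


def detect_entities(text: str, lang: str) -> List[str]:
    """Return Wikidata IDs whose surface form occurs in the text (substring match)."""
    return [qid for surface, qid in MENTIONS[lang].items() if surface in text]


def bag_of_entities(text: str, lang: str) -> List[float]:
    """Average the shared embeddings of the entities detected in the text."""
    qids = detect_entities(text, lang)
    dim = len(next(iter(ENTITY_EMBEDDINGS.values())))
    if not qids:
        return [0.0] * dim  # no known entity: fall back to a zero vector
    vectors = [ENTITY_EMBEDDINGS[qid] for qid in qids]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


en_features = bag_of_entities("Tokyo is the capital of Japan.", "en")
ja_features = bag_of_entities("東京は日本の首都です。", "ja")
```

Because both sentences resolve to the same identifiers, `en_features` and `ja_features` are identical, so a classifier trained on the English features could in principle be applied to the Japanese ones without any Japanese training data.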

Keywords:

Unique identifier
Zero-shot
Natural language processing
Cross-lingual
Computer science
Language model
Artificial intelligence
