Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing

Anastasia Shimorina, Elena Khasanova, Claire Gardent

2019

In this paper, we propose an approach for semi-automatically creating a data-to-text (D2T) corpus for Russian that can be used to learn a D2T natural language generation model. An error analysis of the output of an English-to-Russian neural machine translation system shows that 80% of the automatically translated sentences contain an error and that 53% of all translation errors bear on named entities (NE). We therefore focus on named entities and introduce two post-editing techniques for correcting wrongly translated NEs.

Keywords:

Machine translation
Natural language processing
Artificial intelligence
Computer science
Error analysis
Natural language generation
Machine translation system
Text generation
