A New Dataset and Efficient Baselines for Document-level Text Simplification in German

Annette Rios, Nicolas Spring, Tannon Kew, Marek Kostrzewa, Andreas Säuberli, Mathias Müller, Sarah Ebling

2021

The task of document-level text simplification is very similar to summarization, with the additional difficulty of reducing complexity. We introduce a new data set of German texts collected from the Swiss news magazine 20 Minuten (‘20 Minutes’), consisting of full articles paired with simplified summaries. Furthermore, we present experiments on automatic text simplification with the pretrained multilingual mBART and a modified, more memory-friendly version thereof, using both our new data set and existing simplification corpora. Our modifications of mBART let us train at a lower memory cost without much loss in performance. In fact, the smaller mBART even improves over the standard model in a setting with multiple simplification levels.

Keywords:

Data set
task
Artificial intelligence
Natural language processing
German
Text simplification
Computer science
Document level
Automatic summarization
