ROFF - A Romanian Twitter Dataset for Offensive Language

Mihai Manolescu, Çağrı Çöltekin

2021

This paper describes the annotation process for a Romanian offensive language data set drawn from social media. To facilitate comparable multilingual research on offensive language, the annotation guidelines follow some of the recent annotation efforts for other languages. The final corpus contains 5000 micro-blogging posts annotated by a large number of volunteer annotators. The inter-annotator agreement and the initial automatic discrimination results we present are in line with earlier annotation efforts.

Keywords:

Natural language processing
Artificial intelligence
Offensive
Computer science
Linguistics
Romanian
