ROFF - A Romanian Twitter Dataset for Offensive Language
2021
This paper describes the annotation process of an offensive language data set for Romanian on social media. To facilitate comparable multi-lingual research on offensive language, the annotation guidelines follow some of the recent annotation efforts for other languages. The final corpus contains 5000 micro-blogging posts annotated by a large number of volunteer annotators. The inter-annotator agreement and the initial automatic discrimination results we present are in line with earlier annotation efforts.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
17
References
0
Citations
NaN
KQI