Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context

Mladen Karan, Jan Šnajder

2019

We address the task of automatically detecting toxic content in user-generated texts. We focus on exploring the potential for preemptive moderation, i.e., predicting whether a particular conversation thread will, in the future, incite a toxic comment. Moreover, we perform a preliminary investigation of whether a model that jointly considers all comments in a conversation thread outperforms a model that considers only individual comments. Using an existing dataset of conversations among Wikipedia contributors as a starting point, we compile a new large-scale dataset for this task consisting of labeled comments and comments from their conversation threads.
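To make the preemptive framing concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' dataset construction or model. It assumes each thread is an ordered list of (comment text, is_toxic) pairs, and it labels every thread prefix as positive if any later comment in the thread is toxic. The helper names (`preemptive_examples`, `single_comment_input`, `thread_level_input`) and the " [SEP] " separator are hypothetical, chosen only to contrast a single-comment input with a thread-level input.

```python
from typing import List, Tuple

# Hypothetical representation: a thread is an ordered list of (text, is_toxic).
Thread = List[Tuple[str, bool]]


def preemptive_examples(thread: Thread) -> List[Tuple[List[str], bool]]:
    """Build (context, label) pairs: each prefix of the thread is labeled
    True if any comment that comes after it is toxic (assumed labeling rule)."""
    examples = []
    for i in range(1, len(thread)):
        context = [text for text, _ in thread[:i]]          # comments seen so far
        future_toxic = any(toxic for _, toxic in thread[i:])  # does toxicity follow?
        examples.append((context, future_toxic))
    return examples


def single_comment_input(context: List[str]) -> str:
    """Individual-comment view: only the most recent comment is used."""
    return context[-1]


def thread_level_input(context: List[str], sep: str = " [SEP] ") -> str:
    """Thread-level view: all comments so far, joined with a separator."""
    return sep.join(context)


if __name__ == "__main__":
    thread = [
        ("Thanks for the edit.", False),
        ("I reverted it, see the talk page.", False),
        ("You clearly have no idea what you are doing.", True),
    ]
    for context, label in preemptive_examples(thread):
        print(label, "|", single_comment_input(context), "|", thread_level_input(context))
```

Under this assumed rule, a prefix that is followed only by non-toxic comments is a negative example even if it already contains a toxic comment; other labeling choices (for example, predicting only whether the next comment is toxic) are equally plausible readings of the task.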

Keywords:

Thread (computing)
Natural language processing
Artificial intelligence
Language identification
Computer science
