Detecting and visualizing hate speech in social media: A cyber Watchdog for surveillance

2020 
Abstract The multi-fold growth of the social media user base has fuelled a substantial increase in the number of hate speech posts on social media platforms. The enormous data volume makes it hard to capture such posts and either moderate or delete them. This paper presents an approach to detect and visualize online aggression, a special case of hate speech, on social media. Aggression is categorized into three labels: overtly aggressive (OAG), covertly aggressive (CAG), and non-aggressive (NAG). We have designed a user interface, based on a web browser plugin for Facebook and Twitter, to visualize the aggressive comments posted on social media users' timelines. This plugin interface might help security agencies keep tabs on the social media stream. It also provides citizens with a tool of a kind that is typically available only to large enterprises, which helps reduce the technological imbalance between industry and citizens. In addition, the system might help the research community create further tools and prepare weakly labeled training data in a few minutes from comments posted by users on celebrities' Facebook and Twitter timelines. We report results on a newly created dataset of user comments posted on Facebook and Twitter, collected using our proposed plugins, and on the standard Trolling Aggression Cyberbullying 2018 (TRAC) dataset in English and code-mixed Hindi. Several classifiers have been used for aggression classification: Support Vector Machine (SVM), logistic regression, a deep learning model based on a Convolutional Neural Network (CNN), an attention-based model, and the recently proposed BERT pre-trained language model by Google AI. Weighted F1-scores of around 0.64 and 0.62 are achieved on the TRAC Facebook English and Hindi datasets, respectively, while on the Twitter English and Hindi datasets the weighted F1-scores are 0.58 and 0.50, respectively.
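The results above are reported as weighted F1-scores over the three aggression labels. As a rough illustration of how that metric is computed, the following minimal sketch averages the per-class F1 scores, weighting each class by its share of the true labels. It is not the paper's evaluation code: the function name `weighted_f1` and the toy label lists are assumptions made up for this example, and only the label names OAG, CAG, and NAG come from the abstract.

```python
from collections import Counter

# Label set taken from the abstract; everything else here is illustrative.
LABELS = ("OAG", "CAG", "NAG")


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Return the support-weighted average of per-class F1 scores."""
    support = Counter(y_true)   # number of true examples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1 = 2 * tp / denom if denom else 0.0
        score += (support[label] / total) * f1
    return score


# Toy data (hypothetical, not drawn from TRAC or the plugin dataset).
y_true = ["OAG", "OAG", "CAG", "CAG", "NAG", "NAG", "NAG", "NAG"]
y_pred = ["OAG", "CAG", "CAG", "NAG", "NAG", "NAG", "NAG", "OAG"]

print(round(weighted_f1(y_true, y_pred), 3))  # 0.625
```

Because each class contributes in proportion to its support, a frequent class such as NAG in the toy data dominates the score, which is worth keeping in mind when comparing figures across datasets with different label distributions.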