A Comparative Study on Two Ground Truth Inference Algorithms based on Manually Labeled Social Media Data

2019 
In the booming information era, smart devices such as smartphones accompany people’s lives all the time. Social media platforms give users uninterrupted communication and access to information, including the ability to post their feelings and share ideas. This study focuses on short texts posted by users. The true meaning of such a text is defined as its ground truth. However, acquiring it from the users directly is extremely difficult and time-consuming, so in many cases short texts come without ground truth. We therefore face a no-ground-truth problem. In this work, we ask labelers to label short texts based entirely on their own judgment of these texts. Two ground truth inference approaches, majority voting (MV) and positive label frequency threshold (PLAT), integrate the labels from different labelers and infer the ground truth. We then analyze which of the two is better suited to labeling unlabeled short texts. The work is significant in that it helps us obtain useful knowledge from massive social media data.
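To make the label-integration step concrete, the following is a minimal sketch of how binary labels from several labelers might be combined for one short text. It assumes labels are encoded as 1 (positive) and 0 (negative); the function names and the `threshold` parameter are hypothetical. The abstract does not say how PLAT chooses its threshold, so the second function only illustrates the general idea of thresholding the positive label frequency with a hand-set value and is not the PLAT algorithm itself.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most labelers.

    On a tie, the label that appears first in `labels` is returned.
    """
    return Counter(labels).most_common(1)[0][0]


def positive_frequency_vote(labels, threshold=0.5, positive=1, negative=0):
    """Label a text positive when the share of positive labels reaches `threshold`.

    `threshold` is set by hand here (an assumption for illustration only).
    """
    if not labels:
        raise ValueError("at least one label is required")
    freq = sum(1 for label in labels if label == positive) / len(labels)
    return positive if freq >= threshold else negative


# Three hypothetical labelers judge the same short text.
votes = [1, 0, 0]
mv_label = majority_vote(votes)
pf_label = positive_frequency_vote(votes, threshold=0.3)
```

With these votes the two rules disagree: majority voting follows the two negative labels, while a threshold below one third accepts the single positive label. This is the kind of difference that matters when positive examples are rare.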