BEAN: A BEhavior ANalysis Approach of URL Spam Filtering in Twitter

2015 
Social websites such as Twitter and Facebook strive to detect and remove URL spam in order to keep their users happy and coming back. Researchers have already proposed many filtering approaches, such as SpamRank and TrustRank, most of which detect URL spam through content analysis of the Web pages behind the URLs or link analysis of the Web graph. Nevertheless, automatically detecting URL spam in social media remains challenging because spammers keep evolving their techniques, for example by cloaking based on IP addresses or by using multiple user accounts and redirectors. In this paper, we introduce BEAN, a behavior analysis technique that detects URL spam by capturing the anomalous message-sending behaviors of spammers. Twitter is an ideal place for our analysis due to its popularity and real-time properties. We collect over 2.4 million tweets from around a million users, based on Twitter trending topics, over a period of 4 months. We apply our behavior analysis approach, derived from a Markov Chain model, to this Twitter dataset and achieve a precision of 0.91 and a recall of 0.88. In doing so, we detect URL spam that cannot be filtered out by conventional approaches such as SVM and TrustRank, indicating that our approach is a good complement to existing URL spam detection techniques. We further investigate the anomalous behavior patterns that spammers exhibit when spreading URL spam, in order to confirm our assumption.
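The abstract does not specify how the Markov Chain model is constructed, so the following is only a minimal illustrative sketch of the general idea of scoring message-sending behavior with a first-order Markov chain, not the BEAN method itself. The state alphabet (`text`, `reply`, `url`), the add-alpha smoothing, the toy sequences, and the function names `fit_transitions` and `avg_log_likelihood` are all assumptions introduced for illustration.

```python
import math
from collections import defaultdict

def fit_transitions(sequences, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = {s: defaultdict(float) for s in states}
    for seq in sequences:
        # Count each consecutive (previous action, next action) pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return probs

def avg_log_likelihood(seq, probs):
    """Mean log-probability per transition; lower values are more anomalous."""
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)

# Hypothetical action alphabet (assumption, not taken from the paper).
STATES = ["text", "reply", "url"]

# Toy sequences standing in for ordinary users' posting behavior.
normal = [
    ["text", "reply", "text", "url", "text", "reply"],
    ["reply", "text", "text", "reply", "url", "text"],
]

model = fit_transitions(normal, STATES)
typical = avg_log_likelihood(["text", "reply", "text", "text"], model)
bursty = avg_log_likelihood(["url", "url", "url", "url"], model)
```

Under these assumptions, an account that posts URLs back to back receives a lower average log-likelihood than one that follows the transition patterns seen in the training sequences, and a threshold on that score could flag it for review. How such a threshold would be chosen is not stated in the abstract.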