Marginal Attacks of Generating Adversarial Examples for Spam Filtering

2021 
Digital information has been used in many areas and has spread widely in the Internet era because of its convenience. However, ill-disposed attackers such as spammers exploit this convenience to send unsolicited information, including advertisements, frauds, and pornographic messages, to mislead users, which may cause severe consequences. Although many spam filters have been proposed for detecting spam, they are vulnerable and can be misled by carefully crafted adversarial examples. In this paper, we propose marginal attack methods for generating such adversarial examples to fool a naive Bayesian spam filter. Specifically, we propose three methods to select sensitive words from a sentence and append them to the end of the sentence. Through extensive experiments, we show that the generated adversarial examples can greatly reduce the filter's detection accuracy; for example, adding only one word reduces the accuracy from 93.6% to 55.8%. Furthermore, we evaluate the transferability of the generated adversarial examples to other traditional filters, such as those based on logistic regression, decision trees, and linear support vector machines. The evaluation results show that these filters' accuracy is also reduced dramatically; in particular, the decision-tree-based filter's accuracy drops from 100% to 1.51% when only one word is inserted.
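The core idea, appending a few carefully chosen words to a message so that a naive Bayes filter flips its verdict, can be sketched in a few lines. The abstract does not detail the paper's three word-selection methods, so the sketch below uses one plausible stand-in: rank vocabulary words by how strongly the trained model associates them with the ham class (the gap in class-conditional log probabilities) and append the top-ranked ones. The toy corpus, the marginal_attack helper, and this selection rule are illustrative assumptions, not the authors' exact procedure.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus standing in for a real spam dataset (illustrative only).
ham = ["meeting at noon tomorrow", "see you at lunch", "project report attached"]
spam = ["win a free prize now", "free money claim your prize", "cheap pills buy now"]
y = [0] * len(ham) + [1] * len(spam)  # 0 = ham, 1 = spam

vec = CountVectorizer()
X = vec.fit_transform(ham + spam)
clf = MultinomialNB().fit(X, y)

def marginal_attack(message, n_words=1):
    """Append the n most ham-indicative vocabulary words to the message.

    Selection rule (an assumption, not the paper's): rank words by
    log P(w | ham) - log P(w | spam); large values are the words the
    model treats as the strongest ham evidence.
    """
    vocab = np.array(vec.get_feature_names_out())
    ham_score = clf.feature_log_prob_[0] - clf.feature_log_prob_[1]
    chosen = vocab[np.argsort(ham_score)[::-1][:n_words]]
    # Append the chosen words at the end of the sentence, as in the abstract.
    return message + " " + " ".join(chosen)

msg = "win a free prize now"
adv = marginal_attack(msg, n_words=1)
print(clf.predict(vec.transform([msg])))  # likely [1]: flagged as spam
print(clf.predict(vec.transform([adv])))  # may flip to [0]: classified as ham

The default of a single appended word mirrors the paper's headline result, where one inserted word already cuts the naive Bayes filter's accuracy from 93.6% to 55.8%.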