Feature Distillation in Deep Attention Network Against Adversarial Examples.

2021 
Deep neural networks (DNNs) are easily fooled by adversarial examples. Most existing defense strategies defend against adversarial examples based on full information of whole images. In reality, one possible reason as to why humans are not sensitive to adversarial perturbations is that the human visual mechanism often concentrates on most important regions of images. A deep attention mechanism has been applied in many computer fields and has achieved great success. Attention modules are composed of an attention branch and a trunk branch. The encoder/decoder architecture in the attention branch has potential of compressing adversarial perturbations. In this article, we theoretically prove that attention modules can compress adversarial perturbations by destroying potential linear characteristics of DNNs. Considering the distribution characteristics of adversarial perturbations in different frequency bands, we design and compare three types of attention modules based on frequency decomposition and reorganization to defend against adversarial examples. Moreover, we find that our designed attention modules can obtain high classification accuracies on clean images by locating attention regions more accurately. Experimental results on the CIFAR and ImageNet dataset demonstrate that frequency reorganization in attention modules can not only achieve good robustness to adversarial perturbations, but also obtain comparable, even higher classification, accuracies on clean images. Moreover, our proposed attention modules can be integrated with existing defense strategies as components to further improve adversarial robustness.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []