Unsupervised Perturbation based Self-Supervised Adversarial Training

2021 
Deep neural networks (DNNs) are vulnerable to adversarial attacks. Existing adversarial defense approaches mostly rely on a large number of labels during training to improve the model's robustness. However, labeling typically requires substantial resources and is time-consuming, especially when annotations are hard to generate (e.g., an emergency scene in an autonomous-driving setting). In this paper, we propose an instance-level unsupervised perturbation to replace the supervised class-level adversarial sample in robust training. The unsupervised perturbation is generated over various transformed views of a single input and aims to confuse the model's instance-level discrimination of that specific input. We further introduce contrastive-learning-based adversarial training (UPAT), which maximizes the agreement between each transformed instance and its corresponding unsupervised perturbed output, encouraging the model to suppress vulnerability in the embedding space. We conduct comprehensive experiments on three image benchmarks, and the quantitative results demonstrate that our defense approach consistently outperforms prior state-of-the-art techniques, efficiently improving robustness against various white-box attacks.
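The core idea above, finding a bounded perturbation that reduces the agreement between an input and its transformed view so that the model's instance-level discrimination is confused, can be sketched numerically. The encoder, step sizes, and the finite-difference gradient below are illustrative stand-ins (the paper's method uses a trained DNN and backpropagated gradients), not the authors' implementation:

```python
import numpy as np

def encode(x, W):
    """Toy linear encoder standing in for a DNN; returns an L2-normalized embedding."""
    z = W @ x
    return z / (np.linalg.norm(z) + 1e-8)

def agreement(x1, x2, W):
    """Cosine similarity between the embeddings of two views of an input."""
    return float(encode(x1, W) @ encode(x2, W))

def unsupervised_perturbation(x, x_view, W, eps=0.1, steps=10, lr=0.5):
    """PGD-style instance-level perturbation: iteratively *decrease* the
    agreement between the perturbed input and its transformed view, staying
    inside an eps-ball. The gradient is estimated by central finite
    differences so the sketch needs no autograd framework."""
    delta = np.zeros_like(x)
    h = 1e-4
    for _ in range(steps):
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (agreement(x + delta + e, x_view, W)
                    - agreement(x + delta - e, x_view, W)) / (2 * h)
        delta -= lr * np.sign(g)           # step against the agreement gradient
        delta = np.clip(delta, -eps, eps)  # project back into the eps-ball
    return delta
```

During training, the defense direction is the reverse: the model parameters are updated to maximize agreement between the transformed instance and its perturbed counterpart, which is what suppresses the vulnerability in the embedding space.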