A Unified Game-Theoretic Interpretation of Adversarial Robustness

2021 
This paper provides a unified view for explaining different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential way to unify adversarial perturbations and robustness, which can explain existing defense methods in a principled manner. Besides, our findings also revise previous inaccurate understandings of the shape bias of adversarially learned features.
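For reference, the order-m interaction between input variables i and j in this line of work is usually defined as I^(m)(i, j) = E_{S ⊆ N\{i,j}, |S|=m}[Δf(i, j, S)], with Δf(i, j, S) = f(S ∪ {i, j}) − f(S ∪ {i}) − f(S ∪ {j}) + f(S), where variables outside the kept set are masked by a baseline value. The Python sketch below is a hypothetical Monte Carlo estimator of this quantity, not the authors' released implementation; the toy model f, the zero baseline, and the sample count are all assumptions made for illustration.

import random
import numpy as np

def delta_f(f, x, baseline, i, j, S):
    # Delta f(i, j, S) = f(S ∪ {i, j}) - f(S ∪ {i}) - f(S ∪ {j}) + f(S),
    # where variables outside the kept set are replaced by baseline values.
    def output_with(kept):
        masked = baseline.copy()
        masked[list(kept)] = x[list(kept)]
        return f(masked)
    S = list(S)
    return (output_with(S + [i, j]) - output_with(S + [i])
            - output_with(S + [j]) + output_with(S))

def order_m_interaction(f, x, baseline, i, j, m, n_samples=200):
    # Monte Carlo estimate of I^(m)(i, j): the average Delta f(i, j, S)
    # over contexts S of size m sampled from N \ {i, j}.
    others = [k for k in range(len(x)) if k not in (i, j)]
    samples = [delta_f(f, x, baseline, i, j, random.sample(others, m))
               for _ in range(n_samples)]
    return float(np.mean(samples))

# Toy usage with a hand-written scalar "model" (an assumption, not a trained DNN):
# f couples x[0] and x[1] multiplicatively, so their interaction is nonzero (~2.0 here).
if __name__ == "__main__":
    f = lambda v: v[0] * v[1] + v[2]
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    baseline = np.zeros_like(x)
    print(order_m_interaction(f, x, baseline, i=0, j=1, m=2))

Averaging this estimate over low orders m versus high orders m is one way to compare the low-order and high-order interaction strengths that the abstract refers to.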