Towards Certifying ℓ∞ Robustness Using Neural Networks with ℓ∞-dist Neurons

2021 
It is well known that standard neural networks, even with high classification accuracy, are vulnerable to small ℓ∞ perturbations. Many attempts have been made to learn networks that resist such adversarial attacks. However, most previous works either provide only empirical verification of the defense against a particular attack method, or develop a theoretical guarantee of model robustness only in limited scenarios. In this paper, we develop a theoretically principled neural network that inherently resists ℓ∞ perturbations. In particular, we design a novel neuron that uses the ℓ∞ distance as its basic operation, which we call the ℓ∞-dist neuron. We show that the ℓ∞-dist neuron is naturally a 1-Lipschitz function with respect to the ℓ∞ norm, and that neural networks constructed with ℓ∞-dist neurons (ℓ∞-dist Nets) enjoy the same property. This directly yields a theoretical guarantee of certified robustness based on the margin of the prediction outputs. We further prove that ℓ∞-dist Nets have enough expressive power to approximate any 1-Lipschitz function, and can generalize well, as the robust test error can be upper-bounded by the performance of a large-margin classifier on the training data. Preliminary experiments show that, even without the help of adversarial training, the learned networks with high classification accuracy are already provably robust.
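The core mechanism described above can be illustrated with a small numeric sketch: an ℓ∞-dist neuron outputs the ℓ∞ distance between its input and a weight vector (minus a bias), which is 1-Lipschitz with respect to the ℓ∞ norm by the reverse triangle inequality. The function and parameter names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def linf_dist_neuron(x, w, b=0.0):
    """Illustrative l_inf-dist neuron: ||x - w||_inf - b.

    (The names w, b and this exact parameterization are assumptions
    for illustration; the paper's formulation may differ in detail.)
    """
    return np.max(np.abs(x - w)) - b

rng = np.random.default_rng(0)
w = rng.normal(size=8)           # neuron "weight" (reference point)
x = rng.normal(size=8)           # clean input
delta = 0.1 * rng.uniform(-1.0, 1.0, size=8)  # perturbation, ||delta||_inf <= 0.1
y = x + delta                    # perturbed input

# 1-Lipschitzness w.r.t. l_inf: by the reverse triangle inequality,
# | ||x - w||_inf - ||y - w||_inf | <= ||x - y||_inf = ||delta||_inf.
change = abs(linf_dist_neuron(x, w) - linf_dist_neuron(y, w))
assert change <= np.max(np.abs(delta)) + 1e-12
```

Because compositions of 1-Lipschitz maps stay 1-Lipschitz, a network built entirely from such neurons shifts every output logit by at most ε under any ‖δ‖∞ ≤ ε attack, so a prediction margin larger than 2ε between the top two logits certifies robustness at that input.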