Bolstering Adversarial Robustness with Latent Disparity Regularization

2021 
Recent research has revealed that neural networks and other machine learning models are vulnerable to adversarial attacks that aim to subvert the integrity or privacy of their predictions by adding a small, calculated perturbation to the inputs; through such perturbations, an adversary can significantly degrade model performance. The number and severity of these attacks continue to grow, yet there remains a dearth of techniques that defend machine learning models robustly and at low computational cost. Against this background, we propose an adversarially robust training procedure and objective function for arbitrary neural network architectures. Robustness against adversarial attacks on integrity is achieved by augmenting the training objective with a novel regularization term that penalizes the discrepancy between the representations induced in hidden layers by benign and adversarial data. We benchmark our regularization approach on the Fashion-MNIST and CIFAR-10 datasets against three state-of-the-art defense methods, namely: (i) regularization of the largest eigenvalue of the Fisher information matrix of the terminal layer's activity, (ii) a higher-level representation guided denoising autoencoder (trained with adversarial examples), and (iii) training an otherwise undefended model on data distorted by additive white Gaussian noise. Our experiments show that the proposed regularizer provides significant improvements in adversarial robustness over both an undefended baseline model and the same model defended with the other techniques. This result holds across several adversarial budgets, with only a small (but seemingly unavoidable) decline in benign test accuracy.
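As a rough illustration of this kind of objective (the abstract does not specify the exact form, norm, layer set, or weighting, so the notation below is assumed for the sketch), a latent-disparity regularized training loss might be written as:

\[
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{CE}}\bigl(f_\theta(x),\, y\bigr)
\;+\; \lambda \sum_{l \in S} \bigl\lVert h_l^\theta(x) - h_l^\theta(x_{\mathrm{adv}}) \bigr\rVert_2^2,
\]

where $f_\theta$ is the network, $h_l^\theta(\cdot)$ denotes the activations of hidden layer $l$, $x_{\mathrm{adv}}$ is an adversarially perturbed copy of the benign input $x$, $S$ is the (assumed) set of regularized hidden layers, and $\lambda$ trades the latent-disparity penalty off against the standard cross-entropy loss $\mathcal{L}_{\mathrm{CE}}$.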