Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification.

2021 
Gradient-based visual explanation techniques, such as Grad-CAM and Grad-CAM++, have been used to interpret how convolutional neural networks make decisions. However, not all of these techniques work properly for remote sensing (RS) image classification. In this paper, after analyzing why Grad-CAM performs worse than Grad-CAM++ for RS image classification from the perspective of the weight matrix of gradients, we propose an efficient visual explanation approach dubbed median-pooling Grad-CAM. It uses median pooling to capture the main trend of the gradients and approximates the contribution of each feature map with respect to a specific class. We further propose a new evaluation index, confidence drop %, which expresses how much classification accuracy drops when the important regions captured by the visual saliency are occluded. Experiments on two RS image datasets with two CNN models, VGG and ResNet, show that our proposed method offers a good tradeoff between interpretability and efficiency of visual explanation for CNN-based models in RS image classification. With its low time complexity, median-pooling Grad-CAM could provide a good complement to gradient-based visual explanation techniques in practice.
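The core idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the feature maps and the class-score gradients of one convolutional layer are already available as arrays of shape (K, H, W), takes a plain median over all spatial positions as the per-map weight (where Grad-CAM would take the mean), and then follows the usual Grad-CAM recipe of a weighted sum followed by ReLU. The function names `median_pooling_cam` and `confidence_drop_percent` are hypothetical, and the relative-drop formula used for confidence drop % is an assumed definition, since the abstract does not give the exact one.

```python
import numpy as np


def median_pooling_cam(feature_maps, gradients):
    """Saliency map from (K, H, W) feature maps and their (K, H, W) gradients."""
    # One weight per feature map: the median of its gradient values.
    # Grad-CAM would use the spatial mean here; the plain median is an assumption.
    weights = np.median(gradients, axis=(1, 2))        # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, feature_maps, axes=1)  # shape (H, W)
    # Keep only positive evidence for the class, as in Grad-CAM.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] when the map is not all zeros.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def confidence_drop_percent(conf_original, conf_occluded):
    """Assumed definition: relative drop in class confidence after occlusion, in %."""
    return 100.0 * (conf_original - conf_occluded) / conf_original


# Toy example with K=2 feature maps of size 2x2.
features = np.array([[[1.0, 2.0], [3.0, 4.0]],
                     [[1.0, 0.0], [0.0, 3.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 9.0]],       # median 1, mean 3
                  [[-2.0, -2.0], [-2.0, -2.0]]])  # median -2
cam = median_pooling_cam(features, grads)
drop = confidence_drop_percent(0.8, 0.2)
```

In the toy gradients, the single outlier value 9 pulls the mean of the first map up to 3 while the median stays at 1, which illustrates what "capturing the main trend" of the gradients could mean in practice.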