An Experimental Study of Quantitative Evaluations on Saliency Methods

It has long been argued that eXplainable AI (XAI) is an important technology for model and data exploration, validation, and debugging. To deploy XAI in real systems, an executable and comprehensive evaluation of the quality of generated explanations is in high demand. In this paper, we briefly summarize the status quo of quantitative metrics for different properties of XAI, covering faithfulness, localization, sensitivity checks, and stability. Based on an exhaustive experimental study using these metrics, we conclude that, among the typical methods we compare, no single explanation method dominates the others on all metrics. Nonetheless, Gradient-weighted Class Activation Mapping (Grad-CAM) and Randomized Input Sampling for Explanation (RISE) perform fairly well on most of the metrics. We further present a novel use of the evaluation results to diagnose the classification bases of models. We hope this work can serve as a guide for future research.