MiBeX: Malware-Inserted Benign Datasets for Explainable Machine Learning

2021 
Deep learning can achieve extremely high accuracy in malware detection, but it suffers from an inherent lack of explainability. Although methods for explaining these black-box models are being studied extensively, the explanations they produce, such as saliency maps, are difficult to understand because many malware datasets are themselves hard to interpret. This chapter explores the role of information granularity in malware detection and presents a scalable method for producing an intelligible malware dataset for machine learning classification. One of the resulting datasets is then used with a Malware as Image classifier to demonstrate that the method is valid for training deep learning algorithms. The classifier achieves a training accuracy of 98.94% and a validation accuracy of 93.83%, showing that the method can produce valid datasets for use with machine learning. Gradient-based saliency mapping is then applied to the trained classifier to generate heat-map explanations of the network output.
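To make the two techniques named above concrete, the following is a minimal sketch and not the chapter's implementation. It assumes the common Malware as Image convention of treating each byte as one grayscale pixel, and it replaces the deep network with a toy logistic model so that the gradient of the score with respect to each pixel can be written in closed form. The function names, the image width, and the weights are all hypothetical.

```python
import math


def bytes_to_image(data: bytes, width: int) -> list:
    """Lay raw bytes out row by row as a grayscale image (one byte = one pixel)."""
    if width <= 0:
        raise ValueError("width must be positive")
    padded_len = -(-len(data) // width) * width    # round up to a whole number of rows
    padded = data + bytes(padded_len - len(data))  # zero-pad the final row
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


def gradient_saliency(image: list, weights: list, bias: float) -> list:
    """Return |d score / d pixel| for a toy model: score = sigmoid(w . x + b).

    Assumption: this stand-in model is linear in the pixels, so the gradient
    is score * (1 - score) * w. A real deep network would obtain the same
    quantity through automatic differentiation.
    """
    x = [p / 255.0 for row in image for p in row]  # normalise pixels to [0, 1]
    w = [v for row in weights for v in row]
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    score = 1.0 / (1.0 + math.exp(-z))
    scale = score * (1.0 - score)                  # derivative of the sigmoid at z
    flat = [abs(scale * wi) for wi in w]
    width = len(image[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]
```

Calling `bytes_to_image(b"\x00\xff\x10", 2)` pads the three bytes to a 2x2 image, and passing that image with a same-shaped weight grid to `gradient_saliency` yields a 2x2 map in which larger values mark pixels whose change would move the score most. Rendering that map as a heat map over the image gives the kind of explanation the abstract describes.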