LTRWES: A New Framework for Security Bug Report Detection

2020 
Abstract Context: Security bug reports (SBRs) usually contain security-related vulnerabilities in software products, which could be exploited by malicious attackers. Hence, it is important to identify SBRs quickly and accurately among bug reports (BRs) that have been disclosed in bug tracking systems. Although a few methods have been already proposed for the detection of SBRs, challenging issues still remain due to noisy samples, class imbalance and data scarcity. Object: This motivates us to reveal the potential challenges faced by the state-of-the-art SBRs prediction methods from the viewpoint of data filtering and representation. Furthermore, the purpose of this paper is also to provide a general framework and new solutions to solve these problems. Method: In this study, we propose a novel approach LTRWES that incorporates learning to rank and word embedding into the identification of SBRs. Unlike previous keyword-based approaches, LTRWES is a content-based data filtering and representation framework that has several desirable properties not shared in other methods. Firstly, it exploits ranking model to efficiently filter non-security bug reports (NSBRs) that have higher content similarity with respect to SBRs. Secondly, it applies word embedding technology to transform the rest of NSBRs, together with SBRs, into low-dimensional real-value vectors. Result: Experiment results on benchmark and large real-world datasets show that our proposed method outperforms the state-of-the-art method. Conclusion: Overall, the LTRWES is valid with high performance. It will help security engineers to identify SBRs from thousands of NSBRs more accurately than existing algorithms. Therefore, this will positively encourage the research and development of the content-based methods for security bug report detection.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    48
    References
    6
    Citations
    NaN
    KQI
    []