Cross-domain vulnerability detection using graph embedding and domain adaptation

2023 
Vulnerability detection is an effective means of maintaining cyberspace security. Machine learning methods have attracted much attention in software security because of their accuracy and automation. However, current research mainly focuses on in-domain vulnerability detection, where the training data and test data belong to the same domain. Because of differences in application scenarios, coding habits, and other factors, vulnerabilities in different software projects may follow different probability distributions. This discrepancy degrades the performance of machine learning methods when they are applied to a brand-new project. To address this cold start problem, we propose a cross-domain vulnerability detection framework using graph embedding and deep domain adaptation (VulGDA). It works in a variety of cross-domain settings, including the zero-shot setting, in which no labeled data from the target domain is available for training. VulGDA is decomposed into two stages: graph embedding and domain adaptation. At the graph embedding stage, we transform source code samples into graph representations in which elements are directly connected according to their syntactic and semantic relationships. Then, we aggregate information from the neighbors and edges defined in the graph into real-valued vectors. Through graph embedding, VulGDA extracts comprehensive vulnerability features and mitigates the challenge of long-term dependency. To address the discrepancy between training data and test data, domain adaptation is used to train a feature generator. This feature generator maps the graph embedding to a “deep” feature that is discriminative for vulnerability detection and invariant to the shift between domains. We perform systematic experiments to validate the effectiveness of VulGDA. The results show that combining graph embedding and deep domain adaptation improves VulGDA's performance in cross-domain vulnerability detection. 
Compared with the state-of-the-art methods, our method has better performance under the cold start condition.
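
As an illustration only, the two stages described in the abstract can be sketched in plain Python. The sketch assumes mean aggregation over undirected edges for the graph embedding and a linear maximum mean discrepancy (MMD) as the measure of domain shift. The abstract does not state which aggregation rule or adaptation loss VulGDA actually uses, so both are stand-ins, and every name here (aggregate, readout, linear_mmd, the toy graph) is hypothetical.

```python
from statistics import fmean


def aggregate(features, edges, rounds=1):
    """Mean-aggregate each node's vector with its neighbors' vectors.

    Assumption: edges are treated as undirected and unweighted.
    """
    neighbors = {node: set() for node in features}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    for _ in range(rounds):
        updated = {}
        for node, vec in features.items():
            group = [vec] + [features[n] for n in sorted(neighbors[node])]
            # Column-wise mean over the node and its neighbors.
            updated[node] = [fmean(col) for col in zip(*group)]
        features = updated
    return features


def readout(features):
    """Mean-pool all node vectors into one graph-level embedding."""
    return [fmean(col) for col in zip(*features.values())]


def linear_mmd(source, target):
    """Squared distance between the mean embeddings of two domains.

    Assumption: a simple stand-in for a domain-discrepancy term; a
    feature generator would be trained to keep this small while
    staying discriminative on labeled source samples.
    """
    mu_s = [fmean(col) for col in zip(*source)]
    mu_t = [fmean(col) for col in zip(*target)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Hypothetical toy code graph: three statements with 2-d initial features.
node_features = {"decl": [1.0, 0.0], "call": [0.0, 1.0], "ret": [1.0, 1.0]}
edges = [("decl", "call"), ("call", "ret")]

embedding = readout(aggregate(node_features, edges))

# Hypothetical embeddings from a source project and a target project.
shift = linear_mmd([[0.0, 0.0]], [[3.0, 4.0]])
```

In this sketch a larger value of shift means the two projects' embeddings are further apart on average; a domain-invariant feature generator, in the sense the abstract describes, would aim to reduce such a gap without losing the ability to separate vulnerable from non-vulnerable samples.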