Neural Representation Learning Based Binary Code Authorship Attribution

2021 
Authorship attribution on binary code is of great value in applications such as malware analysis, software forensics, and code theft detection. Inspired by the recent success of neural networks and representation learning in various program analysis tasks, this study proposes NMPI, which achieves fine-grained program authorship attribution by analyzing the binary code of individual functions from both sequential and structural perspectives. To evaluate NMPI, the study constructs a large dataset of 268,796 functions collected from Google CodeJam. Extensive experimental evaluation shows that NMPI achieves 91% accuracy on the function-level binary code authorship attribution task.