Finding Substitutable Binary Code for Reverse Engineering by Synthesizing Adapters

2018 
Independently developed codebases typically contain many segments of code that perform the same or closely related operations. Being able to detect these related segments is helpful to applications such as reverse engineering. In this paper, we tackle the problem of determining whether two segments of binary code perform the same operation by asking whether one segment can be substituted by the other. A key insight behind our approach is that because these segments often have different interfaces, some glue code (an adapter) will be needed to perform the substitution. Here we present an algorithm that searches for substitutable code segments by attempting to synthesize an adapter between them. We implement our technique using concrete adapter enumeration and binary symbolic execution to explore the relation between size of adapter search space and total search time. Then, using more than 61,000 fragments of binary code extracted from a ARM image built for the iPod Nano 2g device and functions from the VLC media player, we evaluate our adapter synthesis implementation on more than one million synthesis tasks. Our tool finds dozens of instances of VLC functions in the firmware image. These results confirm that instances of adaptably substitutable binary functions exist in real-world code, and suggest that adapter synthesis has promising reverse engineering applications.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    35
    References
    6
    Citations
    NaN
    KQI
    []