Probabilistic Naming of Functions in Stripped Binaries
Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) in software deployment. We present the design and implementation of punstrip, a tool that combines a probabilistic fingerprint of binary code using high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. Punstrip also provides Symbol2Vec, a way of comparing function names by semantic similarity, which it uses to suggest meaningful function names in stripped binaries. We show that our approach is able to recognize functions compiled across a spectrum of different compilers and optimization levels, and then demonstrate that our tool can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian operating system and compare against the state of the art. To foster further research, we release our code and datasets as open source.