Keynote: Automated Data-driven Binary Analysis for Security
Wednesday, 7 December 2022
09:00 - 10:00
Amphitheater 204
Chair: Guofei Gu, Texas A&M University
Abstract:
Binary analysis is a fundamental building block for a broad spectrum of security tasks. It encompasses a diverse set of tasks that aim to understand and analyze the behavior and semantics of binary programs. Existing approaches often tackle each analysis task independently and rely heavily on ad hoc, brittle, task-specific heuristics. While recent data-driven approaches have shown some early promise, they too tend to learn spurious features and overfit to specific tasks without understanding the underlying program semantics.
In this talk, I will describe some of our recent projects that use machine learning on both binary code and execution traces to learn program semantics and transfer the learned knowledge to different binary analysis tasks. Our key observation is that by designing pretraining tasks that can learn code semantics, we can drastically boost the performance of binary analysis tasks. Our pretraining tasks are fully self-supervised: they do not need expensive labeling effort and can therefore easily generalize across different architectures, operating systems, compilers, optimizations, and obfuscations. Extensive experiments show that our approach drastically improves performance on popular tasks such as binary disassembly, matching semantically similar binary functions, and recovering types from binaries.
About the Speaker:
Suman Jana is an associate professor in the Department of Computer Science and the Data Science Institute at Columbia University. His primary research interest is at the intersection of computer security and machine learning. His research has received six best paper awards, a CACM Research Highlight, a Google Faculty Fellowship, a JPMorgan Chase Faculty Research Award, an NSF CAREER Award, and an ARO Young Investigator Award.