ForeCast - Skimming off the Malware Cream
analysis, automated dynamic analysis is widely used for this purpose. Executing malicious software in a controlled environment while observing its behavior can provide rich information on a malware's capabilities. However, running each malware sample even for a few minutes is expensive.
For this reason, malware analysis efforts need to select a subset of samples for analysis. To date, this selection has been performed either randomly or using techniques focused on avoiding re-analysis of polymorphic malware.
In this paper, we present a novel approach to sample selection that attempts to maximize the total value of the information obtained from analysis, according to an application-dependent scoring function. To this end, we leverage previous work on behavioral malware clustering and introduce a machine-learning-based system that uses all statically available information to predict into which behavioral class a sample will fall before the sample is actually executed.
We discuss scoring functions tailored to two practical applications of large-scale dynamic analysis: the compilation of network blacklists of command-and-control servers and the generation of remediation procedures for malware infections. We implement these techniques in a tool called ForeCast. A large-scale evaluation on over 600,000 malware samples shows that our prototype can increase the number of potential command-and-control servers detected by as much as 134% over a random selection strategy and 54% over a selection strategy based on sample diversity.
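The selection idea in the abstract can be illustrated with a minimal sketch. It is not ForeCast's implementation. It assumes a classifier has already produced, for each unexecuted sample, a probability for each behavioral cluster, and that the application assigns each cluster a numeric value. The names `expected_score`, `select_samples`, `predictions`, `cluster_value`, and `budget`, along with all sample data, are hypothetical.

```python
from typing import Dict, List


def expected_score(cluster_probs: Dict[str, float],
                   cluster_value: Dict[str, float]) -> float:
    """Expected value of analyzing one sample, given its predicted
    probability of falling into each behavioral cluster."""
    return sum(p * cluster_value.get(c, 0.0) for c, p in cluster_probs.items())


def select_samples(predictions: Dict[str, Dict[str, float]],
                   cluster_value: Dict[str, float],
                   budget: int) -> List[str]:
    """Return the `budget` samples with the highest expected score."""
    ranked = sorted(
        predictions,
        key=lambda s: expected_score(predictions[s], cluster_value),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical classifier output: sample id -> {cluster: probability}.
predictions = {
    "sample_a": {"bot": 0.9, "adware": 0.1},
    "sample_b": {"bot": 0.2, "adware": 0.8},
    "sample_c": {"bot": 0.5, "adware": 0.5},
}

# Hypothetical application-dependent values, e.g. clusters likely to
# reveal command-and-control servers are worth more.
cluster_value = {"bot": 3.0, "adware": 0.5}

selected = select_samples(predictions, cluster_value, budget=2)
print(selected)
```

The sketch keeps per-cluster values fixed for simplicity. A scoring function could instead be updated as analysis results arrive, so that clusters which have already been well covered contribute less.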
Author(s):
Matthias Neugschwandtner
Vienna University of Technology
Austria
Paolo Milani Comparetti
Vienna University of Technology
Austria
Gregoire Jacob
University of California, Santa Barbara
United States
Christopher Kruegel
University of California, Santa Barbara
United States