
DitDetector: Bimodal Learning based on Deceptive Image and Text for Macro Malware Detection

Macro malware remains a severe threat to cyber security even though the Microsoft Office suite disables macros by default. Among the defenses deployed at different stages of the attack chain, document analysis is more targeted because it directly detects malicious documents that carry macro malware. It is effective, especially when combined with machine learning, but it still struggles with malware variants, with support for multiple file formats, and with countering advanced attack techniques (e.g., Excel 4.0 macros and remote template injection).

In this paper, we find it promising to detect the deceptive information that is embedded in documents to trick users into enabling macros, instead of relying on file metadata or extracted macro code. We therefore propose a novel solution named DitDetector, which leverages bimodal learning based on deceptive images and text for macro malware detection. Specifically, we extract preview images of documents with an image export SDK from Oracle, and then extract textual information from the preview images with an open-source OCR engine. The bimodal model of DitDetector contains a visual encoder, a textual encoder, and a feed-forward neural network that learns from the joint representation of the two encoders' outputs. We evaluate DitDetector on three datasets: an open-source malicious document dataset (MalDoc) and two collected real-world adversary datasets (a database of Excel macros and a database of remote template injection samples). Our experiments show that DitDetector outperforms four existing macro code-based machine learning methods and five reputable anti-virus engines. In particular, in the real-world test on advanced macro malware, DitDetector achieves an F1 score of 99.93%, which is at least 3.16% higher than the compared solutions.
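
The bimodal model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the joint representation is formed, so plain concatenation is assumed here, and the encoder outputs are replaced by small hand-written vectors. The layer sizes, the random untrained weights, and all function names (concat_fusion, dense, malicious_score) are hypothetical.

```python
import math
import random


def concat_fusion(image_vec, text_vec):
    """Joint representation, assumed here to be simple concatenation."""
    return list(image_vec) + list(text_vec)


def dense(x, weights, bias):
    """One fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random, untrained weights; a real model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def malicious_score(image_vec, text_vec, hidden, output):
    """Feed-forward head over the fused visual and textual features."""
    joint = concat_fusion(image_vec, text_vec)
    h = relu(dense(joint, *hidden))
    logit = dense(h, *output)[0]
    return sigmoid(logit)


rng = random.Random(0)
image_vec = [0.2, -0.1, 0.7, 0.4]  # stand-in for the visual encoder output
text_vec = [0.9, 0.0, -0.3]        # stand-in for the textual encoder output
hidden = init_layer(len(image_vec) + len(text_vec), 5, rng)
output = init_layer(5, 1, rng)
score = malicious_score(image_vec, text_vec, hidden, output)
print(f"malicious probability (untrained weights): {score:.3f}")
```

In this sketch the classifier head sees both modalities at once, so a lure image and its OCR text can jointly influence the score. With untrained weights the printed value carries no meaning.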

Jia Yan
Institute of Software, Chinese Academy of Sciences / School of Computer Science and Technology, University of Chinese Academy of Sciences

Ming Wan
QIANXIN Group Inc.

Xiangkun Jia
Institute of Software, Chinese Academy of Sciences

Lingyun Ying
QIANXIN Group Inc.

Purui Su
Institute of Software, Chinese Academy of Sciences / School of Cyber Security, University of Chinese Academy of Sciences

Zhanyi Wang
QIANXIN Group Inc.

Paper (ACM DL)

Slides