PSP-Mal: Evading Malware Detection via Prioritized Experience-based Reinforcement Learning with Shapley Prior
With the widespread application of machine learning techniques in malware detection, researchers have proposed various adversarial attack methods that generate adversarial examples (AEs) of malware to evade detection. Previous studies have shown that a reinforcement learning (RL) framework can enable black-box attacks by performing a sequence of function-preserving operations, producing functional, evasive malware samples. However, in the black-box scenario it is difficult to obtain useful guidance and feedback from the environment for agent training, which leaves the RL framework unable to learn an effective evasion policy. In this paper, we propose the Shapley prior and establish a prior-guided RL framework, namely PSP-Mal, to generate AEs against Portable Executable (PE) malware detectors. Our framework improves on existing methods in three aspects: 1) We explore feature effects of the black-box model by computing Shapley values and further propose the Shapley prior to represent the expected impact of operations. 2) We establish a novel prioritized experience utilization mechanism in the RL framework, guided by the Shapley prior. 3) We expand the actions into item-content pairs and use Thompson sampling to choose effective content, which helps to reduce randomness and ensure repeatability. We compare the attack performance of our framework with that of other methods, and experimental results demonstrate that our algorithm is more effective. The evasion rates of PSP-Mal against the LightGBM models trained on EMBER and SOREL-20M reach 76.88% and 72.03%, respectively.
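To illustrate the third aspect, the following is a minimal sketch of Thompson sampling over candidate contents for a single action item, assuming a Beta-Bernoulli model in which each content keeps a count of successes and failures. The abstract does not specify the reward signal or the content set, so the class `ContentSampler`, its methods, the content labels, and the binary success signal are all hypothetical and are not taken from the PSP-Mal implementation.

```python
import random


class ContentSampler:
    """Hypothetical Beta-Bernoulli Thompson sampler for one action item.

    Each candidate content keeps a Beta(alpha, beta) posterior over its
    (assumed binary) chance of helping the sample evade the detector.
    """

    def __init__(self, contents, seed=None):
        self.contents = list(contents)
        # Beta(1, 1) is a uniform prior: no content is favored at the start.
        self.alpha = {c: 1.0 for c in self.contents}
        self.beta = {c: 1.0 for c in self.contents}
        self.rng = random.Random(seed)

    def choose(self):
        # Draw one sample from each posterior and pick the largest draw.
        draws = {
            c: self.rng.betavariate(self.alpha[c], self.beta[c])
            for c in self.contents
        }
        return max(draws, key=draws.get)

    def update(self, content, success):
        # Assumption: feedback is a binary "helped evasion" signal.
        if success:
            self.alpha[content] += 1.0
        else:
            self.beta[content] += 1.0

    def posterior_mean(self, content):
        a, b = self.alpha[content], self.beta[content]
        return a / (a + b)


if __name__ == "__main__":
    # Hypothetical content labels for one item (for example, a section to add).
    sampler = ContentSampler(["content_a", "content_b", "content_c"], seed=0)
    picked = sampler.choose()
    sampler.update(picked, success=True)
    print(picked, round(sampler.posterior_mean(picked), 3))
```

As feedback accumulates, contents that helped more often are drawn more often, while the remaining posterior uncertainty still allows occasional exploration. This is one way in which sampling can concentrate on effective content instead of choosing it uniformly at random.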