Improving Intrusion Detectors by Crook-sourcing

Conventional cyber defenses typically respond to detected attacks by rejecting them as quickly and decisively as possible; but aborted attacks are missed learning opportunities for intrusion detection. A method of reimagining cyber attacks as free sources of live training data for machine learning-based intrusion detection systems (IDSes) is proposed and evaluated. Rather than aborting attacks against legitimate services, adversarial interactions are selectively prolonged to maximize the defender's harvest of useful threat intelligence. Enhancing web services with deceptive attack-responses in this way is shown to be a powerful and practical strategy for improved detection, addressing several perennial challenges for machine learning-based IDS in the literature, including scarcity of training data, the high labeling burden for (semi-)supervised learning, encryption opacity, and concept differences between honeypot attacks and those against genuine services. By reconceptualizing software security patches as feature extraction engines, the approach conscripts attackers as free penetration testers, and coordinates multiple levels of the software stack to achieve fast, automatic, and accurate labeling of live web data streams.

Prototype implementations are showcased for two feature set models to extract security-relevant network- and system-level features from servers hosting enterprise-grade web applications. The evaluation demonstrates that the extracted data can be fed back into a network-level IDS for exceptionally accurate, yet lightweight attack detection.

Frederico Araujo
IBM Research

Gbadebo Ayoade
The University of Texas Dallas

Khaled Al-Naami
The University of Texas at Dallas

Yang Gao
The University of Texas at Dallas

Kevin W. Hamlen
The University of Texas at Dallas

Latifur Khan
The University of Texas at Dallas