Full Program »
ICSD: An Automatic System for Insecure Code Snippet Detection in Stack Overflow over Heterogeneous Information Network
As the popularity of modern social coding paradigm such as Stack Overflow grows, its potential security risks increase as well (e.g., insecure codes could be easily embedded and distributed). To address this largely overlooked issue, in this paper, we bring an important new insight to exploit social coding properties in addition to code content for automatic detection of insecure code snippets in Stack Overflow. To determine if the given code snippets are insecure, we not only analyze the code content, but also utilize various kinds of relations among users, badges, questions, answers, code snippets and keywords in Stack Overflow. To model the rich semantic relationships, we first introduce a structured heterogeneous information network (HIN) for representation and then use meta-path based approach to incorporate higher-level semantics to build up relatedness over code snippets. Later, we propose a novel network embedding model named snippet2vec for representation learning in HIN where both the HIN structures and semantics are maximally preserved. After that, a multi-view fusion classifier is constructed for insecure code snippet detection. To the best of our knowledge, this is the first work utilizing both code content and social coding properties to address the code security issues in modern software coding platforms. Comprehensive experiments on the data collections from Stack Overflow are conducted to validate the effectiveness of the developed system ICSD which integrates our proposed method in insecure code snippet detection by comparisons with alternative approaches.