Obfuscated Clone Search in JavaScript based on Reinforcement Subsequence Learning.
Finding similar code is important for software engineering, the defense of intellectual property, and security. Adversaries increasingly defeat similar-code detection through obfuscation, such as transforming the code or scattering the code they wish to hide across long sequences. Moving code fragments far enough apart poses a specific challenge for solutions that rely on localized features (e.g., n-grams) or attention mechanisms, because the fragments are distributed beyond the local context window. We introduce a neural network solution pattern called "Cybertron" that addresses this problem by using reinforcement learning to train a code abstraction and summarization function; this function converts arbitrarily long code into fixed-length real-valued vectors in a way that is optimized for similarity search. The key to the design is the selection of the important elements of the code, combined with abstraction that preserves semantic function while minimizing syntactic feature information. We evaluated the approach on a three-challenge benchmark of obfuscated JavaScript, a scripting language that is commonly obfuscated and for which code-mixing is a rising challenge. The evaluation shows that our approach identifies obfuscated code even within large scripts with an AUC of 78%, outperforming current state-of-the-art sequence models by 7–35%. [ABSTRACT FROM AUTHOR]
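The scattering problem described in the abstract can be illustrated with a toy sketch. This is not the Cybertron model and does not reproduce anything from the paper. It only shows, under simplifying assumptions, why a purely local feature such as token trigrams loses track of a snippet once its tokens are spread apart. The token list, the `scatter` helper, and the `containment` score are all hypothetical names chosen for this illustration.

```python
def ngrams(tokens, n=3):
    """Return the set of contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(query, target, n=3):
    """Fraction of the query's n-grams that also occur in the target."""
    q = ngrams(query, n)
    t = ngrams(target, n)
    return len(q & t) / len(q) if q else 0.0


def scatter(tokens, gap):
    """Hypothetical obfuscation: follow each token with `gap` unique filler tokens."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        out.extend(f"pad{i}_{j}" for j in range(gap))
    return out


# A made-up tokenized JavaScript snippet standing in for the code to hide.
payload = "var k = atob ( s ) ; eval ( k ) ;".split()

# Case 1: the snippet is embedded contiguously inside a longer script.
filler = [f"noise{i}" for i in range(50)]
contiguous = filler + payload + filler

# Case 2: the same tokens appear in order, but spread apart by filler.
scattered = scatter(payload, gap=3)

print(containment(payload, contiguous))  # every trigram survives
print(containment(payload, scattered))   # no trigram survives
```

With a gap of three filler tokens, every trigram window contains at most one payload token, so the trigram overlap drops to zero even though all payload tokens are still present in their original order. This is the kind of gap that motivates summarizing a selected subsequence of the code rather than relying on a fixed local window.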
Copyright of ACM Transactions on Software Engineering & Methodology is the property of Association for Computing Machinery and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)