Title:
Advanced code slicing with pre-trained model fine-tuned for open-source component malware detection.
Source:
Computer Journal; Sep 2025, Vol. 68 Issue 9, p1163-1180, 18p
Database:
Complementary Index

Open Source Software (OSS) is an essential part of modern software development, with platforms such as PyPI for Python, NPM for JavaScript, and RubyGems for Ruby facilitating code sharing and reuse. However, these repositories also pose significant security risks through software supply chain attacks, in which malicious payloads are injected into components and propagate to downstream users and critical infrastructure. Existing automatic malicious component detection tools, particularly those for PyPI, struggle to distinguish the subtle differences between malicious and benign behaviors, which leads to high false positive rates. To address these issues, we systematically compare and explore these subtle differences and propose a more refined and accurate detection method, Open-Source Component Code Slices BERT (OCS-BERT). OCS-BERT uses taint-based program slicing to isolate the code segments that carry sensitive behavior, and fine-tunes a pre-trained model to capture subtle semantic differences across programming languages. The system excels at detecting malicious Python components and shows encouraging cross-language transferability to JavaScript's NPM and Ruby's RubyGems. In addition, OCS-BERT detected 107 malicious components among 25,759 newly uploaded PyPI components, a process that took two weeks. This result demonstrates the effectiveness of our method, which offers a substantial enhancement to existing software supply chain detection methodologies. [ABSTRACT FROM AUTHOR]
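The idea of isolating sensitive behavior segments by taint-based slicing can be illustrated with a minimal sketch, assuming Python 3.9+ (for `ast.unparse`). This is not OCS-BERT's implementation: the abstract does not say which sources, sinks, or slice granularity the authors use. The set `SENSITIVE_CALLS` and the functions `backward_slice`, `names_read`, `names_written`, and `call_name` are hypothetical names chosen for illustration. The sketch computes a simplified backward slice over a module's top-level statements only, with no control-flow, alias, or inter-procedural analysis: it keeps every statement that calls a sensitive API, plus any earlier statement (including imports) that defines a name those kept statements read. Presumably, slices of this kind, rather than whole files, are what a fine-tuned pre-trained model would then classify.

```python
import ast

# Hypothetical slicing criteria: dotted names of "sensitive" APIs (sinks).
# Calls imported under other names (e.g. "from os import system") are not resolved.
SENSITIVE_CALLS = {
    "exec", "eval", "os.system", "subprocess.Popen",
    "urllib.request.urlopen", "base64.b64decode",
}


def call_name(node: ast.Call) -> str:
    """Return a dotted name such as 'os.system' for a call, or '' if unresolvable."""
    parts = []
    func = node.func
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
        return ".".join(reversed(parts))
    return ""  # e.g. a method called on a call result: f(x).decode()


def names_read(stmt: ast.stmt) -> set[str]:
    """Variable names loaded anywhere inside the statement."""
    return {n.id for n in ast.walk(stmt)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}


def names_written(stmt: ast.stmt) -> set[str]:
    """Names bound by the statement, including names bound by imports."""
    if isinstance(stmt, ast.Import):
        return {a.asname or a.name.split(".")[0] for a in stmt.names}
    if isinstance(stmt, ast.ImportFrom):
        return {a.asname or a.name for a in stmt.names}
    return {n.id for n in ast.walk(stmt)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}


def is_sensitive(stmt: ast.stmt) -> bool:
    """True if the statement contains a call to any API in SENSITIVE_CALLS."""
    return any(isinstance(n, ast.Call) and call_name(n) in SENSITIVE_CALLS
               for n in ast.walk(stmt))


def backward_slice(source: str) -> list[str]:
    """Keep sensitive statements and the top-level definitions they depend on."""
    stmts = ast.parse(source).body
    keep: set[int] = set()
    needed: set[str] = set()  # names some kept statement reads
    for i in range(len(stmts) - 1, -1, -1):  # walk backwards from the sinks
        stmt = stmts[i]
        if is_sensitive(stmt) or names_written(stmt) & needed:
            keep.add(i)
            needed |= names_read(stmt)
    return [ast.unparse(stmts[i]) for i in sorted(keep)]


# A toy install script: only the decode-and-execute chain should survive.
SAMPLE = '''
import os
import base64
import json
name = "demo"
print("installing", name)
payload = "ZWNobyBoaQ=="
cmd = base64.b64decode(payload).decode()
os.system(cmd)
'''

if __name__ == "__main__":
    for line in backward_slice(SAMPLE):
        print(line)
```

On `SAMPLE`, the slice keeps the two imports, the `payload` and `cmd` assignments, and the `os.system(cmd)` call, and drops the unrelated `json` import, the `name` assignment, and the `print` call. Once a name enters `needed` it is never removed, so the slice over-approximates the true dependencies; a conservative choice like this favors not losing malicious context over producing the smallest slice.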

Copyright of Computer Journal is the property of Oxford University Press / USA and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)