Java Source Code Vulnerability Detection Using Large Language Model.
Vulnerability detection is one of the main focuses of the security domain. Large Language Models (LLMs) have shown promising performance in vulnerability detection compared to static code analysis and to machine learning with neural architectures. These results motivate a deeper investigation of the capabilities of LLMs and of their potential to improve on existing methods. This research explores vulnerability detection with LLMs, using a combination of Abstract Syntax Tree (AST) and Code Property Graph (CPG) as the code representation at the granularity level. We investigate the effectiveness of concatenating both representations as well as using each one separately. We conduct experiments with LLMs pre-trained on source code and, for comparison, with a model pre-trained on a text corpus. We also compare the representations against the original source code to assess how much each contributes. The Software Assurance Reference Dataset (SARD) and CVEFixes serve as the datasets. The results reveal that the LLM pre-trained on a text corpus achieves effective vulnerability detection with the combined representation, while the CPG representation delivers positive results for all models pre-trained on source code. [ABSTRACT FROM AUTHOR]
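The input variants compared in the abstract (original source, AST only, CPG only, and the two concatenated) can be illustrated with a minimal sketch. It is not the authors' implementation: the `CodeSample` type, the `build_variants` function, the `SEP` separator, and the placeholder AST and CPG strings are all hypothetical, and the sketch assumes the AST and CPG have already been serialized to text by external tooling.

```python
from dataclasses import dataclass

# Hypothetical separator between representations; the abstract does not name one.
SEP = " [SEP] "


@dataclass(frozen=True)
class CodeSample:
    """One code sample with pre-computed, text-serialized representations."""
    source: str  # original Java source code
    ast: str     # serialized Abstract Syntax Tree (assumed produced elsewhere)
    cpg: str     # serialized Code Property Graph (assumed produced elsewhere)


def build_variants(sample: CodeSample) -> dict:
    """Return the four text inputs that could be fed to a model for comparison."""
    return {
        "source": sample.source,
        "ast": sample.ast,
        "cpg": sample.cpg,
        # Combined representation: AST text followed by CPG text.
        "ast+cpg": sample.ast + SEP + sample.cpg,
    }


# Placeholder strings only; real serializations depend on the chosen parser.
example = CodeSample(
    source="String q = \"SELECT * FROM t WHERE id=\" + id;",
    ast="(Assign (Name q) (BinaryOp + (Literal) (Name id)))",
    cpg="id -[REACHES]-> q",
)
variants = build_variants(example)
```

Each value in `variants` would then be tokenized and passed to the model under evaluation, so that the representations can be compared on the same samples.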