Result: Fusion features for optimizing software defect detection using a graph neural network algorithm.
In cross-project software defect detection, traditional methods struggle to accurately capture the complex structural relationships and dynamic evolution characteristics of code. This work proposes that constructing a dynamic heterogeneous graph model integrating multidimensional information enables a software defect detection system based on a Graph Neural Network (GNN) to significantly improve detection accuracy and cross-project generalization. To validate this hypothesis, a model named Dynamic Heterogeneous Graph Defect Detection (DGDefect) is designed and implemented. First, a dual-layer heterogeneous graph structure is built based on the Abstract Syntax Tree (AST) and the Program Dependence Graph (PDG). Additionally, a Dynamic Edge Weight Assignment (DEWA) algorithm is introduced to dynamically compute edge weights according to node attributes and contextual similarity. Next, a Gated Graph Attention Network performs gated fusion of syntactic features from AST nodes, control flow features from PDG nodes, and developer commit behavior features. A hierarchical attention mechanism (comprising node-level, subgraph-level, and global-level attention) is integrated within the GNN framework, along with a subgraph pattern matching strategy to identify defect propagation paths. Finally, a resilient incremental learning framework is developed, significantly enhancing model update efficiency through parameter freezing and knowledge distillation. Experiments conducted on the NASA software defect prediction dataset and three large-scale open-source industrial projects demonstrate that DGDefect achieves an average F1 score of 85.5% in cross-project detection, with 89.7% for Java projects. The false positive rate (FPR) is reduced to 5.8%, and recall reaches 88.1%. In industrial-scale codebase detection, the model achieves an average true defect detection rate of 93.1%. With only 52.7 million parameters, the model is significantly smaller than CodeBERT.
These results confirm the proposed method's advantages in accuracy, efficiency, generalization, and practical applicability. This work offers a theoretically grounded and engineering-feasible solution for GNN-based software defect detection. [ABSTRACT FROM AUTHOR]
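The abstract names two mechanisms without giving their formulas: edge weights computed from node attributes and contextual similarity, and gated fusion of feature vectors from different sources. The following is a minimal illustrative sketch of those two ideas only, not the DGDefect implementation. The function names (`edge_weights`, `gated_fusion`), the use of cosine similarity, the per-source softmax normalization, and the scalar sigmoid gate are all assumptions made for illustration; the published DEWA algorithm and Gated Graph Attention Network may differ in every one of these choices.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def edge_weights(node_feats, edges):
    """Hypothetical edge weighting: softmax over the cosine similarities
    of all edges that leave the same source node (assumed normalization)."""
    by_src = {}
    for src, dst in edges:
        by_src.setdefault(src, []).append(dst)
    weights = {}
    for src, dsts in by_src.items():
        scores = [math.exp(cosine(node_feats[src], node_feats[d])) for d in dsts]
        total = sum(scores)
        for d, s in zip(dsts, scores):
            weights[(src, d)] = s / total
    return weights

def gated_fusion(a, b, gate_w, gate_b=0.0):
    """Hypothetical scalar-gate fusion of two feature vectors:
    g = sigmoid(gate_w . [a; b] + gate_b), fused = g * a + (1 - g) * b."""
    z = sum(w * x for w, x in zip(gate_w, a + b)) + gate_b
    g = 1.0 / (1.0 + math.exp(-z))
    return [g * x + (1.0 - g) * y for x, y in zip(a, b)]

# Toy graph: n1 has the same attributes as n0, n2 is orthogonal to n0.
feats = {"n0": [1.0, 0.0], "n1": [1.0, 0.0], "n2": [0.0, 1.0]}
w = edge_weights(feats, [("n0", "n1"), ("n0", "n2")])

# An all-zero gate gives g = 0.5, so the two inputs are averaged.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_w=[0.0, 0.0, 0.0, 0.0])
```

In this toy setup the edge to the more similar neighbor receives the larger weight, and the weights leaving each source node sum to 1. In a trained model the gate parameters would be learned, and the gate would more plausibly be computed per feature dimension than as a single scalar.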
Copyright of Intelligent Decision Technologies is the property of Sage Publications Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)