A Comparative Hybrid Approach for Python Bug Detection Using Syntactic Features, Random Forest, and Neural Network.
As software systems grow more complex, detecting bugs in source code has become a critical challenge in software development and maintenance. Manual debugging is time-consuming and error-prone, which motivates automated bug detection. This research explores two machine learning models, Random Forest and a Neural Network, for identifying bugs in Python source code. Features are extracted using Abstract Syntax Trees (ASTs), which enable structured parsing of syntactic elements such as functions, classes, variables, conditionals, and exception blocks. These features are the input used to train both models for binary classification: distinguishing buggy from non-buggy code files. The buggy and non-buggy classes each contain 200 Python scripts. The models are evaluated using accuracy, confusion matrices, Receiver Operating Characteristic (ROC) curves, and classification reports across multiple training epochs. Experimental results show that the Random Forest model achieves stable performance, with an accuracy of 86.67% and an Area Under the Curve (AUC) score of 0.97 on the testing set and no significant improvement across epochs. In contrast, the Neural Network's accuracy improves gradually from 68.33% at epoch 5 to 85% at epoch 300, along with higher sensitivity in bug detection, although it requires longer training times. Additionally, both models are used to predict specific lines of code containing potential bugs. Based on these findings, the choice of model depends on the application context: Random Forest offers faster deployment and consistent performance, while Neural Networks adapt better to complex patterns and reach improved accuracy with sufficient training. [ABSTRACT FROM AUTHOR]
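The AST-based feature extraction described in the abstract can be illustrated with a minimal sketch built on Python's standard `ast` module. The abstract names the feature categories but not the exact node types or the vector layout, so the `FEATURE_NODES` mapping, the `extract_features` function, and the `parse_error` flag below are assumptions made for illustration, not the authors' implementation.

```python
import ast

# Hypothetical mapping from the feature categories named in the abstract
# to AST node types; the paper's actual feature set may differ.
FEATURE_NODES = {
    "functions": (ast.FunctionDef, ast.AsyncFunctionDef),
    "classes": (ast.ClassDef,),
    "variables": (ast.Name,),
    "conditionals": (ast.If,),
    "exception_blocks": (ast.Try,),
}


def extract_features(source):
    """Count syntactic elements in one Python source string."""
    counts = {name: 0 for name in FEATURE_NODES}
    counts["parse_error"] = 0
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Assumption: an unparsable script is kept and flagged, not dropped.
        counts["parse_error"] = 1
        return counts
    # Visit every node once and increment each category it belongs to.
    for node in ast.walk(tree):
        for name, node_types in FEATURE_NODES.items():
            if isinstance(node, node_types):
                counts[name] += 1
    return counts


sample = '''
class Account:
    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        try:
            self.balance -= amount
        except TypeError:
            pass
'''

features = extract_features(sample)
print(features)
```

Each script then yields one fixed-length count vector, which could plausibly be passed to a binary classifier such as scikit-learn's `RandomForestClassifier` or a small feed-forward network. How the authors actually encode and scale the features is not stated in the abstract.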
Copyright of Journal of Communication & Information Technology (CommIT Journal) is the property of Journal of Communication & Information Technology (CommIT Journal) and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)