Treffer: Software defect prediction via ensemble of convolutional neural network and recurrent neural network
http://creativecommons.org/licenses/by/4.0
10.36930/40310219
1257652485
From OAIster®, provided by the OCLC Cooperative.
Weitere Informationen
The paper is devoted to the study of the software defect prediction process using deep learning algorithms. This process consists of several main steps: dataset search and preparation, parsing source code into abstract syntactic tree, tree traversal and tokens mapping, handling class imbalance, building and training neural networks. During the analysis of research papers we found that the application of the software defect prediction could facilitate the defects searching and prioritize testing efforts. However, machine learning algorithms, demonstrated in the recent studies, are not effective enough, showing an unstable accuracy ranging from 40 % to 60 %. The study has discovered that the application of deep learning algorithms gives more accurate results than the other machine learning algorithms. In particular, the state-of-the-art versions of the CNN and RNN are on average 12-30 % more accurate than traditional algorithms such as decision tree, logistic regression, naive Bayesian classifier, and random forest. However, the results still remain two variants for the various software projects. In order to reduce the variance and increase prediction accuracy, the study proposed improvements on the step of building and training neural networks. Namely, we proposed an improved model for software defect prediction based on the combination of state-of-the-art deep learning algorithms CNN and RNN together with binary classifier logistic regression. In order to correctly evaluate the accuracy of the proposed model, CNN and RNN based models were also built and trained. Training of each of the models was conducted on a dataset of 50,000 source code files obtained from 13 java projects. Experimental results showed that the CNN+RNN model gave on average 10-9 % higher accuracy than RNN and 2 % higher accuracy than CNN. The accuracy results by each of the analysed software projects showed that in 11 out of 13 software projects the CN