Result: Software defect prediction via ensemble of convolutional neural network and recurrent neural network

Title:
Software defect prediction via ensemble of convolutional neural network and recurrent neural network
Source:
Scientific Bulletin of UNFU; Vol 31 No 2 (2021): Scientific Bulletin of UNFU; 114-120; 2519-2477; 1994-7836; 10.36930/403102
Publisher Information:
Ukrainian National Forestry University 2021-04-29
Document Type:
Electronic Resource
Availability:
Open access content.
http://creativecommons.org/licenses/by/4.0
Note:
Ukrainian
Other Numbers:
UAUNF oai:ojs.tour.dp.ua:article/2290
10.36930/40310219
1257652485
Contributing Source:
UKRAINIAN NAT FORESTRY UNIV
From OAIster®, provided by the OCLC Cooperative.
Accession Number:
edsoai.on1257652485
Database:
OAIster

Further Information

The paper studies the software defect prediction process using deep learning algorithms. This process consists of several main steps: dataset search and preparation, parsing source code into an abstract syntax tree, tree traversal and token mapping, handling class imbalance, and building and training neural networks. Our analysis of research papers found that software defect prediction can facilitate defect detection and help prioritize testing efforts. However, the machine learning algorithms demonstrated in recent studies are not effective enough, showing unstable accuracy ranging from 40 % to 60 %. The study found that deep learning algorithms give more accurate results than other machine learning algorithms. In particular, state-of-the-art versions of CNN and RNN are on average 12-30 % more accurate than traditional algorithms such as decision tree, logistic regression, naive Bayes classifier, and random forest. However, the results still vary too much across software projects. To reduce the variance and increase prediction accuracy, the study proposes improvements at the step of building and training neural networks. Namely, we propose an improved model for software defect prediction that combines the state-of-the-art deep learning algorithms CNN and RNN with a binary logistic regression classifier. To evaluate the accuracy of the proposed model correctly, CNN-based and RNN-based models were also built and trained. Each model was trained on a dataset of 50,000 source code files obtained from 13 Java projects. Experimental results showed that the CNN+RNN model gave on average 9-10 % higher accuracy than RNN and 2 % higher accuracy than CNN. The per-project accuracy results showed that in 11 out of 13 software projects the CN