An Empirical Evaluation of Ensemble Models for Python Code Smell Detection.
Code smells, which reflect poor design choices or suboptimal implementations, reduce software quality and hinder code maintenance. Detecting them is therefore essential during software development. This study introduces a Python-based code smell dataset targeting two smell types: Large Class and Long Method. Five ensemble learning methods (Bagging, Gradient Boost, Max Voting, AdaBoost, and XGBoost) were employed to detect code smells in the Large Class and Long Method datasets. The ten most significant features were selected using the Chi-square feature selection technique, and the SMOTE algorithm was applied to address class imbalance. For the Large Class dataset, the Max Voting model gave the best results, with an accuracy of 0.96 and an MCC of 0.85. For the Long Method dataset, the Gradient Boost model combined with Chi-square feature selection gave the best results, with an accuracy of 0.98 and an MCC of 0.94. These results indicate that the proposed methodology is effective and can improve code smell detection in Python. [ABSTRACT FROM AUTHOR]
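The abstract reports both accuracy and the Matthews correlation coefficient (MCC) and names Max Voting as one of the ensembles. The sketch below illustrates those two ideas in plain Python. It is not the authors' implementation: it assumes "Max Voting" means a hard majority vote over the base models' predicted labels, and the three models (`model_a`, `model_b`, `model_c`) and all labels are hypothetical toy data, not results from the study.

```python
from collections import Counter
from math import sqrt


def max_vote(predictions):
    """Hard majority vote (assumed meaning of "Max Voting").

    `predictions` is a list of per-model label lists of equal length.
    """
    voted = []
    for labels in zip(*predictions):
        # Pick the label predicted by the most models for this sample.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 = smelly, 0 = clean."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy, imbalanced ground truth: 2 smelly samples out of 10 (hypothetical).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Predictions from three hypothetical base models.
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
model_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
model_c = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

ensemble = max_vote([model_a, model_b, model_c])
always_clean = [0] * len(y_true)

print("ensemble accuracy:", accuracy(y_true, ensemble))
print("ensemble MCC:", round(mcc(y_true, ensemble), 3))
print("always-clean accuracy:", accuracy(y_true, always_clean))
print("always-clean MCC:", mcc(y_true, always_clean))
```

On this toy data a classifier that always predicts "clean" still reaches an accuracy of 0.8 but an MCC of 0.0, which is one reason to report MCC next to accuracy when the classes are imbalanced.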
Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)