Integrating adaptive sampling with ensembles model for software defect prediction.
Class imbalance is a persistent challenge in software defect prediction: imbalanced datasets bias machine learning models toward the majority class and hinder their ability to detect defects. This paper proposes integrating Adaptive Synthetic Sampling (ADASYN) with ensemble learning to improve prediction accuracy. ADASYN addresses imbalance by generating synthetic samples for hard-to-classify instances, while ensemble stacking combines the strengths of multiple models to reduce bias and variance. The machine learning models used in this study are K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). The results show that ADASYN combined with ensemble stacking outperforms the traditional SMOTE technique in most cases. For instance, on the Ant-1.7 dataset, stacking reached an accuracy of 90.60% with ADASYN compared to 89.32% with SMOTE. Similarly, on the Camel-1.6 dataset, ADASYN achieved 91.56%, slightly exceeding SMOTE's 91.32%. However, SMOTE performed better with simpler models such as Decision Tree on certain datasets, which highlights the importance of choosing an appropriate resampling method. Across all datasets, ensemble stacking consistently provided the highest accuracy, benefiting from ADASYN's adaptive resampling strategy. These results underscore the value of combining advanced sampling methods with ensemble learning to address class imbalance effectively. The approach improves prediction accuracy and provides a practical framework for reliable software defect prediction in real-world scenarios. Future work will explore hybrid techniques and broader evaluations across diverse datasets and classifiers. [ABSTRACT FROM AUTHOR]
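To make the phrase "synthetic samples for hard-to-classify instances" concrete, the following is a minimal pure-Python sketch of the ADASYN idea, not the authors' implementation. The function name `adasyn_sketch`, its parameters, and the toy data are hypothetical. It assumes Euclidean distance, a balancing factor of 1 (oversample until the classes are equal in size), and interpolation partners drawn from each sample's nearest minority neighbours. Each minority sample is weighted by the share of majority points among its k nearest neighbours, and the synthetic samples are allocated in proportion to that weight.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=3, seed=0):
    """Return (synthetic_points, counts) for a simplified ADASYN pass.

    counts maps each minority sample index to the number of synthetic
    points generated from it.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    G = (len(y) - n_min) - n_min  # samples needed to balance the classes
    if G <= 0 or n_min < 2:
        return [], {}

    def nearest(i, candidates):
        # Indices of the k candidates closest to sample i, excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Difficulty ratio: fraction of majority points among the k nearest neighbours.
    ratios = []
    for i in min_idx:
        nn = nearest(i, range(len(X)))
        ratios.append(sum(y[j] != minority for j in nn) / len(nn))

    # Normalise the ratios into weights; fall back to uniform weights
    # when no minority sample has a majority neighbour.
    total = sum(ratios)
    weights = [r / total for r in ratios] if total > 0 else [1 / n_min] * n_min

    synthetic, counts = [], {}
    for i, w in zip(min_idx, weights):
        counts[i] = round(w * G)  # rounding may make the total differ slightly from G
        partners = nearest(i, min_idx)  # assumption: partners are minority neighbours
        for _ in range(counts[i]):
            j = rng.choice(partners)
            lam = rng.random()
            # Interpolate between the sample and the chosen minority neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic, counts


# Toy data (hypothetical): one minority point sits inside the majority cluster,
# three others form a well-separated group.
majority_pts = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0],
                [2, 1], [0, 2], [1, 2], [2, 2], [3, 0]]
minority_pts = [[0.5, 0.5], [10, 10], [10, 11], [11, 10]]
X = majority_pts + minority_pts
y = [0] * len(majority_pts) + [1] * len(minority_pts)

synthetic, counts = adasyn_sketch(X, y)
print(counts)
```

In this toy run the minority point surrounded by majority points receives the largest share of synthetic samples, which is the adaptive behaviour the abstract attributes to ADASYN. In practice one would more likely rely on an existing library implementation and feed the resampled training set to a stacking ensemble over KNN, DT, and RF; the abstract does not specify the meta-learner or hyperparameters used.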
Copyright of Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics & Control is the property of Universitas Muhammadiyah Malang and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)