Classification of breast cancer using ensemble machine learning with Apache Spark.
Breast cancer is one of the most common and serious health problems affecting people around the world. Detecting it early and correctly identifying whether a tumor is benign or malignant are essential to diagnosis. In this study, we developed a new model called the Logistic Ensemble Fusion Model to improve the accuracy of breast cancer diagnosis. This model combines the strengths of three machine learning models (Support Vector Machine, Decision Tree, and Logistic Regression) into a single ensemble that significantly improves on traditional methods. We used Apache Spark with its Python API to handle large datasets quickly and efficiently. To select the features most important for prediction, we used Recursive Feature Elimination (RFE), driven by both a Support Vector Machine (SVM-RFE) and a Random Forest (RF-RFE). We tested the model by dividing the data into training and testing sets in an 80:20 ratio. The Logistic Ensemble Fusion Model achieved an accuracy of 99.13%, a precision of 98.71%, a recall of 99.91%, and an F1 score of 99.12%. The entire process, which involved running 12 Spark jobs, completed in 38 seconds. We compared the model against other models such as Random Forest, Gradient Boosting, Factorization Machine, One-vs-Rest, and Multilayer Perceptron. The main innovation of this study is the use of multiple machine learning models in a unified ensemble fusion approach, which provides strong classification performance and demonstrates a significant advance over previous methods. This study underscores the potential of advanced ensemble machine learning techniques and big data technologies in refining breast cancer diagnosis and supporting more effective clinical decision-making. [ABSTRACT FROM AUTHOR]
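As a reading aid for the evaluation setup described in the abstract, the sketch below shows one way an 80:20 split and the four reported metrics (accuracy, precision, recall, F1) can be computed. It is plain Python rather than the authors' Spark pipeline, and it does not reproduce their results. The function names `train_test_split` and `binary_metrics`, the seed, and the toy labels are all hypothetical, and the abstract does not say how the metrics were averaged, so the single-positive-class definitions used here are an assumption.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: 100 record ids split 80:20.
train, test = train_test_split(range(100))

# Hypothetical labels (1 = malignant, 0 = benign), not from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

In a diagnostic setting, recall on the malignant class is usually the figure to watch most closely, since a false negative corresponds to a missed cancer.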
Copyright of Sigma: Journal of Engineering & Natural Sciences / Mühendislik ve Fen Bilimleri Dergisi is the property of Sigma: Journal of Engineering & Natural Sciences / Mühendislik ve Fen Bilimleri Dergisi and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)