Result: Adaptive Dragonfly Optimization (Ado) Feature Selection Model and Distributed Bayesian Matrix Decomposition for Big Data Analytics

Title:
Adaptive Dragonfly Optimization (Ado) Feature Selection Model and Distributed Bayesian Matrix Decomposition for Big Data Analytics
Source:
International Journal of Intelligent Systems and Applications in Engineering; Vol. 12 No. 21s (2024); 962-973
Publisher Information:
International Journal of Intelligent Systems and Applications in Engineering, 2024.
Publication Year:
2024
Document Type:
Journal Article
File Description:
application/pdf
Language:
English
ISSN:
2147-6799
Rights:
CC BY-SA
Accession Number:
edsair.issn21476799..357615f628337cb27b812bad5f56dce6
Database:
OpenAIRE

Further Information

Matrix decompositions are fundamental methods for extracting knowledge from the large data sets produced by contemporary applications. Processing extremely large amounts of data on a single machine is still inefficient or impractical, so distributed matrix decompositions are necessary and practical tools for big data analytics, where the high dimensionality and complexity of large datasets hinder data mining. Current approaches consume excessive execution time, making it imperative to reduce the number of dataset features being processed. This work presents a novel wrapper feature selection method based on the Adaptive Dragonfly Optimisation (ADO) algorithm, which makes the search space more suitable for feature selection; ADO transforms continuous search-space vectors into binary representations. A Distributed Bayesian Matrix Decomposition (DBMD) model is then presented for clustering and mining voluminous data. Specifically, distributed computing is modelled using 1) accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) statistical inference. The theoretical convergence behaviour of these algorithms is examined, and experiments show that they perform better than, or on par with, two common distributed approaches while scaling effectively to large data sets. Clustering performance is assessed using precision, recall, F-measure, and the Rand Index (RI), metrics that are better suited to imbalanced classes.
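The binarization step mentioned in the abstract, mapping a dragonfly's continuous position vector to a binary feature-selection mask, is commonly done with an S-shaped (sigmoid) transfer function. The abstract does not specify which transfer function ADO uses, so the sketch below is only an illustration of the general technique; the function name and the sigmoid choice are assumptions, not the paper's exact method.

```python
import math
import random

def binarize_position(position, rng=None):
    """Illustrative sketch (not the paper's exact method): map a
    continuous search-agent position to a binary feature mask.

    Each dimension x is passed through a sigmoid to get a selection
    probability, then a Bernoulli draw decides whether that feature
    is kept (1) or dropped (0)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    mask = []
    for x in position:
        p = 1.0 / (1.0 + math.exp(-x))  # probability of selecting the feature
        mask.append(1 if rng.random() < p else 0)
    return mask

# Example: a 5-dimensional continuous position vector.
# Strongly positive dimensions tend to be selected, strongly negative dropped.
print(binarize_position([2.5, -3.0, 0.0, 4.0, -0.5]))  # → [1, 0, 1, 1, 0]
```

In a wrapper method, each candidate mask would then be scored by training a model on the selected features, and the score drives the next ADO iteration.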
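Of the evaluation metrics listed above, the Rand Index is the one specific to comparing two clusterings: it counts, over all pairs of points, how often the predicted clustering agrees with the reference clustering about whether the pair belongs together. A minimal pair-counting implementation (standard definition, not code from the paper):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand Index: fraction of point pairs on which two clusterings
    agree -- either both place the pair in the same cluster, or both
    place it in different clusters. Standard pair-counting definition."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)

# The two clusterings disagree only on the pair (2, 3):
# 5 of the 6 pairs agree, so RI = 5/6.
print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # → 0.8333...
```

Because RI is defined over pairs rather than per-cluster counts, it remains informative when cluster sizes are highly imbalanced, which is why the abstract pairs it with precision, recall, and F-measure.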