Title:
An Algorithm to Optimize Frequent Pattern Mining in Parallel and Distributed Environment
Source:
Engineering, Technology & Applied Science Research. 15:22252-22256
Publisher Information:
Engineering, Technology & Applied Science Research, 2025.
Publication Year:
2025
Document Type:
Academic journal Article
ISSN:
1792-8036
2241-4487
DOI:
10.48084/etasr.9830
Rights:
CC BY
Accession Number:
edsair.doi...........6ed79f884519d45ba60051d569b079bc
Database:
OpenAIRE

Further Information

Frequent Pattern Mining (FPM) is an important data mining task that identifies recurrent patterns or correlations in datasets. The main purpose of FPM algorithms is to find sets of items that frequently appear together in transactional or relational databases. This study presents Parallel and Distributed Recursive Elimination (PDReLim), a novel FPM algorithm designed for parallel computing to improve efficiency over existing parallel FPM algorithms. PDReLim recursively deletes infrequent items on each node while exploiting the capabilities of parallel and distributed systems or clusters. Its performance was evaluated on the well-known Chess, Mushroom, and Connect datasets from the UCI repository, with a focus on the lowest support thresholds, which cause computational bottlenecks for many FPM algorithms. PDReLim is implemented in PySpark, which outperforms standard MapReduce for iterative algorithms. Its execution on large databases is optimized through Spark features such as the RDD data structure, in-memory processing, and shared variables. The results show that PDReLim was significantly faster than PApriori, PFP-Growth, and PFP-Max.
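
The abstract does not give PDReLim's pseudocode, so the following is only a minimal single-node sketch of the general idea it describes: count item supports, delete infrequent items, and recurse on the transactions that contain each remaining item. The function name `mine_frequent_itemsets`, its parameters, and the toy transactions are assumptions made for illustration. It is a simplified projection-based miner in plain Python, not the authors' algorithm, and it omits the PySpark distribution step entirely.

```python
from collections import Counter


def mine_frequent_itemsets(transactions, min_support, prefix=()):
    """Return {itemset_tuple: support} for every frequent itemset extending `prefix`.

    Hypothetical helper for illustration only. `min_support` is an absolute
    count, and items must be mutually comparable (for example, all strings).
    """
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))

    # Elimination step: keep only items that reach the support threshold.
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    keep = set(frequent)
    pruned = [sorted(set(t) & keep) for t in transactions]

    result = {}
    for item in frequent:
        itemset = prefix + (item,)
        result[itemset] = counts[item]

        # Conditional database: transactions containing `item`, reduced to the
        # items that sort after it so that each itemset is generated only once.
        conditional = [[i for i in t if i > item] for t in pruned if item in t]
        conditional = [t for t in conditional if t]

        # Recurse; infrequent items are eliminated again at the next level.
        result.update(mine_frequent_itemsets(conditional, min_support, itemset))
    return result


# Toy data (invented for this example, not taken from the paper's datasets).
toy_transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
toy_result = mine_frequent_itemsets(toy_transactions, min_support=3)
```

With a threshold of 3, each single item and each pair in the toy data is frequent, while the triple ("a", "b", "c") appears in only two transactions and is dropped. Lowering the threshold enlarges the set of surviving items at every recursion level, which is consistent with the abstract's remark that low support thresholds are where FPM algorithms hit computational bottlenecks.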