An Algorithm to Optimize Frequent Pattern Mining in Parallel and Distributed Environment
Frequent Pattern Mining (FPM) is an important data mining task that identifies recurrent patterns or correlations in datasets. The main purpose of FPM algorithms is to find sets of items that frequently appear together in transactional or relational databases. This study presents the Parallel and Distributed Recursive Elimination (PDReLim) algorithm, a novel FPM technique designed for parallel computing that improves efficiency over existing parallel FPM algorithms. PDReLim recursively eliminates infrequent items on each node while exploiting parallel and distributed systems or clusters. Its performance was evaluated on the well-known Chess, Mushroom, and Connect datasets from the UCI repository, with a focus on the lowest support thresholds, which cause computational bottlenecks for many FPM algorithms. PDReLim is implemented in PySpark; Spark outperforms standard MapReduce for iterative algorithms, and its capabilities, such as the RDD data structure, in-memory processing, and shared variables, optimize execution on large databases. The results show that PDReLim was significantly faster than PApriori, PFP-Growth, and PFP-Max. [ABSTRACT FROM AUTHOR]
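To make "recursively eliminates infrequent items" concrete, here is a minimal single-machine sketch of the general recursive elimination (RElim) idea in plain Python. It assumes transactions are lists of hashable items and that the support threshold is an absolute transaction count; the function names `relim` and `_mine` are hypothetical. It is not the authors' PDReLim code and does not reproduce their PySpark implementation or how they split work across nodes.

```python
from collections import Counter, defaultdict


def relim(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support.

    Sketch of recursive elimination; min_support is an absolute count.
    """
    transactions = [set(t) for t in transactions]
    # Pass 1: count item supports and drop globally infrequent items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Order items by ascending support and encode each item as its rank,
    # so every transaction becomes a sorted tuple of ints.
    order = sorted(frequent, key=frequent.get)
    rank = {item: r for r, item in enumerate(order)}
    db = []
    for t in transactions:
        encoded = tuple(sorted(rank[i] for i in t if i in rank))
        if encoded:
            db.append((encoded, 1))  # (items, weight)
    results = {}
    _mine(db, (), min_support, order, results)
    return results


def _mine(db, prefix, min_support, order, results):
    # Group each weighted transaction under its leading (lowest-rank) item;
    # the stored suffix is the transaction with that item removed.
    lists = defaultdict(list)
    for items, count in db:
        lists[items[0]].append((items[1:], count))
    while lists:
        head = min(lists)          # next item to process, in rank order
        entries = lists.pop(head)  # all transactions here that contain head
        support = sum(count for _, count in entries)
        if support >= min_support:
            itemset = prefix + (order[head],)
            results[frozenset(itemset)] = support
            # Conditional database: the non-empty suffixes of those transactions.
            conditional = [(s, c) for s, c in entries if s]
            if conditional:
                _mine(conditional, itemset, min_support, order, results)
        # Eliminate head: move each non-empty suffix to the list of its
        # new leading item, so later items see these transactions too.
        for suffix, count in entries:
            if suffix:
                lists[suffix[0]].append((suffix[1:], count))


if __name__ == "__main__":
    txns = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    found = relim(txns, 3)
    for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

Each recursive call on a conditional database only reads its own suffix list, so sibling calls are independent of one another; that makes them a natural unit to hand to separate workers. How PDReLim actually partitions the work across Spark nodes is not described in this abstract.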
Copyright of Engineering, Technology & Applied Science Research is the property of Engineering, Technology & Applied Science Research and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)