Revealing top-k dominant individuals in incomplete data based on spark environment.
Incomplete data sets are a new type of data set that arise for various reasons. For example, some data may be lost during transmission because of abnormal signal interruptions, and gene expression profile data may end up incomplete because of dust on the gene chips, among other causes. A top-k dominance (TKD) query returns the k objects with the largest dominance scores in a given dataset. For large-scale incomplete datasets in which the dimensions with missing values are unknown, most research is based on the Hadoop MapReduce framework, but the resulting algorithms perform poorly because Hadoop MapReduce is not well suited to multi-task iterative computing and has a long start-up time. The Spark framework is a more efficient data processing framework, with a rich computational model and in-memory data processing. Based on this analysis, this paper proposes a query algorithm (Spark_TKD) built on the Spark framework. It uses a simple method for calculating the number of objects that each object dominates, which greatly reduces computational complexity, data exchange between cluster nodes, and disk I/O operations. Finally, comparison experiments on real and synthetic datasets show that the proposed algorithm performs better in terms of time consumption and disk footprint. [ABSTRACT FROM AUTHOR]
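To make the TKD query concrete, the following is a minimal single-machine sketch in Python. It is not the Spark_TKD algorithm, whose details the abstract does not give. It assumes a dominance definition that is common for incomplete data: object a dominates object b if a is no worse than b on every dimension where both values are present and strictly better on at least one such dimension. It also assumes that smaller values are better and that None marks a missing value. The names `dominates` and `top_k_dominating` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: None marks a missing value; smaller values are better.
Point = Sequence[Optional[float]]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b on their commonly observed dimensions."""
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip dimensions where either value is missing
        if x > y:
            return False  # a is worse on a shared dimension
        if x < y:
            strictly_better = True
    return strictly_better


def top_k_dominating(data: List[Point], k: int) -> List[Tuple[int, int]]:
    """Return (index, score) for the k objects with the largest scores.

    The score of an object is the number of other objects it dominates.
    This brute-force version compares all pairs, so it is O(n^2 * d).
    Ties are broken by the smaller index.
    """
    scored = []
    for i, p in enumerate(data):
        score = sum(1 for j, q in enumerate(data) if j != i and dominates(p, q))
        scored.append((i, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

For example, `top_k_dominating([(1, None, 2), (2, 3, None), (None, 1, 1), (3, 4, 5)], 2)` ranks the third object first, since it dominates the other three. The quadratic pairwise comparison in this sketch is the kind of cost that a distributed algorithm would aim to reduce.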
Copyright of Environment, Development & Sustainability is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)