Title:
Optimal Data File Allocation for All-to-All Comparison in Distributed System: A Case Study on Genetic Sequence Comparison.
Authors:
Li, L. X.1,2,3 llxhappy@126.com, Gao, J.1 gaojing@imau.edu.cn, Mu, R.4 568387304@qq.com
Source:
International Journal of Computers, Communications & Control. 2019, Vol. 14 Issue 2, p199-211. 13p. 6 Charts.
Database:
Supplemental Index

More Information

This paper addresses the unbalanced load that arises when the data files of a large dataset are compared all-to-all in a distributed system, taking full account of the differences between the files themselves. It aims to exploit the advantages of distributed systems to improve file allocation for all-to-all comparison. To this end, the authors formally described the all-to-all comparison problem and constructed a data allocation model using mixed integer linear programming (MILP). A data allocation algorithm was then implemented in MATLAB using the intlinprog function, which is based on the branch-and-bound method. Finally, the model and algorithm were verified through several experiments. The results show that the proposed file allocation strategy achieves a basic load balance across the nodes of the distributed system without exceeding the storage capacity of any node, and fully localizes the data files. The findings can be applied to fields such as bioinformatics, biometrics and data mining. [ABSTRACT FROM AUTHOR]
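
To make the allocation problem concrete, here is a minimal Python sketch of the kind of optimization the abstract describes. It is not the authors' MILP model or their MATLAB code. It assumes that each pairwise comparison costs the sum of the two file sizes, that the objective is to minimize the heaviest node load, and that a node must store both files of every pair assigned to it. The function name `allocate` and the cost model are hypothetical, and exhaustive enumeration stands in for the branch-and-bound search of intlinprog, so it is only practical for very small inputs.

```python
from itertools import combinations, product


def allocate(sizes, capacities):
    """Assign every file pair to one node by exhaustive search.

    sizes[i] is the size of file i; capacities[k] is the storage of node k.
    Returns (max_load, assignment, stored_files), or None if infeasible.
    """
    pairs = list(combinations(range(len(sizes)), 2))
    nodes = range(len(capacities))
    best = None
    for choice in product(nodes, repeat=len(pairs)):
        stored = [set() for _ in nodes]
        load = [0] * len(capacities)
        for (i, j), k in zip(pairs, choice):
            # Data locality: node k must hold both files of the pair.
            stored[k].update((i, j))
            # Assumed cost model: comparison cost = sum of file sizes.
            load[k] += sizes[i] + sizes[j]
        # Storage constraint: files held by a node must fit its capacity.
        if any(sum(sizes[f] for f in stored[k]) > capacities[k] for k in nodes):
            continue
        max_load = max(load)
        if best is None or max_load < best[0]:
            best = (max_load, dict(zip(pairs, choice)), stored)
    return best


if __name__ == "__main__":
    result = allocate([4, 3, 2, 1], [10, 10])
    if result is not None:
        max_load, assignment, stored = result
        print("heaviest node load:", max_load)
        for pair, node in sorted(assignment.items()):
            print(f"compare files {pair} on node {node}")
```

The search space grows as (number of nodes) to the power of (number of pairs), which is why a realistic instance would need an integer programming solver rather than enumeration.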