Title:
A Comparison of Hadoop Tools for Analyzing Tabular Data.
Authors:
Tomašić, Ivan (ivan.tomasic@ijs.si); Rashkovska, Aleksandra (aleksandra.rashkovska@ijs.si); Depolli, Matjaž (matjaz.depolli@ijs.si); Trobec, Roman (roman.trobec@ijs.si)
Source:
Informatica (03505596). Jun 2013, Vol. 37, Issue 2, pp. 131-138. 8 pp.
Database:
Supplemental Index

The paper describes the application of the Hadoop modules MapReduce, Pig and Hive to processing and analyzing large amounts of tabular data acquired from a computer simulation of heat transfer in biological tissues. Apache Hadoop is an open-source environment for storing and analyzing big data. It was installed on a cluster of six computing nodes, each with four cores. The implemented MapReduce job pipeline is described and the essential Java code segments are presented. The Java implementation employing MapReduce is compared to the Pig and Hive implementations with regard to execution time and programming overhead. Measured execution times of the parallel MapReduce tasks on 24 processor cores show a speedup of 20 relative to sequential execution, which indicates that a high level of parallelism is achieved. Furthermore, our test cases confirm that direct use of MapReduce in Java is more than twice as fast as Pig and Hive, with Hive being 20% faster than Pig. Still, Pig and Hive remain suitable and convenient alternatives for efficient operations on large data sets. [ABSTRACT FROM AUTHOR]
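
As a rough reading of the speedup figure above, the short calculation below derives the parallel efficiency from the two numbers given in the abstract (speedup 20 on 24 cores). The second line is only an illustration: it assumes Amdahl's law applies, which the abstract does not state, and the symbols S, p, E and f are introduced here and are not taken from the paper.

```latex
% From the abstract: S = 20 (speedup), p = 24 (processor cores).
% Parallel efficiency:
E = \frac{S}{p} = \frac{20}{24} \approx 0.83

% Assumption (not from the paper): Amdahl's law, S = \frac{1}{f + (1 - f)/p},
% with f the non-parallelizable fraction of the work. Solving for f:
f = \frac{p/S - 1}{p - 1} = \frac{24/20 - 1}{23} = \frac{0.2}{23} \approx 0.0087
```

Under that assumed model, a speedup of 20 on 24 cores would correspond to roughly 0.9% of the work being sequential, which is consistent with the abstract's claim of a high level of parallelism.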

The paper describes the use of the Hadoop software modules MapReduce, Pig and Hive for processing and analyzing tabular data on heat transfer in tissues. [ABSTRACT FROM AUTHOR, translated from Slovenian]