Analysis of data processing efficiency with use of Apache Hive and Apache Pig in Hadoop environment

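The aim of this paper is to analyze the efficiency of data processing with Apache Hive and Apache Pig in a Hadoop environment. The analysis was based on a comparison of the two tools on a large data set of 28 million records. The research was carried out with scripts and queries written for Apache Hive and Apache Pig, each executed 10 times in an environment provided by a virtual machine created for the study. These methods were performed 16 times on the same data sets, according to previously prepared research scenarios. In conclusion, the authors observed that Apache Hive is a more efficient tool than Apache Pig.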
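The abstract describes a measurement procedure in which each Hive query and Pig script was executed 10 times and the resulting timings were compared. The paper's own harness is not described in this record, so the following is only a minimal Python sketch of that kind of repeated-run timing, assuming wall-clock time is the efficiency measure. The function names (`time_runs`, `summarize`, `compare`, `cli_task`) are hypothetical and do not come from the paper.

```python
import statistics
import subprocess
import time
from typing import Callable, Dict, List

# Hypothetical helpers: a generic repeated-run timer, not the authors' harness.


def time_runs(task: Callable[[], object], repetitions: int = 10) -> List[float]:
    """Run `task` `repetitions` times and return each wall-clock duration in seconds."""
    durations: List[float] = []
    for _ in range(repetitions):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce repeated measurements to simple descriptive statistics."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        # The sample standard deviation needs at least two measurements.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
        "max": max(durations),
    }


def compare(tasks: Dict[str, Callable[[], object]],
            repetitions: int = 10) -> Dict[str, Dict[str, float]]:
    """Time every named task the same number of times and summarize each one."""
    return {name: summarize(time_runs(task, repetitions)) for name, task in tasks.items()}


def cli_task(args: List[str]) -> Callable[[], object]:
    """Wrap a command line (e.g. a Hive or Pig invocation) as a task; raises on non-zero exit."""
    return lambda: subprocess.run(args, check=True, capture_output=True)
```

As a usage sketch, `compare({"hive": cli_task(["hive", "-f", "scenario_01.hql"]), "pig": cli_task(["pig", "-f", "scenario_01.pig"])})` would time one scenario with each tool; both script names are placeholders, not files from the paper. Reporting the median next to the mean keeps a single outlying run from dominating the comparison.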

Bibliographic Details
Main Authors:	Mikołaj Skrzypczyński, Piotr Muryjas
Format:	Article
Language:	English
Published:	Lublin University of Technology 2024-03-01
Series:	Journal of Computer Sciences Institute
Subjects:	data processing; Apache Hive; Apache Pig; Hadoop
Online Access:	https://ph.pollub.pl/index.php/jcsi/article/view/4060
ISSN:	2544-0764
Volume:	30
DOI:	10.35784/jcsi.4060