Analysis of data processing efficiency with use of Apache Hive and Apache Pig in Hadoop environment

This paper analyzes data processing efficiency using Apache Hive and Apache Pig in a Hadoop environment. The analysis compares the two tools on a large data set of 28 million records. Research was provided with use of script...

Bibliographic Details
Main Authors: Mikołaj Skrzypczyński, Piotr Muryjas
Format: Article
Language: English
Published: Lublin University of Technology 2024-03-01
Series: Journal of Computer Sciences Institute
Subjects: data processing, Apache Hive, Apache Pig, Hadoop
Online Access: https://ph.pollub.pl/index.php/jcsi/article/view/4060
author Mikołaj Skrzypczyński
Piotr Muryjas
author_facet Mikołaj Skrzypczyński
Piotr Muryjas
author_sort Mikołaj Skrzypczyński
collection DOAJ
description This paper analyzes data processing efficiency using Apache Hive and Apache Pig in a Hadoop environment. The analysis compares the two tools on a large data set of 28 million records. The research used scripts and queries written for Apache Hive and Apache Pig, each executed 10 times in an environment provided by a purpose-built virtual machine. These methods were run on the same data sets 16 times, following previously prepared research scenarios. The authors conclude that Apache Hive is a more efficient tool than Apache Pig.
format Article
id doaj-art-8bd9ef62afe84fdb8f64da385b9152df
institution Kabale University
issn 2544-0764
language English
publishDate 2024-03-01
publisher Lublin University of Technology
record_format Article
series Journal of Computer Sciences Institute
spelling doaj-art-8bd9ef62afe84fdb8f64da385b9152df; 2025-02-02T18:02:59Z; eng; Lublin University of Technology; Journal of Computer Sciences Institute; 2544-0764; 2024-03-01; 30; 10.35784/jcsi.4060; Mikołaj Skrzypczyński (Lublin University of Technology); Piotr Muryjas (Lublin University of Technology); https://ph.pollub.pl/index.php/jcsi/article/view/4060; data processing; Apache Hive; Apache Pig; Hadoop
title Analysis of data processing efficiency with use of Apache Hive and Apache Pig in Hadoop environment
topic data processing
Apache Hive
Apache Pig
Hadoop
url https://ph.pollub.pl/index.php/jcsi/article/view/4060