Analysis of data processing efficiency with use of Apache Hive and Apache Pig in Hadoop environment
This paper analyzes the data processing efficiency of Apache Hive and Apache Pig in a Hadoop environment. The analysis compares the two tools on a large data set of 28 million records. The research was conducted using script...
Main Authors: | Mikołaj Skrzypczyński, Piotr Muryjas |
---|---|
Format: | Article |
Language: | English |
Published: | Lublin University of Technology, 2024-03-01 |
Series: | Journal of Computer Sciences Institute |
Subjects: | data processing, Apache Hive, Apache Pig, Hadoop |
Online Access: | https://ph.pollub.pl/index.php/jcsi/article/view/4060 |
_version_ | 1832570014468669440 |
---|---|
author | Mikołaj Skrzypczyński; Piotr Muryjas |
author_facet | Mikołaj Skrzypczyński; Piotr Muryjas |
author_sort | Mikołaj Skrzypczyński |
collection | DOAJ |
description |
This paper analyzes the data processing efficiency of Apache Hive and Apache Pig in a Hadoop environment. The analysis compares the two tools on a large data set of 28 million records. The research used scripts and queries written for Apache Hive and Apache Pig, each executed 10 times in an environment provided by a purpose-built virtual machine. These methods were run on the same data sets 16 times, following previously prepared research scenarios. The authors conclude that Apache Hive is a more efficient tool than Apache Pig.
|
format | Article |
id | doaj-art-8bd9ef62afe84fdb8f64da385b9152df |
institution | Kabale University |
issn | 2544-0764 |
language | English |
publishDate | 2024-03-01 |
publisher | Lublin University of Technology |
record_format | Article |
series | Journal of Computer Sciences Institute |
spelling | doaj-art-8bd9ef62afe84fdb8f64da385b9152df; 2025-02-02T18:02:59Z; eng; Lublin University of Technology; Journal of Computer Sciences Institute; 2544-0764; 2024-03-01; 30; 10.35784/jcsi.4060; Analysis of data processing efficiency with use of Apache Hive and Apache Pig in Hadoop environment; Mikołaj Skrzypczyński (0), Piotr Muryjas (1); Lublin University of Technology (0), Lublin University of Technology (1); https://ph.pollub.pl/index.php/jcsi/article/view/4060; data processing; Apache Hive; Apache Pig; Hadoop |
title | Analysis of data processing efficiency with use of Apache Hive and Apache Pig in Hadoop environment |
topic | data processing; Apache Hive; Apache Pig; Hadoop |
url | https://ph.pollub.pl/index.php/jcsi/article/view/4060 |