Exploiting Sharing Join Opportunities in Big Data Multiquery Optimization with Flink
Multiway join queries incur high I/O costs over large-scale data. Exploiting join-sharing opportunities among multiple multiway joins can reduce both query execution time and the volume of shuffled intermediate data. Although multiway join optimization has been carried out in MapReduce, d...
Main Authors: | Xiao-Yan Gao, Radhya Sahal, Gui-Xiu Chen, Mohammed H. Khafagy, Fatma A. Omara |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2020-01-01 |
Series: | Complexity |
Online Access: | http://dx.doi.org/10.1155/2020/6617149 |
_version_ | 1832547056052338688 |
---|---|
author | Xiao-Yan Gao Radhya Sahal Gui-Xiu Chen Mohammed H. Khafagy Fatma A. Omara |
collection | DOAJ |
description | Multiway join queries incur high I/O costs over large-scale data. Exploiting join-sharing opportunities among multiple multiway joins can reduce both query execution time and the volume of shuffled intermediate data. Although multiway join optimization has been studied for MapReduce, platforms built on different design principles (e.g., in-memory Big Data platforms such as Flink) have not been considered. To bridge this gap, an end-to-end multiway join system over Flink, called the Join-MOTH system (J-MOTH), is proposed to exploit sharing data granularity, sharing join granularity, and sharing implicit sorts within multiple join queries. For sharing data, our previous work, the Multiquery Optimization using Tuple Size and Histogram (MOTH) system, considers the granularity of data-sharing opportunities among multiple queries. For sharing sorts, our previous work, the Sort-Based Optimizer for Big Data Multiquery (SOOM), considers the implicit sorts among join queries. For sharing joins, additional modules have been added to the J-MOTH optimizer to share work by exploiting shared pipelined multiway joins among multiple multiway join queries. The experimental evaluation demonstrates that the J-MOTH system outperforms the naive and state-of-the-art techniques by 44% in query execution time on TPC-H queries. The proposed J-MOTH system also achieves a maximal intermediate data size reduction of 30% on average over Hadoop-like infrastructures. |
format | Article |
id | doaj-art-ffe0e712f28a4523b6c777fbdfc2ceb8 |
institution | Kabale University |
issn | 1076-2787 1099-0526 |
language | English |
publishDate | 2020-01-01 |
publisher | Wiley |
record_format | Article |
series | Complexity |
spelling | doaj-art-ffe0e712f28a4523b6c777fbdfc2ceb8; 2025-02-03T06:46:21Z; eng; Wiley; Complexity; 1076-2787; 1099-0526; 2020-01-01; 2020; 10.1155/2020/6617149; 6617149; Exploiting Sharing Join Opportunities in Big Data Multiquery Optimization with Flink; Xiao-Yan Gao (School of Mathematics and Statistics, Yulin University, Yulin 719000, China); Radhya Sahal (Faculty of Computers and Information, Cairo University, Cairo, Egypt); Gui-Xiu Chen (School of Mathematics and Statistics, Qinghai Normal University, 810008 Xining, China); Mohammed H. Khafagy (Faculty of Computers and Information, Fayoum University, Fayoum, Egypt); Fatma A. Omara (Faculty of Computers and Information, Cairo University, Cairo, Egypt); http://dx.doi.org/10.1155/2020/6617149 |
title | Exploiting Sharing Join Opportunities in Big Data Multiquery Optimization with Flink |
url | http://dx.doi.org/10.1155/2020/6617149 |