Exploiting Sharing Join Opportunities in Big Data Multiquery Optimization with Flink

Multiway join queries incur high I/O costs over large-scale data. Exploiting join-sharing opportunities among multiple multiway joins can reduce both query execution time and the volume of shuffled intermediate data. Although multiway join optimization has been carried out in MapReduce, d...
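The idea of a join-sharing opportunity can be illustrated with a small sketch. This is not the J-MOTH optimizer: the join representation, the function names (`make_join`, `shared_joins`), the two example queries, and the simplified key names are all hypothetical, chosen only to show how a pairwise join that appears in more than one multiway join query might be detected so that it could be computed once and reused.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a pairwise join is the unordered pair of
# table names plus the join key, so A JOIN B and B JOIN A compare equal.
Join = Tuple[FrozenSet[str], str]


def make_join(left: str, right: str, key: str) -> Join:
    """Build an order-insensitive description of a two-table equi-join."""
    return (frozenset((left, right)), key)


def shared_joins(queries: Dict[str, List[Join]]) -> Dict[Join, List[str]]:
    """Return every join used by at least two queries, with the query names."""
    users: Dict[Join, List[str]] = defaultdict(list)
    for name, joins in queries.items():
        # set() avoids counting a join twice within the same query.
        for join in set(joins):
            users[join].append(name)
    return {join: sorted(names) for join, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    # Two made-up multiway join queries over TPC-H-style tables
    # (key names are simplified for illustration).
    queries = {
        "qa": [
            make_join("customer", "orders", "custkey"),
            make_join("orders", "lineitem", "orderkey"),
        ],
        "qb": [
            make_join("lineitem", "orders", "orderkey"),
            make_join("lineitem", "part", "partkey"),
        ],
    }
    for (tables, key), names in shared_joins(queries).items():
        print(sorted(tables), key, "shared by", names)
```

In this toy setting only the orders/lineitem join is common to both queries; a real optimizer would additionally need to weigh cost, data size, and sort order before deciding whether to share it.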

Bibliographic Details
Main Authors: Xiao-Yan Gao, Radhya Sahal, Gui-Xiu Chen, Mohammed H. Khafagy, Fatma A. Omara
Format: Article
Language: English
Published: Wiley 2020-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2020/6617149
collection DOAJ
description Multiway join queries incur high I/O costs over large-scale data. Exploiting join-sharing opportunities among multiple multiway joins can reduce both query execution time and the volume of shuffled intermediate data. Although multiway join optimization has been studied for MapReduce, platforms with different design principles (e.g., in-memory Big Data platforms such as Flink) have not been considered. To bridge this gap, an end-to-end multiway join system over Flink, called Join-MOTH (J-MOTH), is proposed to exploit data-sharing granularity, join-sharing granularity, and shared implicit sorts across multiple join queries. For sharing data, our previous work, the Multiquery Optimization using Tuple Size and Histogram (MOTH) system, considers the granularity of data-sharing opportunities among multiple queries. For sharing sorts, our previous work, the Sort-Based Optimizer for Big Data Multiquery (SOOM), considers the implicit sorts among join queries. For sharing joins, additional modules have been added to the J-MOTH optimizer to exploit shared pipelined multiway joins among multiple multiway join queries. The experimental evaluation on TPC-H queries shows that J-MOTH outperforms the naive and state-of-the-art techniques by 44% in query execution time. J-MOTH also achieves a maximal intermediate data size reduction of 30% on average over Hadoop-like infrastructures.
id doaj-art-ffe0e712f28a4523b6c777fbdfc2ceb8
institution Kabale University
issn 1076-2787
1099-0526
volume 2020
article_number 6617149
affiliations Xiao-Yan Gao: School of Mathematics and Statistics, Yulin University, Yulin 719000, China
Radhya Sahal: Faculty of Computers and Information, Cairo University, Cairo, Egypt
Gui-Xiu Chen: School of Mathematics and Statistics, Qinghai Normal University, 810008 Xining, China
Mohammed H. Khafagy: Faculty of Computers and Information, Fayoum University, Fayoum, Egypt
Fatma A. Omara: Faculty of Computers and Information, Cairo University, Cairo, Egypt