LotusSQL: SQL Engine for High-Performance Big Data Systems
In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module offering support for relational analysis on Spark with Structured Query Language (SQL). SparkSQL provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL stil...
Main Authors: | Xiaohan Li, Bowen Yu, Guanyu Feng, Haojie Wang, Wenguang Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2021-12-01 |
Series: | Big Data Mining and Analytics |
Subjects: | big data; C++; structured query language (SQL); query optimization |
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2021.9020009 |
_version_ | 1832573648676847616 |
---|---|
author | Xiaohan Li, Bowen Yu, Guanyu Feng, Haojie Wang, Wenguang Chen |
author_facet | Xiaohan Li, Bowen Yu, Guanyu Feng, Haojie Wang, Wenguang Chen |
author_sort | Xiaohan Li |
collection | DOAJ |
description | In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a Spark module that supports relational analysis with Structured Query Language (SQL) and provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from the inefficiency that Spark inherits from the Java Virtual Machine (JVM) and from unnecessary data serialization and deserialization. Adopting a native language such as C++ can help avoid these bottlenecks: benefiting from a bare-metal runtime environment and the use of templates, systems with C++ interfaces usually achieve superior performance. However, the complexity of native languages also increases the programming and debugging effort required. In this work, we present LotusSQL, an engine that provides SQL support for the dataset abstraction of Lotus, a native backend. We employ a convenient SQL processing framework to handle frontend jobs and add advanced query optimization techniques to improve the quality of execution plans. On top of the storage design and user interface of the compute engine, LotusSQL implements an efficient set of structured dataset operations and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to 9× on certain queries and outperforms SparkSQL on a standard query benchmark by more than 2× on average. |
format | Article |
id | doaj-art-cb6047f2b47644b58c926f8da2b81947 |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2021-12-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj-art-cb6047f2b47644b58c926f8da2b81947; 2025-02-02T03:45:09Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2021-12-01; vol. 4, no. 4, pp. 252-265; DOI 10.26599/BDMA.2021.9020009; LotusSQL: SQL Engine for High-Performance Big Data Systems; Xiaohan Li, Bowen Yu, Guanyu Feng, Haojie Wang, Wenguang Chen (all: Department of Computer Science and Technology, Tsinghua University, China); https://www.sciopen.com/article/10.26599/BDMA.2021.9020009; big data; c++; structured query language (sql); query optimization |
title | LotusSQL: SQL Engine for High-Performance Big Data Systems |
topic | big data; c++; structured query language (sql); query optimization |
url | https://www.sciopen.com/article/10.26599/BDMA.2021.9020009 |