Supporting Efficient Family Joins for Big Data Tables via Multiple Freedom Family Index

The Hadoop/MapReduce framework has been widely utilized for processing big data. To overcome the limitations of existing work and meet the growing requirements of querying big data, this paper introduces novel join operations, called family joins, for HBase tables using their column families as join...

Bibliographic Details
Main Authors: Qiang Zhu, Chao Zhu
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Database; HBase; big data processing; family join; query optimization; index
Online Access: https://ieeexplore.ieee.org/document/10855900/
author Qiang Zhu
Chao Zhu
collection DOAJ
description The Hadoop/MapReduce framework has been widely utilized for processing big data. To overcome the limitations of existing work and meet the growing requirements of querying big data, this paper introduces novel join operations, called family joins, for HBase tables using their column families as join keys. Family joins possess the closure property that is demanded by many big data applications. This work explores four types of family joins according to different types of freedom in prefix matching for join comparisons. Two approaches to processing such family joins are discussed. The first is the direct method, which is inspired by the straightforward nested-loop strategy. The second is an index-based method, which utilizes a special index for HBase tables. Detailed definitions, practical applications, and processing strategies and algorithms for family joins are provided. Experimental results demonstrate that the index-based join method is quite promising in efficiently processing family joins.
format Article
id doaj-art-1377227c580a4674be45528dbee43271
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-1377227c580a4674be45528dbee43271
2025-02-06T00:00:24Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2025-01-01, vol. 13, pp. 21707-21722
DOI 10.1109/ACCESS.2025.3535693 (IEEE document 10855900)
Supporting Efficient Family Joins for Big Data Tables via Multiple Freedom Family Index
Qiang Zhu (https://orcid.org/0000-0001-7094-9236), Department of Computer and Information Science, University of Michigan-Dearborn, Dearborn, MI, USA
Chao Zhu, Department of Computer and Information Science, University of Michigan-Dearborn, Dearborn, MI, USA
https://ieeexplore.ieee.org/document/10855900/
Database; HBase; big data processing; family join; query optimization; index
title Supporting Efficient Family Joins for Big Data Tables via Multiple Freedom Family Index
topic Database
HBase
big data processing
family join
query optimization
index
url https://ieeexplore.ieee.org/document/10855900/