Supporting Efficient Family Joins for Big Data Tables via Multiple Freedom Family Index

Bibliographic Details
Main Authors: Qiang Zhu, Chao Zhu
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10855900/
Description
Summary: The Hadoop/MapReduce framework has been widely utilized for processing big data. To overcome the limitations of existing work and meet the growing requirements of querying big data, this paper introduces novel join operations, called family joins, for HBase tables using their column families as join keys. Family joins possess the closure property that is demanded by many big data applications. This work explores four types of family joins according to different types of freedom in prefix matching for join comparisons. Two approaches to processing such family joins are discussed. The first is the direct method, which is inspired by the straightforward nested-loop strategy. The second is an index-based method, which utilizes a special index for HBase tables. Detailed definitions, practical applications, and processing strategies and algorithms for family joins are provided. Experimental results demonstrate that the index-based join method is quite promising in efficiently processing family joins.
ISSN: 2169-3536