Mining Conditional Functional Dependency Rules on Big Data

Bibliographic Details
Main Authors: Mingda Li, Hongzhi Wang, Jianzhong Li
Format: Article
Language: English
Published: Tsinghua University Press 2020-03-01
Series: Big Data Mining and Analytics
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2019.9020019
Description
Summary: Current Conditional Functional Dependency (CFD) discovery algorithms always require a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume issue of big data, we develop sampling algorithms that obtain a small, representative training set. To address the low-quality issue of big data, we design fault-tolerant rule discovery and conflict-resolution algorithms. We also propose a parameter selection strategy to ensure the effectiveness of the CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable time.
ISSN: 2096-0654
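The summary names the techniques but does not describe them. The following minimal Python sketch shows, under stated assumptions, what checking a candidate CFD on a sample could look like. A CFD is written in the usual form (X -> A, tp), where the pattern tuple tp holds constants or the wildcard "_". Two stand-ins are assumed. Reservoir sampling (Algorithm R) takes the place of the paper's sampling algorithms. "Fault tolerance" is modeled as accepting a rule whose confidence meets a threshold, where confidence is the largest fraction of matching tuples that can be kept while the rule holds. All function names, thresholds, and sample rows are hypothetical. This is not the authors' method.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

WILDCARD = "_"  # pattern symbol meaning "any value" (standard CFD notation)
Row = Dict[str, str]


def reservoir_sample(rows: Iterable[Row], k: int, seed: int = 0) -> List[Row]:
    """One-pass uniform sample of k rows (Algorithm R); a stand-in, not the paper's sampler."""
    rng = random.Random(seed)
    sample: List[Row] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive bounds: row i replaces a slot with prob k/(i+1)
            if j < k:
                sample[j] = row
    return sample


def matches(row: Row, attrs: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the row agrees with every constant in the LHS pattern."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))


def cfd_confidence(rows: Sequence[Row], lhs: Sequence[str], lhs_pattern: Sequence[str],
                   rhs: str, rhs_pattern: str) -> Tuple[int, float]:
    """Return (support, confidence) of the CFD (lhs -> rhs, lhs_pattern || rhs_pattern).

    Confidence is the largest fraction of matching rows that can be kept so the CFD holds
    (hypothetical measure chosen for illustration).
    """
    matched = [r for r in rows if matches(r, lhs, lhs_pattern)]
    if not matched:
        return 0, 0.0
    if rhs_pattern != WILDCARD:
        # Constant RHS: keep only the rows whose RHS equals the constant.
        kept = sum(1 for r in matched if r[rhs] == rhs_pattern)
    else:
        # Variable RHS: within each group of equal LHS values, keep the majority RHS value.
        groups: Dict[tuple, Counter] = defaultdict(Counter)
        for r in matched:
            groups[tuple(r[a] for a in lhs)][r[rhs]] += 1
        kept = sum(max(c.values()) for c in groups.values())
    return len(matched), kept / len(matched)


def accept(rows: Sequence[Row], lhs: Sequence[str], lhs_pattern: Sequence[str],
           rhs: str, rhs_pattern: str, min_support: int = 2, min_conf: float = 0.9) -> bool:
    """Tolerate dirty tuples: accept the rule if support and confidence meet the thresholds."""
    support, conf = cfd_confidence(rows, lhs, lhs_pattern, rhs, rhs_pattern)
    return support >= min_support and conf >= min_conf


if __name__ == "__main__":
    rows = [
        {"country": "44", "zip": "EH4", "street": "Mayfield"},
        {"country": "44", "zip": "EH4", "street": "Mayfield"},
        {"country": "44", "zip": "EH4", "street": "Crichton"},  # a dirty tuple
        {"country": "01", "zip": "07974", "street": "Tree Ave."},
        {"country": "01", "zip": "07974", "street": "Elm Str."},
    ]
    print(cfd_confidence(rows, ["country", "zip"], ["44", "_"], "street", "_"))
    # -> (3, 0.6666666666666666)
```

With `min_conf=0.9` the rule above is rejected, while a looser `min_conf=0.6` accepts it despite the dirty tuple. Choosing such thresholds is the kind of question a parameter selection strategy would need to answer.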