Mining Conditional Functional Dependency Rules on Big Data

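Current Conditional Functional Dependency (CFD) discovery algorithms always need a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume of big data, we develop sampling algorithms that obtain a small representative training set. To address the low quality of big data, we design fault-tolerant rule discovery and conflict-resolution algorithms. We also propose a parameter selection strategy to ensure the effectiveness of the CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable period.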
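
For readers unfamiliar with CFDs, the following is a minimal sketch of the general idea of mining constant CFD patterns from a sample while tolerating some dirty tuples. It is not the authors' algorithm, which this record does not describe. The function names, the `zip` and `city` attributes, and the support and confidence thresholds are all assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def sample_rows(rows, k, seed=0):
    """Uniform random sample without replacement.

    Assumption: a plain uniform sample stands in for the paper's
    sampling algorithms, which are not specified in this record.
    """
    rng = random.Random(seed)
    return rows if len(rows) <= k else rng.sample(rows, k)


def mine_constant_cfds(rows, lhs, rhs, min_support=2, min_confidence=0.75):
    """Find patterns (lhs = v) -> (rhs = w) that hold on most matching rows.

    A pattern is kept when at least `min_support` rows have lhs = v and the
    most frequent rhs value covers at least `min_confidence` of them, so a
    few erroneous tuples do not prevent the rule from being found.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1

    rules = {}
    for lhs_value, counts in groups.items():
        support = sum(counts.values())
        rhs_value, hits = counts.most_common(1)[0]
        confidence = hits / support
        if support >= min_support and confidence >= min_confidence:
            rules[lhs_value] = (rhs_value, confidence)
    return rules


# Hypothetical toy data: one dirty tuple for zip 90095, and an
# ambiguous zip 150000 that should not yield a rule.
rows = [
    {"zip": "90095", "city": "Los Angeles"},
    {"zip": "90095", "city": "Los Angeles"},
    {"zip": "90095", "city": "Los Angeles"},
    {"zip": "90095", "city": "Los Angeles"},
    {"zip": "90095", "city": "LA"},
    {"zip": "150000", "city": "Harbin"},
    {"zip": "150000", "city": "Harbin"},
    {"zip": "150000", "city": "Beijing"},
    {"zip": "150000", "city": "Beijing"},
]

rules = mine_constant_cfds(rows, "zip", "city")
print(rules)
```

On large inputs one would call `sample_rows` first and mine the sample; the thresholds then trade off how many noisy tuples are tolerated against how many spurious rules are accepted.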

Bibliographic Details
Main Authors: Mingda Li, Hongzhi Wang, Jianzhong Li
Format: Article
Language: English
Published: Tsinghua University Press 2020-03-01
Series: Big Data Mining and Analytics
Subjects: data mining; conditional functional dependency; big data; data quality
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2019.9020019
collection DOAJ
description Current Conditional Functional Dependency (CFD) discovery algorithms always need a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume of big data, we develop sampling algorithms that obtain a small representative training set. To address the low quality of big data, we design fault-tolerant rule discovery and conflict-resolution algorithms. We also propose a parameter selection strategy to ensure the effectiveness of the CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable period.
id doaj-art-2b5608af9cab40fbabc2e3e5af138976
institution Kabale University
issn 2096-0654
spelling Mining Conditional Functional Dependency Rules on Big Data
Mingda Li, Department of Computer Science, University of California, Los Angeles, CA 90095, USA
Hongzhi Wang, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
Jianzhong Li, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
Big Data Mining and Analytics, Tsinghua University Press, ISSN 2096-0654, 2020-03-01, vol. 3, no. 1, pp. 68-84
DOI: 10.26599/BDMA.2019.9020019
Keywords: data mining; conditional functional dependency; big data; data quality