Mining Conditional Functional Dependency Rules on Big Data
Current Conditional Functional Dependency (CFD) discovery algorithms always need a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume of big data, we develop sampling algorithms to obtain a small representati...
Main Authors: | Mingda Li, Hongzhi Wang, Jianzhong Li |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2020-03-01 |
Series: | Big Data Mining and Analytics |
Subjects: | data mining; conditional functional dependency; big data; data quality |
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2019.9020019 |
_version_ | 1832572931748659200 |
---|---|
author | Mingda Li; Hongzhi Wang; Jianzhong Li |
author_facet | Mingda Li; Hongzhi Wang; Jianzhong Li |
author_sort | Mingda Li |
collection | DOAJ |
description | Current Conditional Functional Dependency (CFD) discovery algorithms always need a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume of big data, we develop sampling algorithms that obtain a small representative training set. To address the low quality of big data, we design fault-tolerant rule discovery and conflict-resolution algorithms. We also propose a parameter selection strategy to ensure the effectiveness of the CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable period. |
format | Article |
id | doaj-art-2b5608af9cab40fbabc2e3e5af138976 |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2020-03-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj-art-2b5608af9cab40fbabc2e3e5af138976; 2025-02-02T05:59:19Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2020-03-01; vol. 3, no. 1, pp. 68-84; DOI 10.26599/BDMA.2019.9020019; Mining Conditional Functional Dependency Rules on Big Data; Mingda Li (Department of Computer Science, University of California, Los Angeles, CA 90095, USA); Hongzhi Wang (Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China); Jianzhong Li (Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China); https://www.sciopen.com/article/10.26599/BDMA.2019.9020019; data mining; conditional functional dependency; big data; data quality |
spellingShingle | Mingda Li; Hongzhi Wang; Jianzhong Li; Mining Conditional Functional Dependency Rules on Big Data; Big Data Mining and Analytics; data mining; conditional functional dependency; big data; data quality |
title | Mining Conditional Functional Dependency Rules on Big Data |
title_full | Mining Conditional Functional Dependency Rules on Big Data |
title_fullStr | Mining Conditional Functional Dependency Rules on Big Data |
title_full_unstemmed | Mining Conditional Functional Dependency Rules on Big Data |
title_short | Mining Conditional Functional Dependency Rules on Big Data |
title_sort | mining conditional functional dependency rules on big data |
topic | data mining; conditional functional dependency; big data; data quality |
url | https://www.sciopen.com/article/10.26599/BDMA.2019.9020019 |
work_keys_str_mv | AT mingdali miningconditionalfunctionaldependencyrulesonbigdata AT hongzhiwang miningconditionalfunctionaldependencyrulesonbigdata AT jianzhongli miningconditionalfunctionaldependencyrulesonbigdata |
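
The description above names two ideas that a small example can make concrete: working on a sample instead of the full dataset, and tolerating some dirty tuples when deciding whether a CFD holds. The Python sketch below is an illustration only, not the authors' algorithms. The function names (`sample_rows`, `cfd_confidence`, `holds`), the `tolerance` parameter, the majority-vote confidence measure, and the toy address data are all assumptions made for this example. A CFD is modeled as a constant condition (for example `country = "UK"`) plus an embedded dependency `lhs -> rhs` (for example `zip -> city`).

```python
import random
from collections import Counter, defaultdict

def sample_rows(rows, k, seed=0):
    """Uniform random sample of at most k rows (a simple stand-in for a sampling step)."""
    rng = random.Random(seed)
    return list(rows) if len(rows) <= k else rng.sample(rows, k)

def cfd_confidence(rows, condition, lhs, rhs):
    """Share of rows matching `condition` that agree with the majority
    rhs value of their lhs group. Returns None if no row matches."""
    groups = defaultdict(Counter)
    for row in rows:
        if all(row[attr] == value for attr, value in condition.items()):
            key = tuple(row[attr] for attr in lhs)
            groups[key][row[rhs]] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        return None
    agreeing = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return agreeing / total

def holds(rows, condition, lhs, rhs, tolerance=0.1):
    """Accept the rule if at most a `tolerance` share of matching rows disagree."""
    conf = cfd_confidence(rows, condition, lhs, rhs)
    return conf is not None and conf >= 1.0 - tolerance

# Hypothetical toy data: one UK row carries a wrong city for zip "EH1".
rows = [
    {"country": "UK", "zip": "EH1", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH1", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH1", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH1", "city": "London"},
    {"country": "UK", "zip": "W1", "city": "London"},
    {"country": "US", "zip": "EH1", "city": "Springfield"},
]

if __name__ == "__main__":
    rule = ({"country": "UK"}, ["zip"], "city")
    print(cfd_confidence(rows, *rule))
    print(holds(rows, *rule, tolerance=0.25))
    print(holds(sample_rows(rows, 4), *rule, tolerance=0.25))
```

On the toy data, four of the five UK rows agree with the majority city for their zip code, so the rule is accepted with `tolerance=0.25` and rejected with `tolerance=0.1`. How such a threshold should be chosen is the kind of question the record's "parameter selection strategy" refers to; this sketch does not attempt to answer it.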