An Optimized Sanitization Approach for Minable Data Publication

Minable data publication is ubiquitous since it is beneficial to sharing/trading data among commercial companies and further facilitates the development of data-driven tasks. Unfortunately, the minable data publication is often implemented by publishers with limited privacy concerns such that the pu...

Bibliographic Details
Main Authors: Fan Yang, Xiaofeng Liao
Format: Article
Language: English
Published: Tsinghua University Press 2022-09-01
Series: Big Data Mining and Analytics
Subjects:
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2022.9020007
_version_ 1832572940335448064
author Fan Yang
Xiaofeng Liao
author_facet Fan Yang
Xiaofeng Liao
author_sort Fan Yang
collection DOAJ
description Minable data publication is ubiquitous because it lets commercial companies share and trade data and thereby facilitates data-driven tasks. Unfortunately, publishers often release minable data with limited attention to privacy, so the published dataset can also be mined by malicious entities. Because the published data may contain sensitive information, this risk hinders minable data publication, and approaches for reducing privacy leakage are urgently needed. To this end, we propose an optimized sanitization approach for minable data publication, named SA-MDP. SA-MDP supports association rule mining while protecting specific rules. In SA-MDP, we treat minable data publication as a trade-off between data utility and data privacy. To address this trade-off, SA-MDP uses a customized particle swarm optimization (PSO) algorithm whose optimization objective is determined by both data utility and data privacy. Specifically, new particles are produced either by random mutation or by learning from the best particle, which helps SA-MDP avoid solutions becoming trapped in local optima. In addition, we design a fitness function that guides the particles towards the optimal solution, and we apply a preprocessing method before the evolution process of the customized PSO algorithm to improve the convergence rate. Finally, SA-MDP is evaluated on several datasets, and the experimental results demonstrate its effectiveness and efficiency.
format Article
id doaj-art-43c9f6e945a44f36aaa33dc96a44aa6d
institution Kabale University
issn 2096-0654
language English
publishDate 2022-09-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-43c9f6e945a44f36aaa33dc96a44aa6d | 2025-02-02T06:14:03Z | eng | Tsinghua University Press | Big Data Mining and Analytics | 2096-0654 | 2022-09-01 | vol. 5, no. 3, pp. 257-269 | 10.26599/BDMA.2022.9020007 | An Optimized Sanitization Approach for Minable Data Publication | Fan Yang (College of Computer Science, Chongqing University, Chongqing 400044, China) | Xiaofeng Liao (College of Computer Science, Chongqing University, Chongqing 400044, China) | https://www.sciopen.com/article/10.26599/BDMA.2022.9020007 | data publication | data sanitization | association rules hiding | evolutionary algorithm
spellingShingle Fan Yang
Xiaofeng Liao
An Optimized Sanitization Approach for Minable Data Publication
Big Data Mining and Analytics
data publication
data sanitization
association rules hiding
evolutionary algorithm
title An Optimized Sanitization Approach for Minable Data Publication
title_full An Optimized Sanitization Approach for Minable Data Publication
title_fullStr An Optimized Sanitization Approach for Minable Data Publication
title_full_unstemmed An Optimized Sanitization Approach for Minable Data Publication
title_short An Optimized Sanitization Approach for Minable Data Publication
title_sort optimized sanitization approach for minable data publication
topic data publication
data sanitization
association rules hiding
evolutionary algorithm
url https://www.sciopen.com/article/10.26599/BDMA.2022.9020007
work_keys_str_mv AT fanyang anoptimizedsanitizationapproachforminabledatapublication
AT xiaofengliao anoptimizedsanitizationapproachforminabledatapublication
AT fanyang optimizedsanitizationapproachforminabledatapublication
AT xiaofengliao optimizedsanitizationapproachforminabledatapublication
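
The description field in this record summarizes the SA-MDP search loop only in words: particles are produced by random mutation or by learning from the best particle, and a fitness function balances data privacy against data utility. The following is a minimal Python sketch of that general idea and not the authors' implementation. Everything concrete in it is an assumption made for illustration: sensitive knowledge is modelled as a single frequent itemset rather than as association rules, hiding is done by deleting one hypothetical `victim` item, the preprocessing step is reduced to selecting the transactions that contain the sensitive itemset, and the fitness weights and the probabilities `p_learn` and `p_mut` are arbitrary.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_sup, max_size=2):
    """Brute-force frequent itemsets up to `max_size` (toy data only)."""
    items = sorted(set().union(*transactions))
    return {
        frozenset(c)
        for k in range(1, max_size + 1)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= min_sup
    }


def sanitize(transactions, candidates, victim, bits):
    """Delete `victim` from each candidate transaction whose bit is 1."""
    out = list(transactions)
    for idx, bit in zip(candidates, bits):
        if bit:
            out[idx] = out[idx] - {victim}
    return out


def fitness(bits, transactions, candidates, sensitive, victim, keep, min_sup):
    """Lower is better. Weights 10 / 1 / 0.01 are illustrative assumptions."""
    data = sanitize(transactions, candidates, victim, bits)
    hiding_failure = 1 if support(sensitive, data) >= min_sup else 0  # privacy
    missing = sum(support(s, data) < min_sup for s in keep)           # utility
    return 10 * hiding_failure + missing + 0.01 * sum(bits)


def pso_hide(transactions, sensitive, victim, min_sup,
             n_particles=20, n_iter=50, p_learn=0.5, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    # Assumed preprocessing: only transactions that contain the sensitive
    # itemset can be modified, which shrinks the search space.
    candidates = [i for i, t in enumerate(transactions) if sensitive <= t]
    keep = frequent_itemsets(transactions, min_sup) - {sensitive}

    def score(bits):
        return fitness(bits, transactions, candidates,
                       sensitive, victim, keep, min_sup)

    swarm = [[rng.randint(0, 1) for _ in candidates]
             for _ in range(n_particles)]
    best = min(swarm, key=score)[:]
    for _ in range(n_iter):
        for particle in swarm:
            for j in range(len(particle)):
                r = rng.random()
                if r < p_learn:               # learn from the best particle
                    particle[j] = best[j]
                elif r < p_learn + p_mut:     # random mutation
                    particle[j] = 1 - particle[j]
            if score(particle) < score(best):
                best = particle[:]
    return sanitize(transactions, candidates, victim, best), best


# Toy usage: hide the itemset {bread, milk} at a 50% support threshold.
transactions = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "milk"},
    {"milk", "eggs"}, {"bread", "eggs"}, {"bread", "milk", "eggs"},
)]
sensitive = frozenset({"bread", "milk"})
sanitized, best = pso_hide(transactions, sensitive, victim="milk", min_sup=0.5)
print(support(sensitive, transactions), support(sensitive, sanitized))
```

The binary encoding (one bit per candidate transaction) keeps the sketch short. A fuller treatment would work with rule confidence as well as support and would choose the item to delete per transaction rather than fixing a single `victim`.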