Preserving information while respecting privacy through an information theoretic framework for synthetic health data generation
Abstract Generating synthetic data from medical records is a complex task intensified by patient privacy concerns. In recent years, multiple approaches have been reported for the generation of synthetic data, however, limited attention was given to jointly evaluate the quality and the privacy of the...
Main Authors: | Nadir Sella, Florent Guinot, Nikita Lagrange, Laurent-Philippe Albou, Jonathan Desponds, Hervé Isambert |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | npj Digital Medicine |
Online Access: | https://doi.org/10.1038/s41746-025-01431-6 |
_version_ | 1832585370841120768 |
---|---|
author | Nadir Sella Florent Guinot Nikita Lagrange Laurent-Philippe Albou Jonathan Desponds Hervé Isambert |
author_sort | Nadir Sella |
collection | DOAJ |
description | Abstract Generating synthetic data from medical records is a complex task intensified by patient privacy concerns. In recent years, multiple approaches have been reported for the generation of synthetic data, however, limited attention was given to jointly evaluate the quality and the privacy of the generated data. The quality and privacy of synthetic data stem from multivariate associations across variables, which cannot be assessed by comparing univariate distributions with the original data. Here, we introduce a novel algorithm (MIIC-SDG) for generating synthetic data from electronic records based on a multivariate information framework and Bayesian network theory. We also propose a new metric to quantitatively assess the trade-off between the Quality and Privacy Scores (QPS) of synthetic data generation methods. The performance of MIIC-SDG is demonstrated on different clinical datasets and favorably compares with state-of-the-art synthetic data generation methods, based on the QPS trade-off between several quality and privacy metrics. |
format | Article |
id | doaj-art-97ff12dccb6b4587b7a5946516dc5dc5 |
institution | Kabale University |
issn | 2398-6352 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | npj Digital Medicine |
spelling | doaj-art-97ff12dccb6b4587b7a5946516dc5dc5; 2025-01-26T12:53:51Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-01-01; 81116; 10.1038/s41746-025-01431-6; Preserving information while respecting privacy through an information theoretic framework for synthetic health data generation; Nadir Sella (Institut Roche); Florent Guinot (Institut Roche); Nikita Lagrange (Institut Curie, CNRS UMR168, PSL University, Sorbonne University); Laurent-Philippe Albou (Institut Roche); Jonathan Desponds (Institut Roche); Hervé Isambert (Institut Curie, CNRS UMR168, PSL University, Sorbonne University) |
title | Preserving information while respecting privacy through an information theoretic framework for synthetic health data generation |
url | https://doi.org/10.1038/s41746-025-01431-6 |