Preserving information while respecting privacy through an information theoretic framework for synthetic health data generation

Abstract Generating synthetic data from medical records is a complex task intensified by patient privacy concerns. In recent years, multiple approaches have been reported for the generation of synthetic data; however, limited attention has been given to jointly evaluating the quality and the privacy of the...

Bibliographic Details
Main Authors: Nadir Sella, Florent Guinot, Nikita Lagrange, Laurent-Philippe Albou, Jonathan Desponds, Hervé Isambert
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01431-6
collection DOAJ
description Abstract Generating synthetic data from medical records is a complex task intensified by patient privacy concerns. In recent years, multiple approaches have been reported for the generation of synthetic data; however, limited attention has been given to jointly evaluating the quality and the privacy of the generated data. The quality and privacy of synthetic data stem from multivariate associations across variables, which cannot be assessed by comparing univariate distributions with the original data. Here, we introduce a novel algorithm (MIIC-SDG) for generating synthetic data from electronic records based on a multivariate information framework and Bayesian network theory. We also propose a new metric to quantitatively assess the trade-off between the Quality and Privacy Scores (QPS) of synthetic data generation methods. The performance of MIIC-SDG is demonstrated on different clinical datasets and compares favorably with state-of-the-art synthetic data generation methods, based on the QPS trade-off between several quality and privacy metrics.
id doaj-art-97ff12dccb6b4587b7a5946516dc5dc5
institution Kabale University
issn 2398-6352
spelling doaj-art-97ff12dccb6b4587b7a5946516dc5dc5; record updated 2025-01-26T12:53:51Z; eng; Nature Portfolio; npj Digital Medicine; ISSN 2398-6352; 2025-01-01; vol. 8, no. 1, pp. 1-16; doi 10.1038/s41746-025-01431-6
author affiliations
Nadir Sella: Institut Roche
Florent Guinot: Institut Roche
Nikita Lagrange: Institut Curie, CNRS UMR168, PSL University, Sorbonne University
Laurent-Philippe Albou: Institut Roche
Jonathan Desponds: Institut Roche
Hervé Isambert: Institut Curie, CNRS UMR168, PSL University, Sorbonne University