Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data

Bibliographic Details
Main Authors:	Austin A. Barr, Joshua Quan, Eddie Guo, Emre Sezgin
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-02-01
Series:	Frontiers in Artificial Intelligence
Subjects:	synthetic data, large language model, artificial intelligence, machine learning, ChatGPT, big data
Online Access:	https://www.frontiersin.org/articles/10.3389/frai.2025.1533508/full

Abstract

Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties with those of the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

Methods: In Phase 1, GPT-4o was prompted to generate a dataset from qualitative descriptions of 13 clinical parameters. The resulting data were assessed for general errors, plausibility of outputs, and consistency between related parameters (cross-verification). In Phase 2, GPT-4o was prompted to generate a dataset from descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and overlap of 95% confidence intervals (CIs).

Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. Values were plausible in range, and body mass index was correctly calculated from the respective height and weight in every case file. Statistical comparison between the LLM-generated datasets and VitalDB showed that the Phase 2 data achieved substantial fidelity: they were statistically similar to VitalDB in 12/13 (92.31%) parameters, with no statistically significant differences in 6/6 (100.0%) categorical/binary parameters and 6/7 (85.71%) continuous parameters. The 95% CIs overlapped for 6/7 (85.71%) continuous parameters.

Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets that replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers to clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and to investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
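The fidelity checks named in the Methods, together with the Phase 1 BMI cross-check, can be computed from summary statistics alone. The Python sketch below is illustrative only and is not the authors' code. It assumes Welch's unequal-variance form of the t-test, a pooled two-proportion z-test, normal-approximation 95% CIs for each mean, and a 0.1 kg/m² tolerance for the BMI check; every function name and every number in the example is hypothetical. Because it uses only the standard library, the t-test p-value comes from a standard-normal approximation to the t distribution, which is reasonable only when both samples are large (thousands of cases); for small samples a true t distribution, such as SciPy's `scipy.stats.ttest_ind_from_stats`, would be the better choice.

```python
"""Illustrative fidelity checks from summary statistics (hypothetical helpers)."""
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def _two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic; p-value via normal approximation (large n only)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    return t, _two_sided_p(t)


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for proportions (x = count of positive cases)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, _two_sided_p(z)


def mean_ci95(mean, sd, n):
    """Normal-approximation 95% CI for a mean: mean +/- 1.96 * sd / sqrt(n)."""
    half_width = _STD_NORMAL.inv_cdf(0.975) * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def cis_overlap(ci_a, ci_b):
    """True if two closed intervals (lo, hi) share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def bmi_consistent(height_cm, weight_kg, reported_bmi, tol=0.1):
    """Recompute BMI = weight (kg) / height (m)^2 and compare with the reported value."""
    height_m = height_cm / 100.0
    return abs(weight_kg / height_m ** 2 - reported_bmi) <= tol


if __name__ == "__main__":
    # Hypothetical summary statistics, NOT values from the study or VitalDB.
    real = {"mean": 57.0, "sd": 15.0, "n": 6000}
    synth = {"mean": 57.3, "sd": 14.8, "n": 6000}

    t, p = welch_t_test(real["mean"], real["sd"], real["n"],
                        synth["mean"], synth["sd"], synth["n"])
    overlap = cis_overlap(mean_ci95(**real), mean_ci95(**synth))
    print(f"continuous: t={t:.2f}, p={p:.3f}, 95% CIs overlap: {overlap}")

    # Hypothetical counts for one binary parameter in each dataset.
    z, p = two_proportion_z_test(3000, 6000, 2950, 6000)
    print(f"binary: z={z:.2f}, p={p:.3f}")

    # Hypothetical case file: 170 cm, 70 kg, reported BMI 24.2.
    print("BMI consistent:", bmi_consistent(170, 70, 24.2))
```

In a full comparison, each continuous parameter would go through `welch_t_test` and the CI-overlap check, and each categorical/binary parameter through `two_proportion_z_test`, with p > 0.05 read as "no statistically significant difference".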

Affiliations:	Austin A. Barr, Joshua Quan, Eddie Guo: Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Emre Sezgin: The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States, and Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, United States
ISSN:	2624-8212
Volume:	8
Article number:	1533508
DOI:	10.3389/frai.2025.1533508

Similar Items