Evaluating GPT models for clinical note de-identification

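Abstract: The rapid digitalization of healthcare has created a pressing need for solutions that manage clinical data securely while ensuring patient privacy. This study evaluates the capabilities of GPT-3.5 and GPT-4 models in de-identifying clinical notes and generating synthetic data, using API access and zero-shot prompt engineering to optimize computational efficiency. Results show that GPT-4 significantly outperformed GPT-3.5, achieving a precision of 0.9925, a recall of 0.8318, an F1 score of 0.8973, and an accuracy of 0.9911. These results demonstrate GPT-4’s potential as a powerful tool for safeguarding patient privacy while increasing the availability of clinical data for research. This work sets a benchmark for balancing data utility and privacy in healthcare data management.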

Bibliographic Details
Main Authors: Bayan Altalla’, Sameera Abdalla, Ahmad Altamimi, Layla Bitar, Amal Al Omari, Ramiz Kardan, Iyad Sultan
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-86890-3
author Bayan Altalla’
Sameera Abdalla
Ahmad Altamimi
Layla Bitar
Amal Al Omari
Ramiz Kardan
Iyad Sultan
collection DOAJ
description Abstract The rapid digitalization of healthcare has created a pressing need for solutions that manage clinical data securely while ensuring patient privacy. This study evaluates the capabilities of GPT-3.5 and GPT-4 models in de-identifying clinical notes and generating synthetic data, using API access and zero-shot prompt engineering to optimize computational efficiency. Results show that GPT-4 significantly outperformed GPT-3.5, achieving a precision of 0.9925, a recall of 0.8318, an F1 score of 0.8973, and an accuracy of 0.9911. These results demonstrate GPT-4’s potential as a powerful tool for safeguarding patient privacy while increasing the availability of clinical data for research. This work sets a benchmark for balancing data utility and privacy in healthcare data management.
format Article
id doaj-art-b98820317750445fb2f9952398c066b8
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-b98820317750445fb2f9952398c066b8; 2025-02-02T12:22:18Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; volume 15, issue 1, pages 1-12; 10.1038/s41598-025-86890-3
Evaluating GPT models for clinical note de-identification
Bayan Altalla’ (King Hussein Cancer Center)
Sameera Abdalla (Princess Sumaya University for Technology)
Ahmad Altamimi (Princess Sumaya University for Technology)
Layla Bitar (King Hussein Cancer Center)
Amal Al Omari (King Hussein Cancer Center)
Ramiz Kardan (University of Jordan)
Iyad Sultan (King Hussein Cancer Center)
https://doi.org/10.1038/s41598-025-86890-3
title Evaluating GPT models for clinical note de-identification
url https://doi.org/10.1038/s41598-025-86890-3