Evaluating GPT models for clinical note de-identification
Main Authors: | Bayan Altalla’, Sameera Abdalla, Ahmad Altamimi, Layla Bitar, Amal Al Omari, Ramiz Kardan, Iyad Sultan |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-025-86890-3 |
author | Bayan Altalla’; Sameera Abdalla; Ahmad Altamimi; Layla Bitar; Amal Al Omari; Ramiz Kardan; Iyad Sultan |
---|---|
collection | DOAJ |
description | The rapid digitalization of healthcare has created a pressing need for solutions that manage clinical data securely while ensuring patient privacy. This study evaluates the capabilities of GPT-3.5 and GPT-4 models in de-identifying clinical notes and generating synthetic data, using API access and zero-shot prompt engineering to optimize computational efficiency. Results show that GPT-4 significantly outperformed GPT-3.5, achieving a precision of 0.9925, a recall of 0.8318, an F1 score of 0.8973, and an accuracy of 0.9911. These results demonstrate GPT-4’s potential as a powerful tool for safeguarding patient privacy while increasing the availability of clinical data for research. This work sets a benchmark for balancing data utility and privacy in healthcare data management. |
issn | 2045-2322 |
affiliations | Bayan Altalla’ (King Hussein Cancer Center); Sameera Abdalla (Princess Sumaya University for Technology); Ahmad Altamimi (Princess Sumaya University for Technology); Layla Bitar (King Hussein Cancer Center); Amal Al Omari (King Hussein Cancer Center); Ramiz Kardan (University of Jordan); Iyad Sultan (King Hussein Cancer Center) |
volume / issue / pages | 15 / 1 / 1-12 |
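The abstract reports GPT-4's de-identification performance as precision, recall, F1 and accuracy. The record does not say whether these were scored per token, per entity or per note, so the following is only a minimal sketch under one assumption: each token carries a binary gold label (PHI or not) and a predicted label (masked by the model or not), with PHI as the positive class. The names `Confusion`, `confusion_from_labels` and `scores` are hypothetical helpers, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Token-level counts, assuming PHI tokens are the positive class."""
    tp: int  # PHI tokens the model masked
    fp: int  # non-PHI tokens the model masked anyway (over-redaction)
    fn: int  # PHI tokens the model left in the note (privacy leak)
    tn: int  # non-PHI tokens correctly left untouched


def confusion_from_labels(gold: list[bool], pred: list[bool]) -> Confusion:
    """Compare aligned per-token PHI flags (True = PHI / masked)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token-for-token")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    return Confusion(tp, fp, fn, tn)


def scores(c: Confusion) -> dict[str, float]:
    """Standard precision, recall, F1 (harmonic mean) and accuracy."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical 20-token note: 4 PHI tokens, the model masks 3 of them
# and nothing else.
gold = [False] * 16 + [True] * 4
pred = [False] * 16 + [True, True, True, False]
print(scores(confusion_from_labels(gold, pred)))
# precision 1.0, recall 0.75, f1 ~0.857, accuracy 0.95
```

The toy example shows why accuracy can sit well above recall in this task: most tokens in a clinical note are not PHI, so true negatives dominate the accuracy denominator, while recall measures only how much PHI was actually removed. As an arithmetic check, the harmonic mean of the abstract's precision (0.9925) and recall (0.8318) is about 0.905 rather than 0.8973. A gap like this would be expected if, for example, F1 were computed per note and then averaged, but the record does not state the averaging scheme.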