Evaluating GPT models for clinical note de-identification

Bibliographic Details
Main Authors: Bayan Altalla’, Sameera Abdalla, Ahmad Altamimi, Layla Bitar, Amal Al Omari, Ramiz Kardan, Iyad Sultan
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-86890-3
Description
Summary: The rapid digitalization of healthcare has created a pressing need for solutions that manage clinical data securely while ensuring patient privacy. This study evaluates the capabilities of GPT-3.5 and GPT-4 models in de-identifying clinical notes and generating synthetic data, using API access and zero-shot prompt engineering to optimize computational efficiency. Results show that GPT-4 significantly outperformed GPT-3.5, achieving a precision of 0.9925, a recall of 0.8318, an F1 score of 0.8973, and an accuracy of 0.9911. These results demonstrate GPT-4’s potential as a powerful tool for safeguarding patient privacy while increasing the availability of clinical data for research. This work sets a benchmark for balancing data utility and privacy in healthcare data management.
ISSN: 2045-2322
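
Note on the reported metrics: the summary cites precision, recall, F1 score, and accuracy without defining them. The sketch below shows the standard definitions in terms of true/false positives and negatives, as they might apply to token-level de-identification. The `Counts` class, the `deid_metrics` function, and the example counts are hypothetical illustrations, not the authors' evaluation code or data, and this record does not say how the study aggregated its scores.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical token-level confusion counts for a de-identification run."""
    tp: int  # identifier tokens correctly redacted
    fp: int  # non-identifier tokens redacted by mistake
    fn: int  # identifier tokens that were missed
    tn: int  # non-identifier tokens correctly left alone


def deid_metrics(c: Counts) -> dict:
    """Compute the standard precision, recall, F1, and accuracy definitions."""
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    total = c.tp + c.fp + c.fn + c.tn

    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (c.tp + c.tn) / total if total else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Illustrative counts only; these are NOT figures from the study.
    example = Counts(tp=80, fp=20, fn=20, tn=880)
    for name, value in deid_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

In a de-identification setting, recall is usually the privacy-critical number, because every false negative is an identifier left in the note, while precision reflects how much non-identifying clinical content is preserved for research use.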