GPT-based prediction of short-term survival following decompressive hemicraniectomy in malignant middle cerebral artery infarction

Bibliographic Details
Main Authors: Sebastian Lehmann, Martin Vychopen, Erdem Güresir, Johannes Wach
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-07-01
Series: Frontiers in Neurology
Online Access: https://www.frontiersin.org/articles/10.3389/fneur.2025.1603536/full
Description
Summary:
Introduction: This study analyzes the ability of the large language model (LLM) Generative Pre-trained Transformer (GPT) to predict short-term survival and functional outcomes in patients with malignant middle cerebral artery (MCA) infarction following decompressive hemicraniectomy.
Methods: This retrospective study included 100 patients with malignant MCA infarction who underwent decompressive craniectomy (DC). GPT-4 and GPT-4 Omni were used to predict patient outcomes based on 20 patient-specific factors. Each version of GPT was tested with and without context enrichment (CE). The CE versions were provided with the current AHA/ASA 2019 guidelines and meta-analyses of RCTs to inform decision-making. The patients' real-life outcomes, measured by the modified Rankin Scale (mRS), served as the reference. The following endpoints were evaluated: survival during the inpatient stay, and achievement of a functional status of mRS 0–4 at discharge and at 3, 6, and 12 months post-discharge. We analyzed the prognostic performance of GPT by calculating the area under the curve (AUC) and determining the optimal cutoff using the Youden index for divergent prediction outcomes. After dichotomization according to this cutoff, a two-sided chi-squared test was performed.
Results: GPT-4 and GPT-4 Omni were both able to estimate survival during the in-hospital stay. For both versions, the CE variant outperformed the non-CE variant. GPT-4 Omni (CE) achieved an AUC of 0.67 (95% CI: 0.54–0.79; p = 0.002), while GPT-4 (CE) reached an AUC of 0.70 (95% CI: 0.57–0.82; p = 0.018). GPT-4 also achieved statistical significance without CE (AUC 0.66; 95% CI: 0.53–0.78; p = 0.018). In contrast, the non-CE version of GPT-4 Omni did not reach significance in predicting survival during hospitalization (AUC 0.60; 95% CI: 0.48–0.73; p = 0.07). For questions regarding patients' functional outcomes, neither version of GPT made a sufficient prognostic prediction. However, when provided with the pre-stroke mRS, GPT-4 Omni was able to predict the mRS at discharge (p = 0.01; Pearson's correlation coefficient = 0.696).
Conclusion: The study demonstrates the already substantial potential of AI in predicting short-term outcomes. It also reveals current limitations in evaluating more complex questions, such as functional outcomes.
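The evaluation steps named in the Methods (AUC, a Youden-index cutoff, dichotomization, and a two-sided chi-squared test), together with the Pearson correlation reported in the Results, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes each GPT prediction can be read as a numeric score (for example, a predicted survival probability) with observed outcomes coded 1/0, that a score at or above the cutoff counts as a positive prediction, and that the chi-squared test uses no continuity correction. The function names (auc, youden_cutoff, chi2_2x2, pearson_r) and all data are hypothetical.

```python
import math


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a randomly chosen positive
    case (label 1) scores higher than a randomly chosen negative case
    (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.
    Assumption: a score >= cutoff counts as a positive prediction."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcome classes")
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff among exact ties
            best_cut, best_j = c, j
    return best_cut, best_j


def chi2_2x2(pred, labels):
    """Pearson chi-squared test (df = 1, no continuity correction) on the
    2x2 table of dichotomized predictions vs. observed outcomes."""
    table = [[0, 0], [0, 0]]  # table[prediction][outcome]
    for p, y in zip(pred, labels):
        table[p][y] += 1
    n = len(labels)
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected == 0:
                raise ValueError("a row or column of the table is empty")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With df = 1 the chi-squared survival function is erfc(sqrt(x / 2)),
    # which matches a two-sided z-test p-value.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy data: model scores and observed survival (1 = survived).
    scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
    survived = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
    print("AUC:", auc(scores, survived))
    cut, j = youden_cutoff(scores, survived)
    pred = [1 if s >= cut else 0 for s in scores]  # dichotomize at the cutoff
    chi2, p = chi2_2x2(pred, survived)
    print(f"cutoff={cut}, J={j:.2f}, chi2={chi2:.2f}, p={p:.3f}")
    # Hypothetical predicted vs. observed discharge mRS scores.
    print("Pearson r:", pearson_r([3, 4, 5, 4, 6], [3, 5, 5, 4, 6]))
```

For the toy data, the AUC is 22/24 (about 0.92) and Youden's J is maximized at a cutoff of 0.35. With only ten cases, several expected cell counts fall below 5, so the chi-squared approximation here is illustrative only; an analysis at study scale would more typically rely on an established statistics library such as SciPy.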
ISSN: 1664-2295