Comparing conventional and Bayesian workflows for clinical outcome prediction modelling with an exemplar cohort study of severe COVID-19 infection incorporating clinical biomarker test results

Bibliographic Details
Main Authors: Brian Sullivan, Edward Barker, Louis MacGregor, Leo Gorman, Philip Williams, Ranjeet Bhamber, Matt Thomas, Stefan Gurney, Catherine Hyams, Alastair Whiteway, Jennifer A. Cooper, Chris McWilliams, Katy Turner, Andrew W. Dowsey, Mahableshwar Albur
Format: Article
Language: English
Published: BMC 2025-03-01
Series: BMC Medical Informatics and Decision Making
Online Access: https://doi.org/10.1186/s12911-025-02955-3
Description
Summary: Abstract
Purpose Assessing risk factors and creating prediction models from real-world medical data is challenging, requiring numerous modelling decisions with clinical guidance. Logistic regression is a common model for such studies, for which we advocate the use of Bayesian methods that can jointly deliver probabilistic risk factor inference and prediction. As an exemplar, we compare Bayesian logistic regression with horseshoe priors and Projective Prediction variable selection with the established frequentist LASSO approach, to predict severe COVID-19 outcomes (death or ICU admission) from demographic and laboratory biomarker data. Our study serves as guidance on data curation, variable selection, and performance assessment with cross-validation.
Methods Our source data is based on a retrospective observational cohort design with records from three National Health Service (NHS) Trusts in southwest England, UK. Models were fit to predict severe outcomes within 28 days after admission to hospital (or a positive PCR result if already admitted) using demographic data and the first result from 30 biomarker tests collected within 3 days after admission (or testing positive if already admitted).
Results Patients were hospitalized adults positive for COVID-19 from March to October 2020 (756 in total): mean age 71, 45% female; 31% (n=234) had a severe outcome, of whom 88% (n=206) died. Patients were split into training (n=534) and external validation groups (n=222). Using our Bayesian pipeline, we show a reduced variable model using Age, Urea, Prothrombin time (PT), C-reactive protein (CRP), and Neutrophil-Lymphocyte ratio (NLR) has better predictive performance (median external AUC: 0.71, 95% Quantile [0.7, 0.72]) relative to a GLM using all variables (external AUC: 0.67 [0.63, 0.71]).
Conclusion Urea, PT, CRP, and NLR have been highlighted by other studies, and respectively suggest that hypovolemia, derangement of circulation via clotting, and inflammation are strong predictive risk factors of severity. This study provides guidance on conventional and Bayesian regression and prediction modelling with complex clinical data.
ISSN: 1472-6947
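
Note on the reported AUC figures: the summary gives a median external AUC with a 95% quantile interval. The sketch below is not the authors' code. It is a minimal illustration, under the assumption that such an interval summarises AUC values computed over repeated sets of predicted risks (for example, one set per posterior draw), which the summary itself does not state. The function names `auc`, `quantile`, and `summarise_auc` are hypothetical, and the data are made up.

```python
from statistics import median


def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    This is the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case; ties count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def summarise_auc(y_true, score_draws):
    """Median and central 95% quantile interval of AUC over sets of scores."""
    aucs = [auc(y_true, draw) for draw in score_draws]
    return median(aucs), (quantile(aucs, 0.025), quantile(aucs, 0.975))


# Made-up toy data: 1 = severe outcome, 0 = not severe.
y = [1, 0, 1, 0, 0, 1]
# Two hypothetical sets of predicted risks for the same six patients.
draws = [
    [0.9, 0.2, 0.7, 0.4, 0.1, 0.8],
    [0.6, 0.5, 0.7, 0.4, 0.65, 0.3],
]
med, (low, high) = summarise_auc(y, draws)
print(f"median AUC {med:.2f}, 95% quantile interval [{low:.2f}, {high:.2f}]")
```

With only two sets of scores the interval is purely illustrative; in practice many more sets would be summarised.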