Predictive modelling and identification of critical variables of mortality risk in COVID-19 patients
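Main Authors: | Olawande Daramola, Tatenda Duncan Kavu, Maritha J. Kotze, Jeanine L. Marnewick, Oluwafemi A. Sarumi, Boniface Kabaso, Thomas Moser, Karl Stroetmann, Isaac Fwemba, Fisayo Daramola, Martha Nyirenda, Susan J. van Rensburg, Peter S. Nyasulu |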
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-46712-w |
author | Olawande Daramola, Tatenda Duncan Kavu, Maritha J. Kotze, Jeanine L. Marnewick, Oluwafemi A. Sarumi, Boniface Kabaso, Thomas Moser, Karl Stroetmann, Isaac Fwemba, Fisayo Daramola, Martha Nyirenda, Susan J. van Rensburg, Peter S. Nyasulu |
---|---|
collection | DOAJ |
description | Abstract South Africa was the African country most affected by the coronavirus disease 2019 (COVID-19) pandemic, with over 4 million confirmed cases of COVID-19 and over 102,000 deaths recorded since 2019. Aside from clinical methods, artificial intelligence (AI)-based solutions such as machine learning (ML) models have been employed in managing COVID-19 cases. However, few applications of AI to COVID-19 in Africa have been reported in the literature. This study aimed to investigate the performance and interpretability of several ML algorithms, including deep multilayer perceptron (Deep MLP), support vector machine (SVM) and extreme gradient boosting trees (XGBoost), for predicting COVID-19 mortality risk, with an emphasis on the effect of cross-validation (CV) and principal component analysis (PCA) on the results. For this purpose, a dataset with 154 features from 490 COVID-19 patients admitted to the intensive care unit (ICU) of Tygerberg Hospital in Cape Town, South Africa, during the first wave of COVID-19 in 2020 was retrospectively analysed. Our results show that Deep MLP had the best overall performance (F1 = 0.92; area under the curve (AUC) = 0.94) when CV and the synthetic minority oversampling technique (SMOTE) were applied without PCA. By using the Shapley Additive exPlanations (SHAP) method to interpret the mortality risk predictions, we identified length of stay (LOS) in hospital, LOS in the ICU, time to ICU from admission, days discharged alive or death, D-dimer (a marker of blood clotting), and blood pH as the six most critical variables for mortality risk prediction. Age at admission, Pf ratio (PaO2/FiO2 ratio), troponin T (TropT), ferritin, ventilation, C-reactive protein (CRP), and symptoms of acute respiratory distress syndrome (ARDS) were also associated with the severity and fatality of COVID-19 cases. The study shows how ML could assist medical practitioners in making informed decisions on handling critically ill COVID-19 patients with comorbidities. It also offers insight into the combined effect of CV, PCA, and SMOTE on the performance of ML models for COVID-19 mortality risk prediction, which has been little explored. |
format | Article |
id | doaj-art-491af48889af4249b3345bdc0698c4a7 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
affiliations | Olawande Daramola, Tatenda Duncan Kavu and Boniface Kabaso: Department of Information Technology, Cape Peninsula University of Technology; Maritha J. Kotze and Susan J. van Rensburg: Division of Chemical Pathology, Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University; Jeanine L. Marnewick: Applied Microbial and Health Biotechnology Institute, Cape Peninsula University of Technology; Oluwafemi A. Sarumi: Department of Mathematics and Computer Science, Philipps University of Marburg; Thomas Moser: St Pölten University of Applied Sciences; Karl Stroetmann: School of Health Information Science, University of Victoria; Isaac Fwemba, Fisayo Daramola, Martha Nyirenda and Peter S. Nyasulu: Division of Epidemiology and Biostatistics, Faculty of Medicine and Health Sciences, Stellenbosch University |
title | Predictive modelling and identification of critical variables of mortality risk in COVID-19 patients |
url | https://doi.org/10.1038/s41598-023-46712-w |
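The abstract reports that Deep MLP performed best when CV and SMOTE were combined without PCA. One detail that matters when combining these steps is the order of operations: if oversampling is done before the data are split, synthetic samples derived from held-out patients can leak into the training folds. The plain-Python sketch below (no third-party libraries) illustrates one way to apply a basic SMOTE inside each stratified CV fold. It is an illustration under stated assumptions, not the authors' pipeline: this record does not say how or where the authors resampled, and the helper names (`smote`, `stratified_folds`, `cv_splits_with_smote`), the choice of k = 5 neighbours, the fold count and the toy data are all hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Vector = List[float]
Split = Tuple[List[Vector], List[int], List[Vector], List[int]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Vector]:
    """Basic SMOTE (hypothetical helper): create n_new synthetic samples.

    Each synthetic point lies at a random position on the line segment
    between a minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2 or n_new <= 0:
        return []
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def stratified_folds(y: List[int], n_folds: int,
                     rng: random.Random) -> List[List[int]]:
    """Assign sample indices to n_folds folds, keeping class ratios similar."""
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def cv_splits_with_smote(X: List[Vector], y: List[int], n_folds: int = 5,
                         k: int = 5, seed: int = 0) -> Iterator[Split]:
    """Yield (X_train, y_train, X_test, y_test) for each CV fold.

    SMOTE is applied to the training portion only, after the split, so no
    synthetic sample is derived from a patient in the held-out fold.
    Assumes a binary outcome (e.g. 0 = survived, 1 = died).
    """
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, n_folds, rng):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr = [list(X[i]) for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Oversample whichever class is smaller in this training fold.
        counts = {c: y_tr.count(c) for c in set(y_tr)}
        minority_label = min(counts, key=counts.get)
        minority = [x for x, t in zip(X_tr, y_tr) if t == minority_label]
        new = smote(minority, max(counts.values()) - len(minority), k, rng)
        X_tr += new
        y_tr += [minority_label] * len(new)
        yield X_tr, y_tr, [X[i] for i in test_idx], [y[i] for i in test_idx]


if __name__ == "__main__":
    # Hypothetical toy data: 14 survivors (0) and 6 deaths (1), 2 features each.
    X = [[float(i), float(i % 3)] for i in range(20)]
    y = [0] * 14 + [1] * 6
    for X_tr, y_tr, X_te, y_te in cv_splits_with_smote(X, y, n_folds=3):
        print(len(X_tr), y_tr.count(0), y_tr.count(1), len(X_te))
```

Each training fold comes out balanced while the held-out fold keeps its original class ratio, so evaluation metrics such as F1 and AUC are computed on real patients only. In practice, imbalanced-learn's `Pipeline` with its `SMOTE` step, passed to scikit-learn's `cross_val_score`, gives the same fold-wise behaviour; a PCA step, if used, would likewise be fitted on the training fold only.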