Machine learning based predictive modeling and risk factors for prolonged SARS-CoV-2 shedding

Abstract Background The global outbreak of the coronavirus disease 2019 (COVID-19) has been enormously damaging, in which prolonged shedding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, previously 2019-nCoV) infection is a challenge in the prevention and treatment of COVID-19. How...

Bibliographic Details
Main Authors: Yani Zhang, Qiankun Li, Haijun Duan, Liang Tan, Ying Cao, Junxin Chen
Format: Article
Language: English
Published: BMC 2024-11-01
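Series: Journal of Translational Medicine
Subjects: COVID-19; SARS-CoV-2; Duration of viral shedding; XGBoost; Machine learning; SHAP interpretability analysis
Online Access: https://doi.org/10.1186/s12967-024-05872-7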
collection DOAJ
description Abstract
Background: The global outbreak of coronavirus disease 2019 (COVID-19) has been enormously damaging, and prolonged shedding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, previously 2019-nCoV) is a challenge in the prevention and treatment of COVID-19. However, research on the risk factors that affect delayed shedding of SARS-CoV-2 remains incomplete.
Methods: In a retrospective analysis of 56,878 hospitalized patients in the Fangcang Shelter Hospital (National Convention and Exhibition Center) in Shanghai, China, we compared patients whose SARS-CoV-2 viral shedding lasted > 12 days with those whose shedding lasted < 12 days. The duration of viral shedding was determined from real-time polymerase chain reaction (RT-PCR) test results, counted from the first day of SARS-CoV-2 positivity to the day of SARS-CoV-2 negativity. The extreme gradient boosting (XGBoost) machine learning method was employed to establish a prediction model for prolonged SARS-CoV-2 shedding and to analyze significant risk factors. Feature filtering with retraining and Shapley Additive Explanations (SHAP) techniques were then used to demonstrate and further explain the risk factors for long-term SARS-CoV-2 infection.
Results: We assessed ten features (vaccination, hypertension, diabetes, admission cycle threshold (Ct) value, cardio-cerebrovascular disease, gender, age, occupation, symptoms, and family accompaniment) to determine their impact on prolonged SARS-CoV-2 shedding. The study involved a large cohort of 56,878 hospitalized patients, and we used the XGBoost algorithm to establish a predictive model based on these features. Six of the ten features were significantly associated with prolonged SARS-CoV-2 shedding, as determined by both the model's feature-importance ranking and the results of model reconstruction. Specifically, vaccination, hypertension, admission Ct value, gender, age, and family accompaniment were identified as the key features associated with prolonged viral shedding.
Conclusions: We developed a predictive model and identified six risk factors associated with prolonged SARS-CoV-2 viral shedding. Our study contributes to identifying and screening individuals with potential long-term SARS-CoV-2 infections. Our research also provides a reference for future preventive control, for optimizing medical resource allocation and guiding epidemiological prevention, and for guidelines on personal protection against SARS-CoV-2.
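The outcome definition in the Methods can be illustrated with a minimal sketch. It is not the authors' code. It assumes the duration is the plain difference in calendar days between the first positive and the first negative RT-PCR date, and that exactly 12 days is not counted as prolonged, since the abstract only states "> 12 days" and "< 12 days". The function names and the patient records are hypothetical.

```python
from datetime import date

# Threshold taken from the abstract: shedding longer than 12 days is "prolonged".
PROLONGED_THRESHOLD_DAYS = 12


def shedding_duration_days(first_positive: date, first_negative: date) -> int:
    """Days from the first positive RT-PCR result to the day of negativity.

    Assumption: a plain calendar-day difference (no inclusive counting).
    """
    if first_negative < first_positive:
        raise ValueError("negative test date precedes first positive date")
    return (first_negative - first_positive).days


def is_prolonged(duration_days: int, threshold: int = PROLONGED_THRESHOLD_DAYS) -> bool:
    """Binary outcome label: True when shedding exceeds the threshold."""
    return duration_days > threshold


# Hypothetical records, for illustration only (not data from the study).
patients = [
    {"id": "P1", "first_positive": date(2022, 4, 1), "first_negative": date(2022, 4, 9)},
    {"id": "P2", "first_positive": date(2022, 4, 3), "first_negative": date(2022, 4, 20)},
]

labels = {
    p["id"]: is_prolonged(
        shedding_duration_days(p["first_positive"], p["first_negative"])
    )
    for p in patients
}
print(labels)
```

A binary label of this kind is the sort of target a classifier such as XGBoost would be trained on, with the ten listed features as inputs.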
format Article
id doaj-art-17985ccb7d4f4b24851f8dd06cca5698
institution OA Journals
issn 1479-5876
language English
publishDate 2024-11-01
publisher BMC
record_format Article
series Journal of Translational Medicine
spelling doaj-art-17985ccb7d4f4b24851f8dd06cca5698 2025-08-20T02:22:29Z eng BMC Journal of Translational Medicine 1479-5876 2024-11-01 22 1 1 10 10.1186/s12967-024-05872-7
Yani Zhang (0), Qiankun Li (1), Haijun Duan (2), Liang Tan (3), Ying Cao (4), Junxin Chen (5)
0 Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences
1 University of Science and Technology of China
2 Department of Neurosurgery, Southwest Hospital, Army Medical University
3 Center of Critical Care Medicine, Southwest Hospital, Army Medical University
4 Center of Critical Care Medicine, Southwest Hospital, Army Medical University
5 School of Software, Dalian University of Technology
topic COVID-19
SARS-CoV-2
Duration of viral shedding
XGBoost
Machine learning
SHAP interpretability analysis
url https://doi.org/10.1186/s12967-024-05872-7