Machine learning based predictive modeling and risk factors for prolonged SARS-CoV-2 shedding
| Main Authors: | Yani Zhang, Qiankun Li, Haijun Duan, Liang Tan, Ying Cao, Junxin Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2024-11-01 |
| Series: | Journal of Translational Medicine |
| Online Access: | https://doi.org/10.1186/s12967-024-05872-7 |

| Field | Value |
|---|---|
| author | Yani Zhang, Qiankun Li, Haijun Duan, Liang Tan, Ying Cao, Junxin Chen |
| collection | DOAJ |
| description | Abstract. Background: The global outbreak of coronavirus disease 2019 (COVID-19) has been enormously damaging, and prolonged shedding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, previously 2019-nCoV) is a challenge for both the prevention and the treatment of COVID-19. However, research on the risk factors for prolonged SARS-CoV-2 shedding remains incomplete. Methods: In a retrospective analysis of 56,878 patients hospitalized in the Fangcang Shelter Hospital (National Convention and Exhibition Center) in Shanghai, China, we compared patients whose SARS-CoV-2 viral shedding lasted > 12 days with those whose shedding lasted < 12 days. The duration of viral shedding was determined from real-time polymerase chain reaction (RT-PCR) results, counted from the first day of SARS-CoV-2 positivity to the day of SARS-CoV-2 negativity. The extreme gradient boosting (XGBoost) machine learning method was used to build a prediction model for prolonged SARS-CoV-2 shedding and to identify significant risk factors. Retraining on filtered feature subsets and Shapley Additive Explanations (SHAP) were then used to demonstrate and further explain the risk factors for long-term SARS-CoV-2 infection. Results: We assessed the impact of ten features on prolonged SARS-CoV-2 shedding: vaccination, hypertension, diabetes, admission cycle threshold (Ct) value, cardio-cerebrovascular disease, gender, age, occupation, symptom, and family accompaniment. Using the XGBoost algorithm, we built a predictive model from these features for the cohort of 56,878 hospitalized patients. Six of the ten features were significantly associated with prolonged SARS-CoV-2 shedding, as determined both by the model's feature-importance ranking and by the results of model reconstruction: vaccination, hypertension, admission Ct value, gender, age, and family accompaniment. Conclusions: We developed a predictive model and identified six risk factors associated with prolonged SARS-CoV-2 viral shedding. Our study can help identify and screen individuals with potential long-term SARS-CoV-2 infection. It also provides a reference for future prevention and control, for optimizing the allocation of medical resources, for guiding epidemiological prevention, and for personal protection guidelines against SARS-CoV-2. |
| id | doaj-art-17985ccb7d4f4b24851f8dd06cca5698 |
| institution | OA Journals |
| issn | 1479-5876 |
| affiliations | Yani Zhang: Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Qiankun Li: University of Science and Technology of China; Haijun Duan: Department of Neurosurgery, Southwest Hospital, Army Medical University; Liang Tan: Center of Critical Care Medicine, Southwest Hospital, Army Medical University; Ying Cao: Center of Critical Care Medicine, Southwest Hospital, Army Medical University; Junxin Chen: School of Software, Dalian University of Technology |
| topic | COVID-19; SARS-CoV-2; Duration of viral shedding; XGBoost; Machine learning; SHAP interpretability analysis |
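
The outcome labelling described in the abstract can be sketched as follows. This is a hypothetical illustration, not the study's code. It makes four assumptions:

- Each patient's RT-PCR history is a list of `(date, is_positive)` pairs.
- Shedding duration is the calendar-day difference between the first positive test and the first negative test that follows it.
- The study's actual rule may differ; for example, it may require consecutive negative results.
- Patients at exactly 12 days are left unlabelled, because the abstract only describes the > 12 and < 12 groups.

The helper names `shedding_duration` and `prolonged_label` and the example dates are invented for illustration.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

# Hypothetical record format: one (test_date, is_positive) pair per RT-PCR test.
TestResult = Tuple[date, bool]

THRESHOLD_DAYS = 12  # cut-off quoted in the abstract


def shedding_duration(tests: Sequence[TestResult]) -> Optional[int]:
    """Days from the first positive test to the first negative test after it.

    Assumption: duration is the plain calendar-day difference. Returns None
    if the patient never tests positive or never turns negative.
    """
    ordered = sorted(tests, key=lambda t: t[0])
    first_positive = next((d for d, positive in ordered if positive), None)
    if first_positive is None:
        return None
    first_negative = next(
        (d for d, positive in ordered if not positive and d > first_positive),
        None,
    )
    if first_negative is None:
        return None
    return (first_negative - first_positive).days


def prolonged_label(duration: Optional[int]) -> Optional[int]:
    """1 = shedding > 12 days, 0 = shedding < 12 days, None = not assigned."""
    if duration is None or duration == THRESHOLD_DAYS:
        # The abstract only describes the > 12 and < 12 groups.
        return None
    return 1 if duration > THRESHOLD_DAYS else 0


if __name__ == "__main__":
    patient = [  # made-up test history
        (date(2022, 4, 1), True),
        (date(2022, 4, 7), True),
        (date(2022, 4, 15), False),
    ]
    days = shedding_duration(patient)
    print(days, prolonged_label(days))  # 14 1
```

Tests are sorted by date first, so an out-of-order export still yields the same duration. A negative result before the first positive one is ignored.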
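
SHAP explains a single prediction by assigning each input feature a Shapley value. The sketch below shows what such a value is. It is not the authors' code and does not use the `shap` package. It computes exact Shapley values by brute-force enumeration for a hypothetical three-feature function, `toy_model`, which is made up for illustration and unrelated to the study's fitted model. Features absent from a coalition are replaced by a single baseline vector; this choice of value function is itself an assumption.

The abstract does not state which SHAP estimator the authors used. For tree ensembles such as XGBoost, the `shap` library's `TreeExplainer` computes Shapley-value attributions without enumerating subsets. Enumeration like this is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence, Set

Model = Callable[[Sequence[float]], float]


def shapley_values(
    model: Model, x: Sequence[float], baseline: Sequence[float]
) -> Dict[int, float]:
    """Exact Shapley value of every feature for a single prediction model(x).

    The coalition value v(S) is the model output when features in S take
    their values from x and all other features are fixed at `baseline`.
    All subsets are enumerated, so the cost grows as 2**n.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi: Dict[int, float] = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the others
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        phi[i] = total
    return phi


def toy_model(features: Sequence[float]) -> float:
    """Hypothetical score: two additive terms plus one b*c interaction."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 3.0 * b * c


if __name__ == "__main__":
    x = [1.0, 1.0, 1.0]
    base = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, x, base)
    print({i: round(v, 6) for i, v in phi.items()})  # {0: 2.0, 1: 2.5, 2: 1.5}
    # Efficiency: the contributions add up to f(x) - f(baseline).
    print(round(sum(phi.values()), 6), toy_model(x) - toy_model(base))  # 6.0 6.0
```

The interaction term `3*b*c` is split equally between `b` and `c`, 1.5 each. As a result, `b` receives 2.5 in total (its own 1.0 plus 1.5), while `a`, which has no interactions, keeps exactly its additive 2.0.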