Construction of a prognostic prediction model for colorectal cancer based on 5-year clinical follow-up data
Abstract Colorectal cancer (CRC) is a prevalent malignant tumor that presents significant challenges to both public health and healthcare systems. The aim of this study was to develop a machine learning model based on five years of clinical follow-up data from CRC patients to accurately identify ind...
Main Authors: | Boao Xiao, Min Yang, Yao Meng, Weimin Wang, Yuan Chen, Chenglong Yu, Longlong Bai, Lishun Xiao, Yansu Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Subjects: | Colorectal cancer; Machine learning; Prognosis; Follow-up studies; Risk factors |
Online Access: | https://doi.org/10.1038/s41598-025-86872-5 |
_version_ | 1832585862392578048 |
---|---|
author | Boao Xiao Min Yang Yao Meng Weimin Wang Yuan Chen Chenglong Yu Longlong Bai Lishun Xiao Yansu Chen |
author_sort | Boao Xiao |
collection | DOAJ |
description | Abstract Colorectal cancer (CRC) is a prevalent malignant tumor that presents significant challenges to both public health and healthcare systems. The aim of this study was to develop a machine learning model based on five years of clinical follow-up data from CRC patients to accurately identify individuals at risk of poor prognosis. This study included 411 CRC patients who underwent surgery at Yixing Hospital and completed the follow-up process. A modeling dataset containing 73 characteristic variables was established by collecting demographic information, clinical blood test indicators, histopathological results, and additional treatment-related information. Decision tree, random forest, support vector machine, and extreme gradient boosting (XGBoost) models were selected for modeling based on the features identified through recursive feature elimination (RFE). The Cox proportional hazards model was used as the baseline for model comparison. During the model training process, hyperparameters were optimized using a grid search method. The model performance was comprehensively assessed using multiple metrics, including accuracy, F1 score, Brier score, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic curve, calibration curve, and decision curve analysis curve. For the selected optimal model, the decision-making process was interpreted using the SHapley Additive exPlanations (SHAP) method. The results show that the optimal RFE-XGBoost model achieved an accuracy of 0.83 (95% CI 0.76–0.90), an F1 score of 0.81 (95% CI 0.72–0.88), and an area under the receiver operating characteristic curve of 0.89 (95% CI 0.82–0.94). Furthermore, the model exhibited superior calibration and clinical utility. 
SHAP analysis revealed that increased perioperative transfusion quantity, higher tumor AJCC stage, elevated carcinoembryonic antigen level, elevated carbohydrate antigen 19–9 (CA19-9) level, advanced age, and elevated carbohydrate antigen 125 (CA125) level were correlated with increased individual mortality risk. The RFE-XGBoost model demonstrated excellent performance in predicting CRC patient prognosis, and the application of the SHAP method bolstered the model’s credibility and utility. |
format | Article |
id | doaj-art-a4d1e4768908402b8cf99ca58e4e480d |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-a4d1e4768908402b8cf99ca58e4e480d; 2025-01-26T12:26:55Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 15; 1; 1; 10; 10.1038/s41598-025-86872-5; Construction of a prognostic prediction model for colorectal cancer based on 5-year clinical follow-up data; Boao Xiao (0), Min Yang (1), Yao Meng (2), Weimin Wang (3), Yuan Chen (4), Chenglong Yu (5), Longlong Bai (6), Lishun Xiao (7), Yansu Chen (8); affiliations: (0, 1, 2, 4, 5, 6, 7, 8) School of Public Health, Xuzhou Medical University; (3) Department of Oncology, Yixing Hospital Affiliated to Medical College of Yangzhou University; https://doi.org/10.1038/s41598-025-86872-5; Colorectal cancer; Machine learning; Prognosis; Follow-up studies; Risk factors |
title | Construction of a prognostic prediction model for colorectal cancer based on 5-year clinical follow-up data |
topic | Colorectal cancer Machine learning Prognosis Follow-up studies Risk factors |
url | https://doi.org/10.1038/s41598-025-86872-5 |
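
The description field above lists several threshold-based metrics (accuracy, F1 score, sensitivity, specificity, positive and negative predictive value), the Brier score, and decision curve analysis. As a minimal sketch of how such quantities are commonly computed for a binary outcome, the following uses only the Python standard library. The function names `evaluate` and `net_benefit`, the 0.5 threshold, and the toy labels and probabilities are assumptions made for illustration; they are not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float
    brier: float


def _counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Divide safely; an undefined metric is reported as NaN."""
    return num / den if den else float("nan")


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute confusion-matrix metrics and the Brier score."""
    tp, fp, tn, fn = _counts(y_true, y_prob, threshold)
    n = len(y_true)
    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n
    return BinaryMetrics(
        accuracy=(tp + tn) / n,
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
        brier=brier,
    )


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as used in decision curves."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = _counts(y_true, y_prob, threshold)
    n = len(y_true)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Toy example (made-up values, not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
toy_metrics = evaluate(toy_labels, toy_probs)
toy_net_benefit = net_benefit(toy_labels, toy_probs, 0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing the result against the treat-all and treat-none strategies. Confidence intervals such as those quoted in the description would need an additional resampling step that this sketch does not include.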