Construction of a prognostic prediction model for colorectal cancer based on 5-year clinical follow-up data

Bibliographic Details
Main Authors: Boao Xiao, Min Yang, Yao Meng, Weimin Wang, Yuan Chen, Chenglong Yu, Longlong Bai, Lishun Xiao, Yansu Chen
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: Colorectal cancer; Machine learning; Prognosis; Follow-up studies; Risk factors
Online Access: https://doi.org/10.1038/s41598-025-86872-5
_version_ 1832585862392578048
author Boao Xiao
Min Yang
Yao Meng
Weimin Wang
Yuan Chen
Chenglong Yu
Longlong Bai
Lishun Xiao
Yansu Chen
author_sort Boao Xiao
collection DOAJ
description Abstract Colorectal cancer (CRC) is a prevalent malignant tumor that presents significant challenges to both public health and healthcare systems. The aim of this study was to develop a machine learning model based on five years of clinical follow-up data from CRC patients to accurately identify individuals at risk of poor prognosis. This study included 411 CRC patients who underwent surgery at Yixing Hospital and completed the follow-up process. A modeling dataset containing 73 characteristic variables was established by collecting demographic information, clinical blood test indicators, histopathological results, and additional treatment-related information. Decision tree, random forest, support vector machine, and extreme gradient boosting (XGBoost) models were built on the features identified through recursive feature elimination (RFE). The Cox proportional hazards model was used as the baseline for model comparison. During model training, hyperparameters were optimized using grid search. Model performance was assessed using multiple metrics, including accuracy, F1 score, Brier score, sensitivity, specificity, positive predictive value, negative predictive value, the receiver operating characteristic curve, the calibration curve, and decision curve analysis. For the selected optimal model, the decision-making process was interpreted using the SHapley Additive exPlanations (SHAP) method. The optimal RFE-XGBoost model achieved an accuracy of 0.83 (95% CI 0.76–0.90), an F1 score of 0.81 (95% CI 0.72–0.88), and an area under the receiver operating characteristic curve of 0.89 (95% CI 0.82–0.94). Furthermore, the model exhibited superior calibration and clinical utility.
SHAP analysis revealed that increased perioperative transfusion quantity, higher tumor AJCC stage, elevated carcinoembryonic antigen level, elevated carbohydrate antigen 19-9 (CA19-9) level, advanced age, and elevated carbohydrate antigen 125 (CA125) level were correlated with increased individual mortality risk. The RFE-XGBoost model demonstrated excellent performance in predicting CRC patient prognosis, and the application of the SHAP method bolstered the model’s credibility and utility.
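The workflow summarized in the abstract (RFE feature selection, grid-searched hyperparameters, then accuracy, F1, Brier score, and AUC with a 95% CI) can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic data that only matches the study's dimensions (411 rows, 73 features), uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost, and assumes a 70/30 split, 15 retained features, a small parameter grid, and a percentile bootstrap for the confidence interval; none of these choices are stated in the record.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): same shape as the study's dataset only.
X, y = make_classification(n_samples=411, n_features=73, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# RFE ranks features by the booster's importances and drops 10 per round
# until 15 remain (both numbers are illustrative assumptions).
pipeline = Pipeline([
    ("rfe", RFE(GradientBoostingClassifier(n_estimators=20, random_state=0),
                n_features_to_select=15, step=10)),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Grid search over a deliberately small, hypothetical hyperparameter grid.
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [2, 3]}
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# Predicted probability of the positive class on the held-out set.
proba = search.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "brier": brier_score_loss(y_test, proba),
    "auc": roc_auc_score(y_test, proba),
}


def bootstrap_ci(metric, y_true, y_score, n_boot=200, seed=0):
    """Percentile bootstrap 95% CI for a metric (assumed CI method)."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when a resample has one class only
        stats.append(metric(y_true[idx], y_score[idx]))
    return np.percentile(stats, [2.5, 97.5])


auc_lo, auc_hi = bootstrap_ci(roc_auc_score, y_test, proba)
print(metrics)
print("AUC 95% CI:", (auc_lo, auc_hi))
```

Placing RFE inside the pipeline means feature selection is refit within each cross-validation fold, which avoids leaking held-out information into the selected feature set. Swapping in `xgboost.XGBClassifier` for the final step, or adding a SHAP explanation of the fitted model, would follow the same structure but requires those packages to be installed.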
format Article
id doaj-art-a4d1e4768908402b8cf99ca58e4e480d
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-a4d1e4768908402b8cf99ca58e4e480d 2025-01-26T12:26:55Z eng
Nature Portfolio. Scientific Reports, ISSN 2045-2322, 2025-01-01, vol. 15, no. 1, pp. 1–10. 10.1038/s41598-025-86872-5
Construction of a prognostic prediction model for colorectal cancer based on 5-year clinical follow-up data
Boao Xiao (0), Min Yang (1), Yao Meng (2), Weimin Wang (3), Yuan Chen (4), Chenglong Yu (5), Longlong Bai (6), Lishun Xiao (7), Yansu Chen (8)
Affiliations 0–2, 4–8: School of Public Health, Xuzhou Medical University
Affiliation 3: Department of Oncology, Yixing Hospital Affiliated to Medical College of Yangzhou University
https://doi.org/10.1038/s41598-025-86872-5
title Construction of a prognostic prediction model for colorectal cancer based on 5-year clinical follow-up data
topic Colorectal cancer
Machine learning
Prognosis
Follow-up studies
Risk factors
url https://doi.org/10.1038/s41598-025-86872-5