Leveraging survival analysis and machine learning for accurate prediction of breast cancer recurrence and metastasis

Bibliographic Details
Main Authors: Shahd M. Noman, Youssef M. Fadel, Mayar T. Henedak, Nada A. Attia, Malak Essam, Sarah Elmaasarawii, Fayrouz A. Fouad, Esraa G. Eltasawi, Walid Al-Atabany
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: Breast cancer, Recurrence prediction, Machine learning, Metastasis, Survival analysis
Online Access: https://doi.org/10.1038/s41598-025-87622-3
author Shahd M. Noman
Youssef M. Fadel
Mayar T. Henedak
Nada A. Attia
Malak Essam
Sarah Elmaasarawii
Fayrouz A. Fouad
Esraa G. Eltasawi
Walid Al-Atabany
collection DOAJ
description Abstract: Breast cancer, with its high incidence and mortality globally, necessitates early prediction of local and distant recurrence to improve treatment outcomes. This study develops and validates predictive models for breast cancer recurrence and metastasis using Recurrence-Free Survival Analysis and machine learning techniques. We merged datasets from the Molecular Taxonomy of Breast Cancer International Consortium, Memorial Sloan Kettering Cancer Center, Duke University, and the SEER program, creating a comprehensive dataset of 272,252 rows and 23 columns. Our methodology used three predictive strategies: assessing recurrence risk, differentiating local from distant recurrences, and identifying potential metastatic sites. Key prognostic factors were identified through survival analysis. LightGBM, XGBoost, and Random Forest models were employed and validated against data from the Baheya Foundation. The models demonstrated strong performance: the survival analysis achieved a C-index of 0.837, the LightGBM model reached an AUC of 92% in predicting recurrences, and the XGBoost and Random Forest models distinguished recurrence types with up to 86% accuracy and effectively differentiated bone metastasis from all other locations combined (brain, liver, and lungs). This study highlights the significant potential of machine learning in advancing breast cancer management and sets a new benchmark for predictive analytics. Future research will integrate genetic data to further enhance these models.
format Article
id doaj-art-2ed292d836ed4cd0b96788a9849b1353
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-2ed292d836ed4cd0b96788a9849b1353; record timestamp 2025-02-02T12:24:24Z; Scientific Reports, ISSN 2045-2322, published 2025-01-01, volume 15, issue 1, pages 1-16; https://doi.org/10.1038/s41598-025-87622-3
affiliation Center for Informatics Science (CIS), School of Information Technology and Computer Science, Nile University, 26th of July Corridor: Shahd M. Noman, Youssef M. Fadel, Mayar T. Henedak, Nada A. Attia, Malak Essam, Sarah Elmaasarawii, Walid Al-Atabany
Baheya Center for Early Detection and Treatment of Breast Cancer, Research Center: Fayrouz A. Fouad, Esraa G. Eltasawi
title Leveraging survival analysis and machine learning for accurate prediction of breast cancer recurrence and metastasis
topic Breast cancer
Recurrence prediction
Machine learning
Metastasis
Survival analysis
url https://doi.org/10.1038/s41598-025-87622-3