Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer

This study explored the application of explainable machine learning models to enhance breast cancer diagnosis using serum biomarkers, in contrast to many studies that focus on medical images and demographic data. The primary objective was to develop models that are not only accurate but also provide in...

Bibliographic Details
Main Authors: José Manuel Martínez-Ramírez, Cristobal Carmona, María Jesús Ramírez-Expósito, José Manuel Martínez-Martos
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Life
Subjects: breast cancer; serum biomarkers; explainable AI; oxytocin; early diagnosis; peptide hormones
Online Access: https://www.mdpi.com/2075-1729/15/2/211
_version_ 1850238779256733696
author José Manuel Martínez-Ramírez
Cristobal Carmona
María Jesús Ramírez-Expósito
José Manuel Martínez-Martos
author_facet José Manuel Martínez-Ramírez
Cristobal Carmona
María Jesús Ramírez-Expósito
José Manuel Martínez-Martos
author_sort José Manuel Martínez-Ramírez
collection DOAJ
description This study explored the application of explainable machine learning models to enhance breast cancer diagnosis using serum biomarkers, in contrast to many studies that focus on medical images and demographic data. The primary objective was to develop models that are not only accurate but also provide insights into the factors driving predictions, addressing the need for trustworthy AI in healthcare. Several classification models were evaluated, including OneR, JRIP, FURIA, J48, ADTree, and Random Forest, all of which are known for their explainability. The dataset included a variety of biomarkers, such as electrolytes, metal ions, marker proteins, enzymes, lipid profiles, peptide hormones, steroid hormones, and hormone receptors. The Random Forest model achieved the highest accuracy at 99.401%, followed closely by JRIP, FURIA, and ADTree at 98.802%. OneR and J48 achieved 98.204% accuracy. Notably, the models identified oxytocin as a key predictive biomarker, with most models featuring it in their rules. Other significant parameters included GnRH, β-endorphin, vasopressin, IRAP, and APB, as well as factors such as iron, cholinesterase, total protein, progesterone, 5-nucleotidase, and BMI, which are considered clinically relevant to breast cancer pathogenesis. This study discusses the roles of the identified parameters in cancer development, underscoring the potential of explainable machine learning models and serum biomarkers for enhancing early breast cancer diagnosis. Combining the two can lead to improved early detection and personalized treatments, emphasizing the potential of these methods in clinical settings. The identified markers also provide additional research and therapeutic targets for breast cancer pathogenesis and a deeper understanding of their interactions, advancing personalized approaches to breast cancer management.
format Article
id doaj-art-d27f79890c804dfc9d72c24c0c20a928
institution OA Journals
issn 2075-1729
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Life
spelling doaj-art-d27f79890c804dfc9d72c24c0c20a928
2025-08-20T02:01:21Z
eng
MDPI AG
Life
2075-1729
2025-01-01
15
2
211
10.3390/life15020211
Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
José Manuel Martínez-Ramírez (Department of Computer Science, University of Jaén, E-23071 Jaén, Spain)
Cristobal Carmona (Department of Computer Science, University of Jaén, E-23071 Jaén, Spain)
María Jesús Ramírez-Expósito (Experimental and Clinical Physiopathology Research Group CVI-1039, Department of Health Sciences, University of Jaén, E-23071 Jaén, Spain)
José Manuel Martínez-Martos (Experimental and Clinical Physiopathology Research Group CVI-1039, Department of Health Sciences, University of Jaén, E-23071 Jaén, Spain)
https://www.mdpi.com/2075-1729/15/2/211
breast cancer
serum biomarkers
explainable AI
oxytocin
early diagnosis
peptide hormones
spellingShingle José Manuel Martínez-Ramírez
Cristobal Carmona
María Jesús Ramírez-Expósito
José Manuel Martínez-Martos
Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
Life
breast cancer
serum biomarkers
explainable AI
oxytocin
early diagnosis
peptide hormones
title Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
title_full Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
title_fullStr Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
title_full_unstemmed Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
title_short Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
title_sort extracting knowledge from machine learning models to diagnose breast cancer
topic breast cancer
serum biomarkers
explainable AI
oxytocin
early diagnosis
peptide hormones
url https://www.mdpi.com/2075-1729/15/2/211
work_keys_str_mv AT josemanuelmartinezramirez extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer
AT cristobalcarmona extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer
AT mariajesusramirezexposito extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer
AT josemanuelmartinezmartos extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer