Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer
This study explored the application of explainable machine learning models to enhance breast cancer diagnosis using serum biomarkers, contrary to many studies that focus on medical images and demographic data. The primary objective was to develop models that are not only accurate but also provide in...
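| Main Authors: | José Manuel Martínez-Ramírez, Cristobal Carmona, María Jesús Ramírez-Expósito, José Manuel Martínez-Martos |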
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-01-01 |
| Series: | Life |
| Subjects: | breast cancer; serum biomarkers; explainable AI; oxytocin; early diagnosis; peptide hormones |
| Online Access: | https://www.mdpi.com/2075-1729/15/2/211 |
| _version_ | 1850238779256733696 |
|---|---|
| author | José Manuel Martínez-Ramírez Cristobal Carmona María Jesús Ramírez-Expósito José Manuel Martínez-Martos |
| author_facet | José Manuel Martínez-Ramírez Cristobal Carmona María Jesús Ramírez-Expósito José Manuel Martínez-Martos |
| author_sort | José Manuel Martínez-Ramírez |
| collection | DOAJ |
| description | This study explored the application of explainable machine learning models to enhance breast cancer diagnosis using serum biomarkers, in contrast to many studies that focus on medical images and demographic data. The primary objective was to develop models that are not only accurate but also provide insights into the factors driving predictions, addressing the need for trustworthy AI in healthcare. Several classification models were evaluated, including OneR, JRIP, FURIA, J48, ADTree, and Random Forest, all of which are known for their explainability. The dataset included a variety of biomarkers, such as electrolytes, metal ions, marker proteins, enzymes, lipid profiles, peptide hormones, steroid hormones, and hormone receptors. The Random Forest model achieved the highest accuracy at 99.401%, followed closely by JRIP, FURIA, and ADTree at 98.802%. OneR and J48 achieved 98.204% accuracy. Notably, the models identified oxytocin as a key predictive biomarker, with most models featuring it in their rules. Other significant parameters included GnRH, β-endorphin, vasopressin, IRAP, and APB, as well as factors such as iron, cholinesterase, total protein, progesterone, 5-nucleotidase, and BMI, which are considered clinically relevant to breast cancer pathogenesis. This study discusses the roles of the identified parameters in cancer development, underscoring the potential of explainable machine learning models for enhancing early breast cancer diagnosis by focusing on explainability and the use of serum biomarkers. The combination of both can lead to improved early detection and personalized treatments, emphasizing the potential of these methods in clinical settings. The identified markers also provide additional research and therapeutic targets and a deeper understanding of their interactions in breast cancer pathogenesis, advancing personalized approaches to breast cancer management. |
| format | Article |
| id | doaj-art-d27f79890c804dfc9d72c24c0c20a928 |
| institution | OA Journals |
| issn | 2075-1729 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Life |
| spelling | doaj-art-d27f79890c804dfc9d72c24c0c20a928; 2025-08-20T02:01:21Z; eng; MDPI AG; Life; 2075-1729; 2025-01-01; vol. 15, no. 2, art. 211; 10.3390/life15020211; Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer; José Manuel Martínez-Ramírez (Department of Computer Science, University of Jaén, E-23071 Jaén, Spain); Cristobal Carmona (Department of Computer Science, University of Jaén, E-23071 Jaén, Spain); María Jesús Ramírez-Expósito (Experimental and Clinical Physiopathology Research Group CVI-1039, Department of Health Sciences, University of Jaén, E-23071 Jaén, Spain); José Manuel Martínez-Martos (Experimental and Clinical Physiopathology Research Group CVI-1039, Department of Health Sciences, University of Jaén, E-23071 Jaén, Spain); https://www.mdpi.com/2075-1729/15/2/211; breast cancer; serum biomarkers; explainable AI; oxytocin; early diagnosis; peptide hormones |
| spellingShingle | José Manuel Martínez-Ramírez Cristobal Carmona María Jesús Ramírez-Expósito José Manuel Martínez-Martos Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer Life breast cancer serum biomarkers explainable AI oxytocin early diagnosis peptide hormones |
| title | Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer |
| title_full | Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer |
| title_fullStr | Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer |
| title_full_unstemmed | Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer |
| title_short | Extracting Knowledge from Machine Learning Models to Diagnose Breast Cancer |
| title_sort | extracting knowledge from machine learning models to diagnose breast cancer |
| topic | breast cancer serum biomarkers explainable AI oxytocin early diagnosis peptide hormones |
| url | https://www.mdpi.com/2075-1729/15/2/211 |
| work_keys_str_mv | AT josemanuelmartinezramirez extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer AT cristobalcarmona extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer AT mariajesusramirezexposito extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer AT josemanuelmartinezmartos extractingknowledgefrommachinelearningmodelstodiagnosebreastcancer |