Prediction of Early Diagnosis in Ovarian Cancer Patients Using Machine Learning Approaches with Boruta and Advanced Feature Selection
Objectives: Ovarian cancer continues to be one of the most prevalent gynecological cancers diagnosed. Early detection is highly critical for increasing survival chances. This research aims to assess the feature extraction process from various machine learning techniques for better modelling of ovari...
| Main Authors: | Tuğçe Öznacar, Tunç Güler |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Life |
| Subjects: | feature selection; machine learning; Boruta; recursive feature elimination; CatBoost |
| Online Access: | https://www.mdpi.com/2075-1729/15/4/594 |
| _version_ | 1850144739299426304 |
|---|---|
| author | Tuğçe Öznacar; Tunç Güler |
| author_facet | Tuğçe Öznacar; Tunç Güler |
| author_sort | Tuğçe Öznacar |
| collection | DOAJ |
| description | Objectives: Ovarian cancer remains one of the most commonly diagnosed gynecological cancers, and early detection is critical for improving survival. This research assesses feature extraction and feature selection across various machine learning techniques for better modelling of ovarian cancer. By eliminating irrelevant features, this approach could guide clinicians towards more accurate results and improve diagnostic precision. Methods: The study included patients with and without ovarian cancer, yielding a dataset of 50 independent variables (features). Eight machine learning algorithms (Random Forest, XGBoost, CatBoost, Decision Tree, K-Nearest Neighbors, Naive Bayes, Gradient Boosting, and Support Vector Machine) were evaluated alongside four feature selection techniques: Boruta, principal component analysis (PCA), recursive feature elimination (RFE), and mutual information (MI). Performance metrics were compared to identify the best combination for diagnosis. Results: The results were obtained with a substantially reduced number of features. Random Forest and CatBoost differed significantly from the other algorithms (AUC 0.94 and 0.95, respectively). Among the feature selection methods, Boruta and RFE consistently yielded higher AUC and accuracy scores than the others. Conclusions: This study highlights the importance of choosing appropriate machine learning algorithms and feature selection techniques for ovarian cancer diagnosis. Boruta and RFE showed high accuracy. By reducing the 50 features to the most relevant ones, clinicians can make more precise diagnoses, improve patient outcomes, and reduce unnecessary tests. |
| format | Article |
| id | doaj-art-e369e178bd3a4bacb6f1d9fdcb7e88e2 |
| institution | OA Journals |
| issn | 2075-1729 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Life |
| spelling | doaj-art-e369e178bd3a4bacb6f1d9fdcb7e88e2; 2025-08-20T02:28:15Z; eng; MDPI AG; Life; ISSN 2075-1729; 2025-04-01; vol. 15, no. 4, art. 594; DOI 10.3390/life15040594; Prediction of Early Diagnosis in Ovarian Cancer Patients Using Machine Learning Approaches with Boruta and Advanced Feature Selection; Tuğçe Öznacar (Department of Biostatistics, Ankara Medipol University, Ankara 06570, Turkey); Tunç Güler (Department of Medical Oncology, Park Hayat Hospital, Afyonkarahisar 03100, Turkey); https://www.mdpi.com/2075-1729/15/4/594; feature selection; machine learning; Boruta; recursive feature elimination; CatBoost |
| title | Prediction of Early Diagnosis in Ovarian Cancer Patients Using Machine Learning Approaches with Boruta and Advanced Feature Selection |
| topic | feature selection; machine learning; Boruta; recursive feature elimination; CatBoost |
| url | https://www.mdpi.com/2075-1729/15/4/594 |
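
The description in this record lists MI among its four feature selection techniques and reports results as AUC. As a rough illustration only, and not the authors' pipeline, the sketch below takes MI to mean mutual information: it ranks features by their mutual information with a binary label, keeps the top k, and scores the result with a rank-based AUC. Everything here is an assumption made for the example: the synthetic data (50 binary features, two of them informative), the function names `mutual_information`, `select_top_k` and `auc`, and the choice of k = 2. It uses only the Python standard library.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(columns, labels, k):
    """Return the indices of the k columns with the highest MI, plus all scores."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data (hypothetical): 200 samples, 50 binary features,
# of which only features 0 and 1 carry information about the label.
rng = random.Random(0)
n_samples, n_features = 200, 50
labels = [rng.randint(0, 1) for _ in range(n_samples)]
columns = []
for j in range(n_features):
    if j < 2:
        # Informative feature: copies the label, flipped about 10% of the time.
        col = [y if rng.random() > 0.1 else 1 - y for y in labels]
    else:
        # Noise feature: independent of the label.
        col = [rng.randint(0, 1) for _ in range(n_samples)]
    columns.append(col)

selected, mi_scores = select_top_k(columns, labels, k=2)

# Toy risk score: the sum of the selected binary features for each sample.
toy_scores = [sum(columns[j][i] for j in selected) for i in range(n_samples)]

print("selected features:", sorted(selected))
print("AUC of toy score:", round(auc(toy_scores, labels), 3))
```

A filter such as this scores each feature independently of any classifier. Wrapper methods such as Boruta and RFE, also named in the record, instead rely on a fitted model's feature importances, so they can give different rankings on the same data.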