Consistency and Stability in Feature Selection for High-Dimensional Microarray Survival Data in Diffuse Large B-Cell Lymphoma Cancer
High-dimensional survival data, such as microarray datasets, present significant challenges in variable selection and model performance due to their complexity and dimensionality. Identifying important genes and understanding how these genes influence the survival of patients with cancer are of grea...
Saved in:
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-02-01
|
| Series: | Data |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2306-5729/10/2/26 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | High-dimensional survival data, such as microarray datasets, present significant challenges in variable selection and model performance due to their complexity and dimensionality. Identifying important genes and understanding how these genes influence the survival of patients with cancer are of great interest and a major challenge to biomedical scientists, healthcare practitioners, and oncologists. Therefore, this study combined the strengths of two complementary feature selection methodologies: a filtering (correlation-based) approach and a wrapper method based on Iterative Bayesian Model Averaging (IBMA). This new approach, termed Correlation-Based IBMA, offers a highly efficient and effective means of selecting the most important and influential genes for predicting the survival of patients with cancer. The efficiency and consistency of the method were demonstrated using diffuse large B-cell lymphoma cancer data. The results revealed that the 15 most important genes out of 3835 gene features were consistently selected at a threshold <i>p</i>-value of 0.001, with genes with posterior probabilities below 1% being removed. The influence of these 15 genes on patient survival was assessed using the Cox Proportional Hazards (Cox-PH) Model. The results further revealed that eight genes were highly associated with patient survival at a 0.05 level of significance. Finally, these findings underscore the importance of integrating feature selection with robust modeling approaches to enhance accuracy and interpretability in high-dimensional survival data analysis. |
|---|---|
| ISSN: | 2306-5729 |