Unlocking The Potential of Hybrid Models for Prognostic Biomarker Discovery in Oral Cancer Survival Analysis: A Retrospective Cohort Study

Bibliographic Details
Main Authors: Leila Nezamabadi Farahani, Anoshirvan Kazemnejad, Mahlagha Afrasiabi, Leili Tapak
Format: Article
Language: English
Published: Royan Institute (ACECR), Tehran 2024-12-01
Series: Cell Journal
Online Access: https://www.celljournal.org/article_724800_d41d8cd98f00b204e9800998ecf8427e.pdf
Description
Summary: Objective: This study aimed to develop a hybrid model for variable selection in high-dimensional survival analysis using support vector regression (SVR), in order to identify prognostic biomarkers associated with survival in oral cancer (OC) patients from gene expression data. Materials and Methods: In this retrospective cohort study, gene expression profiles (54,613 probes) of 97 patients from the GSE41613 dataset in the GEO repository were used. First, martingale residuals were obtained from a Cox regression without covariates and used as a pseudo-survival outcome. Then, particle swarm optimization (PSO) and a genetic algorithm (GA) were each combined with SVR to select features related to the pseudo-survival outcome. The concordance index (C-index), mean absolute error (MAE), mean squared error (MSE), and R-squared were used to evaluate the performance of the models built on the selected features. Functional enrichment analysis was performed using the DAVID database, and external validation used four independent datasets (GSE9844, GSE75538, GSE37991, GSE42743). Results: The PSO-based method outperformed the GA-based method, achieving a smaller MAE (0.061) and MSE (0.005) and a higher R-squared (0.99) and C-index (0.973), and selecting 291 probes from the 1069 screened. A protein-protein interaction (PPI) network was constructed, comprising 200 nodes and 120 edges. Eleven key genes with the highest degree (RBM25, SMC3, PRPF40A, POLE, SRRT, BCLAF1, PDS5B, HNRNPR, JAK1, MED23, and SULT1A1) were identified as significant biomarkers associated with OC survival. Conclusion: The PSO-based hybrid model effectively improved SVR performance in survival prediction for OC patients and identified key prognostic biomarkers. Despite its promising results and validation on independent datasets, limitations in generalizability and signs of overfitting suggest that the model is not yet ready for clinical use. Further studies with larger, more diverse datasets are recommended.
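
The pseudo-survival outcome mentioned in the summary can be illustrated with a small example. The following is a minimal sketch and not the authors' code: it assumes that, for a Cox model without covariates, the Breslow baseline cumulative hazard reduces to the Nelson-Aalen estimator, so each martingale residual is the event indicator minus the estimated cumulative hazard at that subject's follow-up time. The function name `null_martingale_residuals` and the toy data are hypothetical.

```python
from collections import Counter


def null_martingale_residuals(times, events):
    """Martingale residuals of a Cox model with no covariates (illustrative).

    Assumption: with no covariates the baseline cumulative hazard H(t) is the
    Nelson-Aalen estimator, so the residual for subject i is
    events[i] - H(times[i]).
    """
    # Number of observed events (event indicator == 1) at each distinct time.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)

    # Walk the distinct times in increasing order, accumulating d_j / n_j,
    # where n_j counts subjects still at risk (follow-up time >= t_j).
    cum_hazard = {}
    running = 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        running += deaths.get(t, 0) / at_risk
        cum_hazard[t] = running

    return [e - cum_hazard[t] for t, e in zip(times, events)]


# Toy data (hypothetical): follow-up times and event indicators
# (1 = event observed, 0 = censored).
toy_times = [2, 4, 4, 7, 9]
toy_events = [1, 1, 0, 1, 0]
toy_residuals = null_martingale_residuals(toy_times, toy_events)
print([round(r, 2) for r in toy_residuals])
```

In this null-model setting the residuals sum to zero up to floating-point error, and they could then serve as a continuous regression target for a learner such as SVR, as the summary describes.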
ISSN: 2228-5806, 2228-5814