Machine learning-based identification of co-expressed genes in prostate cancer and CRPC and construction of prognostic models

Bibliographic Details
Main Authors: Changhui Fan, Zhiheng Huang, Han Xu, Tianhe Zhang, Haiyang Wei, Junfeng Gao, Changbao Xu
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-90444-y
Description
Summary: The objective of this study was to use machine learning to identify differentially expressed genes (DEGs) shared between prostate cancer (PCa) initiation and castration resistance, in order to establish a robust prognostic model and improve understanding of patient prognosis for personalized treatment strategies. mRNA transcriptome data associated with castration-resistant prostate cancer (CRPC) were obtained from the GEO database. Differential expression analysis was conducted with the limma R package, comparing normal prostate samples with PCa samples and PCa samples with CRPC samples. LASSO regression and univariate and multivariate Cox regression analyses were then applied to identify prognosis-associated genes and build prognostic models. Validation was performed on the TCGA_PRAD dataset to confirm expression differences of hub genes and to explore their correlation with clinical variables and their prognostic significance. We established a prostate cancer risk prognostic model containing seven genes (KIF4A, UBE2C, FAM72D, CCDC78, HOXD9, LIX1 and SLC5A8) and verified its accuracy on an independent dataset. Calibration and decision curves indicate that the model has potential clinical value. The nomogram can accurately predict patient prognosis. Additionally, elevated expression of KIF4A, UBE2C, and FAM72D, or reduced expression of LIX1, correlated with advanced pathological T and N stages, clinical T stage, prostate-specific antigen (PSA) level, age at diagnosis, Gleason score, and shorter progression-free interval (PFI) (P < 0.05). By integrating bioinformatics analysis and clinical data, we not only established a reliable prognostic model for prostate cancer but also identified key genes pivotal in disease progression and treatment resistance. These findings provide novel insights and methodologies for assessing prognosis and tailoring treatment strategies for prostate cancer patients.
ISSN: 2045-2322
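
Illustrative sketch: the summary describes two steps that can be shown in miniature, namely selecting DEGs shared by two contrasts and turning a gene signature into a per-patient risk score. The Python below is a minimal sketch under stated assumptions, not the authors' pipeline (which used limma, LASSO and Cox regression in R). The function names, the cutoffs (|log2FC| >= 1, adjusted P < 0.05), the median split into risk groups, and all gene names, coefficients and expression values are hypothetical placeholders; none of them come from the article.

```python
from statistics import median


def shared_degs(contrast_a, contrast_b, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return genes that pass both cutoffs in both contrasts.

    Each contrast maps gene -> (log2 fold change, adjusted p-value).
    The cutoffs are assumed defaults, not values taken from the article.
    """
    def significant(table):
        return {
            gene
            for gene, (lfc, adj_p) in table.items()
            if abs(lfc) >= lfc_cutoff and adj_p < p_cutoff
        }

    return significant(contrast_a) & significant(contrast_b)


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median.

    A median split is an assumption made for this sketch.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy inputs, invented for illustration only.
normal_vs_pca = {"GENE_A": (2.1, 0.001), "GENE_B": (-1.5, 0.01), "GENE_C": (0.3, 0.20)}
pca_vs_crpc = {"GENE_A": (1.4, 0.004), "GENE_B": (-0.2, 0.60), "GENE_C": (1.8, 0.002)}
shared = shared_degs(normal_vs_pca, pca_vs_crpc)

# Hypothetical coefficients standing in for fitted Cox regression weights.
coefficients = {"GENE_A": 0.5, "GENE_D": -0.25}
patients = {
    "P1": {"GENE_A": 4.0, "GENE_D": 2.0},
    "P2": {"GENE_A": 1.0, "GENE_D": 6.0},
    "P3": {"GENE_A": 3.0, "GENE_D": 4.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores)
```

In this toy run only GENE_A is significant in both contrasts, and only patient P1 scores above the cohort median. The sketch does not check that a shared gene changes in the same direction in both contrasts; a real analysis would need to decide whether that matters.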