Identification of predictive subphenotypes for clinical outcomes using real world data and machine learning

Abstract Predicting treatment response is an important problem in real-world applications, where the heterogeneity of the treatment response remains a significant challenge in practice. Unsupervised machine learning methods have been proposed to address this challenge by clustering patients with sim...

Bibliographic Details
Main Authors: Weishen Pan, Deep Hathi, Zhenxing Xu, Qiannan Zhang, Ying Li, Fei Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-59092-8
author Weishen Pan
Deep Hathi
Zhenxing Xu
Qiannan Zhang
Ying Li
Fei Wang
collection DOAJ
description Abstract Predicting treatment response is an important problem in real-world applications, where the heterogeneity of the treatment response remains a significant challenge in practice. Unsupervised machine learning methods have been proposed to address this challenge by clustering patients with similar electronic health record (EHR) data. However, they cannot guarantee coherent outcomes within the groups. Here, we propose Graph-Encoded Mixture Survival (GEMS) as a general machine learning framework to identify distinct predictive subphenotypes that guarantee coherent survival and baseline characteristics within each subphenotype. We apply our method to a real-world dataset of advanced non-small cell lung cancer (aNSCLC) patients receiving first-line immune checkpoint inhibitor (ICI) therapy to predict overall survival (OS). Our method outperforms baseline methods for predicting OS and identifies three reproducible subphenotypes associated with distinct baseline clinical characteristics and OS. Our results demonstrate that our method can provide insights into the heterogeneity of treatment response and potentially influence treatment selection.
format Article
id doaj-art-4640bfc9190d40a7bf3e8b3beeae8a4e
institution OA Journals
issn 2041-1723
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-4640bfc9190d40a7bf3e8b3beeae8a4e; 2025-08-20T01:51:38Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2025-05-01; vol. 16, no. 1, pp. 1-14; 10.1038/s41467-025-59092-8
Weishen Pan (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University)
Deep Hathi (Regeneron Pharmaceuticals, Inc.)
Zhenxing Xu (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University)
Qiannan Zhang (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University)
Ying Li (Regeneron Pharmaceuticals, Inc.)
Fei Wang (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University)
title Identification of predictive subphenotypes for clinical outcomes using real world data and machine learning
url https://doi.org/10.1038/s41467-025-59092-8