Identification of predictive subphenotypes for clinical outcomes using real world data and machine learning
Abstract: Predicting treatment response is an important problem in real-world applications, where the heterogeneity of the treatment response remains a significant challenge in practice. Unsupervised machine learning methods have been proposed to address this challenge by clustering patients with sim...
| Main Authors: | Weishen Pan, Deep Hathi, Zhenxing Xu, Qiannan Zhang, Ying Li, Fei Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-59092-8 |
| _version_ | 1850273009825218560 |
|---|---|
| author | Weishen Pan; Deep Hathi; Zhenxing Xu; Qiannan Zhang; Ying Li; Fei Wang |
| author_sort | Weishen Pan |
| collection | DOAJ |
| description | Predicting treatment response is an important problem in real-world applications, where the heterogeneity of the treatment response remains a significant challenge in practice. Unsupervised machine learning methods have been proposed to address this challenge by clustering patients with similar electronic health record (EHR) data. However, they cannot guarantee coherent outcomes within the groups. Here, we propose Graph-Encoded Mixture Survival (GEMS) as a general machine learning framework to identify distinct predictive subphenotypes that guarantee coherent survival and baseline characteristics within each subphenotype. We apply our method to a real-world dataset of advanced non-small cell lung cancer (aNSCLC) patients receiving first-line immune checkpoint inhibitor (ICI) therapy to predict overall survival (OS). Our method outperforms baseline methods for predicting OS and identifies three reproducible subphenotypes associated with distinct baseline clinical characteristics and OS. Our results demonstrate that our method can provide insights in the heterogeneity of treatment response and potentially influence treatment selection. |
| format | Article |
| id | doaj-art-4640bfc9190d40a7bf3e8b3beeae8a4e |
| institution | OA Journals |
| issn | 2041-1723 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Nature Communications |
| spelling | doaj-art-4640bfc9190d40a7bf3e8b3beeae8a4e; 2025-08-20T01:51:38Z; eng; Nature Portfolio; Nature Communications; ISSN 2041-1723; 2025-05-01; vol. 16, no. 1, pp. 1-14; doi:10.1038/s41467-025-59092-8; Identification of predictive subphenotypes for clinical outcomes using real world data and machine learning; Weishen Pan (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University), Deep Hathi (Regeneron Pharmaceuticals, Inc.), Zhenxing Xu (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University), Qiannan Zhang (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University), Ying Li (Regeneron Pharmaceuticals, Inc.), Fei Wang (Department of Population Health Sciences, Weill Cornell Medicine, Cornell University); https://doi.org/10.1038/s41467-025-59092-8 |
| title | Identification of predictive subphenotypes for clinical outcomes using real world data and machine learning |
| url | https://doi.org/10.1038/s41467-025-59092-8 |