Ensemble learning-based predictor for driver synonymous mutation with sequence representation.
Synonymous mutations, once considered neutral, are now understood to have significant implications for a variety of diseases, particularly cancer. It is indispensable to identify these driver synonymous mutations in human cancers, yet current methods are constrained by data limitations. In this stud...
Main Authors: | Chuanmei Bi, Yong Shi, Junfeng Xia, Zhen Liang, Zhiqiang Wu, Kai Xu, Na Cheng |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2025-01-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1012744 |
_version_ | 1832540329659596800 |
---|---|
author | Chuanmei Bi, Yong Shi, Junfeng Xia, Zhen Liang, Zhiqiang Wu, Kai Xu, Na Cheng |
collection | DOAJ |
description | Synonymous mutations, once considered neutral, are now understood to have significant implications for a variety of diseases, particularly cancer. Identifying driver synonymous mutations in human cancers is essential, yet current methods are constrained by data limitations. In this study, we first investigate the impact of sequence-based features, including DNA shape, physicochemical properties and one-hot encoding of nucleotides, and of deep learning-derived features from pre-trained chemical molecule language models based on BERT. We then propose EPEL, an effect predictor for synonymous mutations that employs ensemble learning. EPEL combines five tree-based models and optimizes feature selection to enhance predictive accuracy. Notably, the incorporation of DNA shape features and deep learning-derived features from chemical molecules represents a pioneering effort in assessing the impact of synonymous mutations in cancer. Compared to existing state-of-the-art methods, EPEL demonstrates superior performance on the independent test dataset. Furthermore, our analysis reveals a significant correlation between effect scores and patient outcomes across various cancer types. Interestingly, while deep learning methods have shown promise in other fields, their DNA sequence representations do not significantly enhance the identification of driver synonymous mutations in this study. Overall, we anticipate that EPEL will help researchers target driver synonymous mutations more precisely. EPEL is designed to be flexible, allowing users to retrain the prediction model and generate effect scores for synonymous mutations in human cancers. A user-friendly web server for EPEL is available at http://ahmu.EPEL.bio/. |
format | Article |
id | doaj-art-7b0498ac05c64383bcb82173a55b3069 |
institution | Kabale University |
issn | 1553-734X, 1553-7358 |
language | English |
publishDate | 2025-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Computational Biology |
title | Ensemble learning-based predictor for driver synonymous mutation with sequence representation. |
url | https://doi.org/10.1371/journal.pcbi.1012744 |
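
The description in this record mentions one-hot encoding of nucleotides and an ensemble of tree-based models. As a rough illustration only, the sketch below shows what a one-hot encoding of a short DNA string could look like, and one simple way several model scores might be combined. The record does not say how EPEL combines its five models, so the averaging step is an assumption, and the names `one_hot` and `ensemble_score` are hypothetical and do not come from the EPEL software.

```python
from statistics import mean

# Fixed base order for the encoding (an illustrative choice).
NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a flat list of 0/1 values, four per base."""
    encoded = []
    for base in sequence.upper():
        # Assumption: symbols outside A/C/G/T (for example N) map to all zeros.
        encoded.extend(1 if base == ref else 0 for ref in NUCLEOTIDES)
    return encoded


def ensemble_score(model_scores):
    """Average per-model probabilities (plain soft voting, assumed here)."""
    return mean(model_scores)
```

In this sketch, each base contributes four positions to the feature vector, so a sequence window of length n yields 4n features; the per-model scores passed to `ensemble_score` would come from whichever classifiers were trained on such features.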