Paying attention to the SARS-CoV-2 dialect : a deep neural network approach to predicting novel protein mutations
Abstract Predicting novel mutations has long-lasting impacts on life science research. Traditionally, this problem is addressed through wet-lab experiments, which are often expensive and time consuming. The recent advancement in neural language models has provided stunning results in modeling and de...
Main Authors: | Magdalyn E. Elkin, Xingquan Zhu |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Communications Biology |
Online Access: | https://doi.org/10.1038/s42003-024-07262-7 |
_version_ | 1832585465735151616 |
---|---|
author | Magdalyn E. Elkin; Xingquan Zhu |
collection | DOAJ |
description | Abstract: Predicting novel mutations has long-lasting impacts on life science research. Traditionally, this problem is addressed through wet-lab experiments, which are often expensive and time-consuming. The recent advancement in neural language models has provided stunning results in modeling and deciphering sequences. In this paper, we propose a Deep Novel Mutation Search (DNMS) method, using deep neural networks, to model protein sequences for mutation prediction. We use the SARS-CoV-2 spike protein as the target and use a protein language model to predict novel mutations. Different from existing research, which is often limited to mutating the reference sequence for prediction, we propose a parent-child mutation prediction paradigm where a parent sequence is modeled for mutation prediction. Because mutations introduce changing context to the underlying sequence, DNMS models three aspects of the protein sequences: semantic changes, grammatical changes, and attention changes, each modeling protein sequence aspects from shifting of semantics, grammar coherence, and amino-acid interactions in latent space. A ranking approach is proposed to combine all three aspects to capture mutations demonstrating evolving traits, in accordance with real-world SARS-CoV-2 spike protein sequence evolution. DNMS can be adopted for an early warning variant detection system, creating public health awareness of future SARS-CoV-2 mutations. |
format | Article |
id | doaj-art-f46a571e66c5455a91aabb93b83c5277 |
institution | Kabale University |
issn | 2399-3642 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Communications Biology |
spelling | doaj-art-f46a571e66c5455a91aabb93b83c5277; 2025-01-26T12:48:23Z; eng; Nature Portfolio; Communications Biology; 2399-3642; 2025-01-01; 81116; 10.1038/s42003-024-07262-7; Paying attention to the SARS-CoV-2 dialect : a deep neural network approach to predicting novel protein mutations; Magdalyn E. Elkin (Dept. Electrical Engineering and Computer Science, Florida Atlantic University); Xingquan Zhu (Dept. Electrical Engineering and Computer Science, Florida Atlantic University); https://doi.org/10.1038/s42003-024-07262-7 |
title | Paying attention to the SARS-CoV-2 dialect : a deep neural network approach to predicting novel protein mutations |
url | https://doi.org/10.1038/s42003-024-07262-7 |