Paying attention to the SARS-CoV-2 dialect : a deep neural network approach to predicting novel protein mutations

Abstract Predicting novel mutations has long-lasting impacts on life science research. Traditionally, this problem is addressed through wet-lab experiments, which are often expensive and time-consuming. The recent advancement in neural language models has provided stunning results in modeling and de...

Bibliographic Details
Main Authors: Magdalyn E. Elkin, Xingquan Zhu
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-024-07262-7
collection DOAJ
description Abstract Predicting novel mutations has long-lasting impacts on life science research. Traditionally, this problem is addressed through wet-lab experiments, which are often expensive and time-consuming. The recent advancement in neural language models has provided stunning results in modeling and deciphering sequences. In this paper, we propose a Deep Novel Mutation Search (DNMS) method, using deep neural networks, to model protein sequences for mutation prediction. We use the SARS-CoV-2 spike protein as the target and use a protein language model to predict novel mutations. Unlike existing research, which is often limited to mutating the reference sequence for prediction, we propose a parent-child mutation prediction paradigm where a parent sequence is modeled for mutation prediction. Because mutations introduce changing context to the underlying sequence, DNMS models three aspects of the protein sequences: semantic changes, grammatical changes, and attention changes, which capture shifts in semantics, grammatical coherence, and amino-acid interactions in latent space, respectively. A ranking approach is proposed to combine all three aspects to capture mutations demonstrating evolving traits, in accordance with real-world SARS-CoV-2 spike protein sequence evolution. DNMS can be adopted for an early warning variant detection system, creating public health awareness of future SARS-CoV-2 mutations.
format Article
id doaj-art-f46a571e66c5455a91aabb93b83c5277
institution Kabale University
issn 2399-3642
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Communications Biology
spelling doaj-art-f46a571e66c5455a91aabb93b83c5277 | 2025-01-26T12:48:23Z | eng | Nature Portfolio | Communications Biology | 2399-3642 | 2025-01-01 | 8 | 1 | 1 | 16 | 10.1038/s42003-024-07262-7 | Paying attention to the SARS-CoV-2 dialect : a deep neural network approach to predicting novel protein mutations | Magdalyn E. Elkin (Dept. Electrical Engineering and Computer Science, Florida Atlantic University) | Xingquan Zhu (Dept. Electrical Engineering and Computer Science, Florida Atlantic University) | https://doi.org/10.1038/s42003-024-07262-7