Multimodal neural network for enhanced protein stability prediction by integration of contact scores and spatial maps
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-12-01 |
| Series: | Results in Engineering |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S259012302401692X |
| Summary: | Predicting how mutations change protein stability remains a significant challenge in bioinformatics, with implications for understanding disease mechanisms and for drug design. Despite progress with machine learning and neural network models, more accurate predictive models are still needed. We explored Random Forest, single-input 2D CNN, multi-input 2D CNN, and multimodal CNN approaches to train a model that classifies mutations as favorable or unfavorable for protein stability. The multi-input 2D CNN trained on contact maps outperformed the other approaches in this study, achieving an accuracy of 0.679, a negative predictive value of 0.74, and a specificity of 0.81. These results show promise for protein stability prediction and demonstrate the utility of integrating diverse data representations. Nonetheless, further refinement is necessary to address overfitting and improve predictive accuracy on unseen data. This work contributes to the broader effort to enhance computational models for protein stability prediction and underscores the need for balanced complexity in model design. The variants in the dataset were modeled with AlphaFold, followed by refinement with Amber relaxation, to ensure that the resulting 3D models were both structurally accurate and energetically feasible. A Python-based AI/ML pipeline was developed to train the models, evaluate the datasets, and predict protein stability using the scikit-learn, TensorFlow, and Keras libraries. |
|---|---|
| ISSN: | 2590-1230 |
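
The summary reports three binary-classification metrics: accuracy, specificity, and a "negative prediction" figure, read here as negative predictive value (an assumption, since the record does not define it). The following minimal sketch shows how such metrics are conventionally derived from a confusion matrix. The class `ConfusionCounts`, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the article and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that are correct."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of actual negatives that are predicted negative."""
    return _ratio(c.tn, c.tn + c.fp)


def negative_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of negative predictions that are actually negative."""
    return _ratio(c.tn, c.tn + c.fn)


# Illustrative counts only; NOT the article's data.
example = ConfusionCounts(tp=30, fp=10, tn=40, fn=20)
print(f"accuracy    = {accuracy(example):.3f}")
print(f"specificity = {specificity(example):.3f}")
print(f"NPV         = {negative_predictive_value(example):.3f}")
```

Which class counts as "positive" (favorable or unfavorable mutations) is not stated in this record, so the mapping of labels to `tp`, `fp`, `tn`, and `fn` is left to the reader.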