Multimodal neural network for enhanced protein stability prediction by integration of contact scores and spatial maps

Bibliographic Details
Main Authors: G Gladstone Sigamani, P.M. Durai Raj Vincent
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Results in Engineering
Online Access: http://www.sciencedirect.com/science/article/pii/S259012302401692X
Description
Summary: The prediction of protein stability changes upon mutation remains a significant challenge in bioinformatics, with implications for understanding disease mechanisms and drug design. Despite progress through machine learning and neural network models, more accurate predictive models are still needed. We explored Random Forest, single-input 2D CNN, multi-input 2D CNN, and multimodal CNN approaches to train a model that classifies mutations as favorable or unfavorable for protein stability. The multi-input 2D CNN trained on contact maps outperformed the other approaches employed in this study, achieving an accuracy of 0.679, a negative prediction of 0.74, and a specificity of 0.81. These results show promising progress in predicting protein stability and demonstrate the utility of integrating diverse data representations. Nonetheless, further refinement is necessary to address overfitting and improve predictive accuracy on unseen data. This work contributes to the broader effort to enhance computational models for protein stability prediction and underscores the need for balanced complexity in model design. The 3D structures of the variants in the dataset were modeled with AlphaFold and then refined with Amber relaxation so that the resulting models were both structurally accurate and energetically feasible. A Python-based AI/ML pipeline was developed to train the models, evaluate the datasets, and predict protein stability using the scikit-learn, TensorFlow, and Keras libraries.
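For readers unfamiliar with the metrics quoted in the summary, the following is a minimal sketch of how accuracy, specificity, and a negative-class predictive figure can be computed from binary predictions. It is not the authors' pipeline. It assumes that the reported "negative prediction" refers to negative predictive value (NPV), that label 1 marks a favorable mutation and label 0 an unfavorable one, and it uses made-up labels purely for illustration; the names `ConfusionCounts`, `count_outcomes`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumed: 1 = favorable, 0 = unfavorable)."""
    tp: int
    fp: int
    tn: int
    fn: int


def count_outcomes(y_true, y_pred):
    """Tally true/false positives and negatives from two 0/1 label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            raise ValueError("labels must be 0 or 1")
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent.
    return numerator / denominator if denominator else float("nan")


def summarize(c):
    """Compute accuracy, specificity, and NPV from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),     # correct / all
        "specificity": _ratio(c.tn, c.tn + c.fp),   # true negatives / actual negatives
        "npv": _ratio(c.tn, c.tn + c.fn),           # true negatives / predicted negatives
    }


# Illustrative labels only; these are NOT data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
metrics = summarize(count_outcomes(y_true, y_pred))
print(metrics)
```

A model with high specificity and NPV but modest accuracy, as in the summary, is typically better at recognizing the negative class than the positive one, which is why the three figures are worth reading together.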
ISSN:2590-1230