PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy

Bibliographic Details
Main Authors: Yuguang Li, Xiaofei Nan, Shoutao Zhang, Qinglei Zhou, Shuai Lu, Zhen Tian
Format: Article
Language: English
Published: MDPI AG 2024-09-01
Series: Biomolecules
Online Access: https://www.mdpi.com/2218-273X/14/10/1220
Description
Summary: Proteins perform different biological functions by binding to various molecules, and these interactions are mediated by a few key residues; accurate prediction of such protein binding residues (PBRs) is therefore crucial for understanding cellular processes and for designing new drugs. Many computational approaches have been proposed to identify PBRs from sequence-based features. However, these approaches face two main challenges: (1) they concatenate residue feature vectors using only a simple sliding-window strategy, and (2) it is difficult to find a uniform sliding-window size suitable for learning embeddings across different types of PBRs. In this study, we propose a novel framework that can be applied to multiple types of PBRs Prediction tasks through a Multi-scale Sequence-based Feature Fusion (PMSFF) strategy. Firstly, PMSFF employs a pre-trained protein language model, ProtT5, to encode the amino acid residues in a protein sequence. It then generates multi-scale residue embeddings by applying multi-size windows to capture effective neighboring residues and multi-size kernels to learn information across different scales. In addition, the proposed model treats a protein sequence as a sentence and employs a bidirectional GRU to learn global context. We also collect benchmark datasets covering various PBRs types and evaluate PMSFF on these datasets. Compared with state-of-the-art methods, PMSFF demonstrates superior performance on most PBRs prediction tasks.
ISSN:2218-273X
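
The summary above describes a multi-scale feature-fusion architecture: per-residue ProtT5 embeddings, multi-size convolution kernels over neighboring residues, and a bidirectional GRU for global sequence context. As a rough illustration of how such a pipeline could fit together, the following PyTorch sketch is based only on the abstract; the embedding size (1024, typical for ProtT5), the kernel sizes, the hidden dimensions, and all class and layer names are assumptions for illustration, not the authors' published implementation.

```python
# Minimal sketch of a PMSFF-style model, assuming precomputed per-residue
# ProtT5 embeddings of dimension 1024. Kernel sizes and hidden sizes are
# illustrative choices, not values reported in the paper.
import torch
import torch.nn as nn

class MultiScaleFusionSketch(nn.Module):
    def __init__(self, emb_dim=1024, conv_dim=64, kernel_sizes=(3, 5, 7), gru_dim=128):
        super().__init__()
        # One 1D convolution per window/kernel size captures neighboring
        # residues at a different scale; padding keeps the sequence length.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, conv_dim, k, padding=k // 2) for k in kernel_sizes]
        )
        # Bidirectional GRU treats the sequence like a sentence and adds global context.
        self.bigru = nn.GRU(conv_dim * len(kernel_sizes), gru_dim,
                            batch_first=True, bidirectional=True)
        # Per-residue binary classifier: binding vs. non-binding.
        self.head = nn.Linear(2 * gru_dim, 1)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                  # (batch, emb_dim, seq_len) for Conv1d
        multi = torch.cat([torch.relu(c(x)) for c in self.convs], dim=1)
        multi = multi.transpose(1, 2)          # back to (batch, seq_len, features)
        ctx, _ = self.bigru(multi)             # (batch, seq_len, 2 * gru_dim)
        return self.head(ctx).squeeze(-1)      # per-residue binding logits

# Usage example with a random 200-residue embedding matrix:
# scores = MultiScaleFusionSketch()(torch.randn(1, 200, 1024))
```

Concatenating the outputs of several kernel sizes is one straightforward way to realize the "multi-size windows" idea without committing to a single sliding-window width; the actual fusion mechanism used by PMSFF may differ.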