S‐PLM: Structure‐Aware Protein Language Model via Contrastive Learning Between Sequence and Structure

Abstract Proteins play an essential role in various biological and engineering processes. Large protein language models (PLMs) present excellent potential to reshape protein research by accelerating the determination of protein functions and the design of proteins with desired functions. The prediction and design capacity of PLMs relies on the representation gained from protein sequences. However, the lack of crucial 3D structure information in most PLMs restricts their prediction capacity in many applications, especially those heavily dependent on 3D structures. To address this issue, S‐PLM is introduced as a 3D structure‐aware PLM that uses multi‐view contrastive learning to align the sequence and 3D structure of a protein in a coordinated latent space. S‐PLM applies a Swin‐Transformer to AlphaFold‐predicted protein structures to embed the structural information and fuses it into the sequence‐based embedding from ESM2. Additionally, a library of lightweight tuning tools is provided to adapt S‐PLM to diverse downstream protein prediction tasks. The results demonstrate S‐PLM's superior performance over sequence‐only PLMs on all protein clustering and classification tasks, with performance competitive with state‐of‐the‐art methods that require both sequence and structure inputs. S‐PLM and its lightweight tuning tools are available at https://github.com/duolinwang/S-PLM/.
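The multi-view contrastive objective the abstract describes is a CLIP-style alignment: sequence embeddings and structure embeddings for the same protein are projected into a shared latent space and trained so that matched pairs score higher than mismatched ones. The following is a minimal NumPy sketch of a symmetric InfoNCE loss of that kind — illustrative only; the function names, temperature value, and batch layout are assumptions for this sketch, not S‐PLM's exact formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale each embedding to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(seq_emb, struct_emb, temperature=0.07):
    """CLIP-style contrastive loss over a batch of matched (sequence, structure) pairs.

    seq_emb, struct_emb: (batch, dim) arrays; row i of both views comes from
    the same protein, so the diagonal of the similarity matrix holds positives.
    """
    sim = l2_normalize(seq_emb) @ l2_normalize(struct_emb).T / temperature
    diag = np.arange(sim.shape[0])
    # Cross-entropy in both directions: sequence->structure and structure->sequence.
    loss_s2t = -log_softmax(sim, axis=1)[diag, diag].mean()
    loss_t2s = -log_softmax(sim, axis=0)[diag, diag].mean()
    return (loss_s2t + loss_t2s) / 2.0
```

Minimizing this loss pulls each protein's sequence and structure embeddings together while pushing apart embeddings of different proteins, which is what places the two views in a coordinated latent space.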

Bibliographic Details
Main Authors: Duolin Wang, Mahdi Pourmirzaei, Usman L. Abbas, Shuai Zeng, Negin Manshour, Farzaneh Esmaili, Biplab Poudel, Yuexu Jiang, Qing Shao, Jin Chen, Dong Xu
Format: Article
Language:English
Published: Wiley 2025-02-01
Series:Advanced Science
Subjects: contrastive learning; deep learning; protein function prediction; protein language model; protein structure
Online Access:https://doi.org/10.1002/advs.202404212
Collection: DOAJ
ISSN: 2198-3844
Author affiliations:
Duolin Wang, Mahdi Pourmirzaei, Shuai Zeng, Negin Manshour, Farzaneh Esmaili, Biplab Poudel, Yuexu Jiang, Dong Xu: Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
Usman L. Abbas, Qing Shao: Chemical & Materials Engineering, University of Kentucky, Lexington, KY 40506, USA
Jin Chen: Department of Medicine and Department of Biomedical Informatics and Data Science, University of Alabama at Birmingham, Birmingham, AL 35294, USA