S‐PLM: Structure‐Aware Protein Language Model via Contrastive Learning Between Sequence and Structure
Abstract Proteins play an essential role in various biological and engineering processes. Large protein language models (PLMs) present excellent potential to reshape protein research by accelerating the determination of protein functions and the design of proteins with the desired functions. The prediction and design capacity of PLMs relies on the representation gained from the protein sequences. However, the lack of crucial 3D structure information in most PLMs restricts the prediction capacity of PLMs in various applications, especially those heavily dependent on 3D structures. To address this issue, S‐PLM is introduced as a 3D structure‐aware PLM that utilizes multi‐view contrastive learning to align the sequence and 3D structure of a protein in a coordinated latent space. S‐PLM applies Swin‐Transformer on AlphaFold‐predicted protein structures to embed the structural information and fuses it into sequence‐based embedding from ESM2. Additionally, a library of lightweight tuning tools is provided to adapt S‐PLM for diverse downstream protein prediction tasks. The results demonstrate S‐PLM's superior performance over sequence‐only PLMs on all protein clustering and classification tasks, achieving competitiveness comparable to state‐of‐the‐art methods requiring both sequence and structure inputs. S‐PLM and its lightweight tuning tools are available at https://github.com/duolinwang/S-PLM/.
Main Authors: | Duolin Wang, Mahdi Pourmirzaei, Usman L. Abbas, Shuai Zeng, Negin Manshour, Farzaneh Esmaili, Biplab Poudel, Yuexu Jiang, Qing Shao, Jin Chen, Dong Xu |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2025-02-01 |
Series: | Advanced Science |
Subjects: | contrastive learning, deep learning, protein function prediction, protein language model, protein structure |
Online Access: | https://doi.org/10.1002/advs.202404212 |
author | Duolin Wang, Mahdi Pourmirzaei, Usman L. Abbas, Shuai Zeng, Negin Manshour, Farzaneh Esmaili, Biplab Poudel, Yuexu Jiang, Qing Shao, Jin Chen, Dong Xu |
author_sort | Duolin Wang |
collection | DOAJ |
description | Abstract Proteins play an essential role in various biological and engineering processes. Large protein language models (PLMs) present excellent potential to reshape protein research by accelerating the determination of protein functions and the design of proteins with the desired functions. The prediction and design capacity of PLMs relies on the representation gained from the protein sequences. However, the lack of crucial 3D structure information in most PLMs restricts the prediction capacity of PLMs in various applications, especially those heavily dependent on 3D structures. To address this issue, S‐PLM is introduced as a 3D structure‐aware PLM that utilizes multi‐view contrastive learning to align the sequence and 3D structure of a protein in a coordinated latent space. S‐PLM applies Swin‐Transformer on AlphaFold‐predicted protein structures to embed the structural information and fuses it into sequence‐based embedding from ESM2. Additionally, a library of lightweight tuning tools is provided to adapt S‐PLM for diverse downstream protein prediction tasks. The results demonstrate S‐PLM's superior performance over sequence‐only PLMs on all protein clustering and classification tasks, achieving competitiveness comparable to state‐of‐the‐art methods requiring both sequence and structure inputs. S‐PLM and its lightweight tuning tools are available at https://github.com/duolinwang/S-PLM/. |
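The abstract's core idea — aligning a protein's sequence embedding and structure embedding in a coordinated latent space via multi‐view contrastive learning — can be illustrated with a minimal CLIP‐style symmetric InfoNCE loss. This is an illustrative sketch only, not the authors' implementation; the function name, temperature value, and embedding shapes are assumptions:

```python
import numpy as np

def contrastive_alignment_loss(seq_emb, struct_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss that pulls each protein's sequence
    embedding toward its own structure embedding and pushes it away from
    the other proteins in the batch (hypothetical sketch)."""
    # L2-normalize so the dot product becomes cosine similarity
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    t = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (batch, batch); matching pairs on the diagonal
    n = logits.shape[0]

    def xent(lg):
        # cross-entropy with the diagonal (the true pair) as the target class
        lg = lg - lg.max(axis=1, keepdims=True)          # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # average both directions: sequence->structure and structure->sequence
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss drives the two views of the same protein together, which is what lets a single coordinated latent space serve both sequence‐only inference and structure‐aware downstream tasks.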
format | Article |
id | doaj-art-101e1602d4e744c98022e6f97c2a8ae2 |
institution | Kabale University |
issn | 2198-3844 |
language | English |
publishDate | 2025-02-01 |
publisher | Wiley |
record_format | Article |
series | Advanced Science |
spelling | Wiley, Advanced Science (ISSN 2198-3844), vol. 12, iss. 5, 2025-02-01, https://doi.org/10.1002/advs.202404212 |
affiliations | Duolin Wang, Mahdi Pourmirzaei, Shuai Zeng, Negin Manshour, Farzaneh Esmaili, Biplab Poudel, Yuexu Jiang, Dong Xu: Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA; Usman L. Abbas, Qing Shao: Chemical & Materials Engineering, University of Kentucky, Lexington, KY 40506, USA; Jin Chen: Department of Medicine and Department of Biomedical Informatics and Data Science, University of Alabama at Birmingham, Birmingham, AL 35294, USA |
title | S‐PLM: Structure‐Aware Protein Language Model via Contrastive Learning Between Sequence and Structure |
topic | contrastive learning, deep learning, protein function prediction, protein language model, protein structure |
url | https://doi.org/10.1002/advs.202404212 |