A self-conformation-aware pre-training framework for molecular property prediction with substructure interpretability

Abstract The major challenges in drug development stem from frequent structure-activity cliffs and unknown drug properties, which are expensive and time-consuming to estimate, contributing to a high rate of failures and substantial unavoidable costs in the clinical phases. Herein, we propose the sel...

Bibliographic Details
Main Authors: Jianbo Qiao, Junru Jin, Ding Wang, Saisai Teng, Junyu Zhang, Xuetong Yang, Yuhang Liu, Yu Wang, Lizhen Cui, Quan Zou, Ran Su, Leyi Wei
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-59634-0
collection DOAJ
description Abstract The major challenges in drug development stem from frequent structure-activity cliffs and unknown drug properties, which are expensive and time-consuming to estimate, contributing to a high rate of failures and substantial unavoidable costs in the clinical phases. Herein, we propose the self-conformation-aware graph transformer (SCAGE), an innovative deep learning architecture pretrained with approximately 5 million drug-like compounds for molecular property prediction. Notably, we develop a multitask pretraining framework, which incorporates four supervised and unsupervised tasks: molecular fingerprint prediction, functional group prediction using chemical prior information, 2D atomic distance prediction, and 3D bond angle prediction, covering aspects from molecular structures to functions. It enables learning comprehensive conformation-aware prior knowledge, thereby enhancing its generalization across various molecular property tasks. Moreover, we design a data-driven multiscale conformational learning strategy that effectively guides the model in understanding and representing atomic relationships at the molecular conformational scale. SCAGE achieves significant performance improvements across 9 molecular properties and 30 structure-activity cliff benchmarks. Case studies demonstrate that SCAGE accurately captures crucial functional groups at the atomic level, which are closely associated with molecular activity, providing valuable insights into quantitative structure-activity relationships.
id doaj-art-b46a2b4e268d41e89f54a978cd6ee4f3
institution OA Journals
issn 2041-1723
spelling doaj-art-b46a2b4e268d41e89f54a978cd6ee4f3 2025-08-20T02:25:16Z eng Nature Portfolio Nature Communications 2041-1723 2025-05-01 16 1 1 16 10.1038/s41467-025-59634-0
Jianbo Qiao (School of Software, Shandong University)
Junru Jin (School of Software, Shandong University)
Ding Wang (School of Software, Shandong University)
Saisai Teng (School of Software, Shandong University)
Junyu Zhang (School of Software, Shandong University)
Xuetong Yang (School of Software, Shandong University)
Yuhang Liu (Faculty of Applied Sciences, Macao Polytechnic University)
Yu Wang (School of Software, Shandong University)
Lizhen Cui (School of Software, Shandong University)
Quan Zou (Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China)
Ran Su (College of Intelligence and Computing, Tianjin University)
Leyi Wei (Faculty of Applied Sciences, Macao Polytechnic University)