Exploring Fragment Adding Strategies to Enhance Molecule Pretraining in AI-Driven Drug Discovery

The effectiveness of AI-driven drug discovery can be enhanced by pretraining on small molecules. However, the conventional masked language model pretraining techniques are not suitable for molecule pretraining due to the limited vocabulary size and the non-sequential structure of molecules. To overc...

Bibliographic Details
Main Authors: Zhaoxu Meng, Cheng Chen, Xuan Zhang, Wei Zhao, Xuefeng Cui
Format: Article
Language: English
Published: Tsinghua University Press 2024-09-01
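Series: Big Data Mining and Analytics
Subjects: pretraining; information retrieval; drug discovery; virtual screening; molecule property prediction
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020003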
author Zhaoxu Meng
Cheng Chen
Xuan Zhang
Wei Zhao
Xuefeng Cui
collection DOAJ
description The effectiveness of AI-driven drug discovery can be enhanced by pretraining on small molecules. However, conventional masked language model pretraining techniques are not suitable for molecule pretraining due to the limited vocabulary size and the non-sequential structure of molecules. To overcome these challenges, we propose FragAdd, a strategy that involves adding a chemically implausible molecular fragment to the input molecule. This approach allows for the incorporation of rich local information and the generation of a high-quality graph representation, which is advantageous for tasks like virtual screening. Consequently, we have developed a virtual screening protocol that focuses on identifying binders of estrogen receptor alpha, a nuclear receptor. Our results demonstrate a significant improvement in the binding capacity of the retrieved molecules. Additionally, we demonstrate that the FragAdd strategy can be combined with other self-supervised methods to further expedite the drug discovery process.
format Article
id doaj-art-75dbf0c867db4ba6bab24df45ec47c91
institution Kabale University
issn 2096-0654
language English
publishDate 2024-09-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-75dbf0c867db4ba6bab24df45ec47c91
2025-02-03T00:12:56Z
eng
Tsinghua University Press
Big Data Mining and Analytics
2096-0654
2024-09-01
vol. 7, no. 3, pp. 565-576
10.26599/BDMA.2024.9020003
Exploring Fragment Adding Strategies to Enhance Molecule Pretraining in AI-Driven Drug Discovery
Zhaoxu Meng (School of Life Sciences, Shandong University, Qingdao 266237, China)
Cheng Chen (School of Computer Science and Technology, Shandong University, Qingdao 266237, China)
Xuan Zhang (School of Computer Science and Technology, Shandong University, Qingdao 266237, China)
Wei Zhao (State Key Laboratory of Microbiology Technology, Shandong University, Qingdao 266237, China)
Xuefeng Cui (School of Computer Science and Technology, Shandong University, Qingdao 266237, China)
The effectiveness of AI-driven drug discovery can be enhanced by pretraining on small molecules. However, conventional masked language model pretraining techniques are not suitable for molecule pretraining due to the limited vocabulary size and the non-sequential structure of molecules. To overcome these challenges, we propose FragAdd, a strategy that involves adding a chemically implausible molecular fragment to the input molecule. This approach allows for the incorporation of rich local information and the generation of a high-quality graph representation, which is advantageous for tasks like virtual screening. Consequently, we have developed a virtual screening protocol that focuses on identifying binders of estrogen receptor alpha, a nuclear receptor. Our results demonstrate a significant improvement in the binding capacity of the retrieved molecules. Additionally, we demonstrate that the FragAdd strategy can be combined with other self-supervised methods to further expedite the drug discovery process.
https://www.sciopen.com/article/10.26599/BDMA.2024.9020003
pretraining
information retrieval
drug discovery
virtual screening
molecule property prediction
title Exploring Fragment Adding Strategies to Enhance Molecule Pretraining in AI-Driven Drug Discovery
topic pretraining
information retrieval
drug discovery
virtual screening
molecule property prediction
url https://www.sciopen.com/article/10.26599/BDMA.2024.9020003