Context-dependent similarity analysis of analogue series for structure–activity relationship transfer based on a concept from natural language processing

Bibliographic Details
Main Authors: Atsushi Yoshimori, Jürgen Bajorath
Format: Article
Language: English
Published: BMC 2025-01-01
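Series: Journal of Cheminformatics
Subjects: Active analogue series; Potency progression; Series alignment; SAR transfer; Fragment pair relationships; Context-dependent similarity
Online Access: https://doi.org/10.1186/s13321-025-00951-3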
collection DOAJ
description Abstract: Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure–activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing. The methodology comprehensively accounts for substituent similarity, identifies non-classical bioisosteres, captures substituent-property relationships, and generates accurate AS alignments. Context-dependent similarity assessment is conceptually novel in computational medicinal chemistry and should also be of interest for other applications.
Scientific contribution: A method is reported to systematically search for and align analogue series with SAR transfer potential. Central to the approach is the assessment of context-dependent similarity for substituents, a new concept in cheminformatics, which is based upon vector embeddings and word pair relationships adapted from natural language processing.
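The abstract pairs two ideas: embedding-based similarity between substituents and an alignment of two analogue series. The sketch below illustrates that combination in a generic way only. It is not the authors' algorithm: the embedding vectors are made-up toy values, cosine similarity stands in for the context-dependent similarity measure, a standard Needleman-Wunsch global alignment stands in for the AS alignment algorithm, and the names `EMBEDDINGS`, `substituent_similarity`, and `align_series` are hypothetical.

```python
import math

# Toy 3-dimensional substituent embeddings (hypothetical values for
# illustration; real embeddings would be learned from many analogue series).
EMBEDDINGS = {
    "H": [0.0, 0.1, 0.0],
    "F": [0.9, 0.1, 0.2],
    "Cl": [0.8, 0.3, 0.3],
    "CH3": [0.1, 0.9, 0.1],
    "OCH3": [0.2, 0.8, 0.5],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def substituent_similarity(a, b):
    """Similarity of two substituents via their embedding vectors."""
    return cosine(EMBEDDINGS[a], EMBEDDINGS[b])


def align_series(query, target, gap=-0.5):
    """Globally align two substituent sequences (Needleman-Wunsch).

    Returns (total score, list of aligned pairs); None marks a gap.
    """
    n, m = len(query), len(target)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + substituent_similarity(
                query[i - 1], target[j - 1]
            )
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
            score[i][j],
            score[i - 1][j - 1]
            + substituent_similarity(query[i - 1], target[j - 1]),
        ):
            pairs.append((query[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(score[i][j], score[i - 1][j] + gap):
            pairs.append((query[i - 1], None))
            i -= 1
        else:
            pairs.append((None, target[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Two hypothetical series, each ordered by increasing potency.
total, aligned = align_series(["H", "F", "CH3"], ["H", "Cl", "OCH3"])
```

In this toy setting, substituents that sit close together in the embedding space (for example the two halogens) are paired by the alignment, which is the intuition behind searching for series with corresponding substituents and comparable potency progression.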
id doaj-art-164fe480377d4bec853914b3386b4936
institution Kabale University
issn 1758-2946
spelling doaj-art-164fe480377d4bec853914b3386b4936 | 2025-01-19T12:37:01Z | eng | BMC | Journal of Cheminformatics | 1758-2946 | 2025-01-01 | volume 17, issue 1, pages 1-14 | 10.1186/s13321-025-00951-3
Atsushi Yoshimori (Institute for Theoretical Medicine, Inc.)
Jürgen Bajorath (Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn)
topic Active analogue series
Potency progression
Series alignment
SAR transfer
Fragment pair relationships
Context-dependent similarity