Data Checking of Asymmetric Catalysis Literature Using a Graph Neural Network Approach

The range of chemical databases available has dramatically increased in recent years, but the reliability and quality of their data are often negatively affected by human error. The size of chemical databases can make manual data curation/checking of such sets time consuming; thus, automate...

Bibliographic Details
Main Authors: Eduardo Aguilar-Bejarano, Viraj Deorukhkar, Simon Woodward
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Molecules
Subjects: graph neural networks; stereochemical misassignment; database curation
Online Access: https://www.mdpi.com/1420-3049/30/2/355
author Eduardo Aguilar-Bejarano
Viraj Deorukhkar
Simon Woodward
collection DOAJ
description The range of chemical databases available has dramatically increased in recent years, but the reliability and quality of their data are often negatively affected by human error. The size of chemical databases can make manual data curation/checking of such sets time consuming; thus, automated tools to help this process are highly desirable. Herein, we propose the use of Graph Neural Networks (GNNs) to identify potential stereochemical misassignments in the primary asymmetric catalysis literature. Our method relies on an ensemble of GNN models to predict the expected stereoselectivity of exemplars for a particular asymmetric reaction. When the majority of these models do not agree with the reported outcome, the point is labeled as a possible stereochemical misassignment. Such identified cases are few in number and more easily investigated for their cause. We demonstrate the use of this approach to spot potential literature stereochemical misassignments in the ketone products resulting from catalytic asymmetric 1,4-addition of organoboron nucleophiles to Michael acceptors in two different databases, each one using a different family of chiral ligands (bisphosphine and diene ligands). Our results demonstrate that this methodology is useful for curation of medium-sized databases, speeding this process significantly compared to complete manual curation/checking. In the datasets investigated, human expert checking was reduced to 2.2% and 3.5% of the total data exemplars.
format Article
id doaj-art-40538181c0fb4fc08d837fcdc5863911
institution Kabale University
issn 1420-3049
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj-art-40538181c0fb4fc08d837fcdc5863911
2025-01-24T13:43:44Z
eng
MDPI AG
Molecules
1420-3049
2025-01-01
volume 30, issue 2, article 355
doi 10.3390/molecules30020355
Data Checking of Asymmetric Catalysis Literature Using a Graph Neural Network Approach
Eduardo Aguilar-Bejarano (GSK Carbon Neutral Laboratories for Sustainable Chemistry, Jubilee Campus, University of Nottingham, Triumph Road, Nottingham NG7 2TU, UK)
Viraj Deorukhkar (Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK)
Simon Woodward (GSK Carbon Neutral Laboratories for Sustainable Chemistry, Jubilee Campus, University of Nottingham, Triumph Road, Nottingham NG7 2TU, UK)
https://www.mdpi.com/1420-3049/30/2/355
graph neural networks
stereochemical misassignment
database curation
title Data Checking of Asymmetric Catalysis Literature Using a Graph Neural Network Approach
topic graph neural networks
stereochemical misassignment
database curation
url https://www.mdpi.com/1420-3049/30/2/355
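
The description states that an exemplar is labeled as a possible stereochemical misassignment when the majority of the ensemble models do not agree with the reported outcome. A minimal sketch of that flagging rule is given below. It is an illustration only, not the authors' implementation: the function name, the "R"/"S" label encoding, the three-model ensemble, and the strict-majority threshold are all assumptions, and the per-model predictions are supplied as plain lists rather than produced by trained GNNs.

```python
from typing import List, Sequence


def flag_possible_misassignments(
    reported: Sequence[str],
    ensemble_predictions: Sequence[Sequence[str]],
) -> List[int]:
    """Return indices of exemplars where a strict majority of models
    disagree with the reported stereochemical label.

    reported: one label per exemplar (hypothetical "R"/"S" encoding).
    ensemble_predictions: one sequence of predicted labels per model.
    """
    if not ensemble_predictions:
        raise ValueError("at least one model's predictions are required")
    for preds in ensemble_predictions:
        if len(preds) != len(reported):
            raise ValueError("each model must predict every exemplar")

    n_models = len(ensemble_predictions)
    flagged = []
    for i, label in enumerate(reported):
        # Count the models whose prediction differs from the literature label.
        disagreements = sum(1 for preds in ensemble_predictions if preds[i] != label)
        # Strict majority (assumed threshold): more than half of the models disagree.
        if 2 * disagreements > n_models:
            flagged.append(i)
    return flagged


# Toy data, invented for illustration: four exemplars, three models.
reported = ["R", "S", "R", "S"]
ensemble_predictions = [
    ["R", "S", "S", "S"],
    ["R", "R", "S", "S"],
    ["R", "S", "S", "R"],
]

flagged = flag_possible_misassignments(reported, ensemble_predictions)
fraction_to_check = len(flagged) / len(reported)
print(flagged, fraction_to_check)
```

Only the flagged indices would then go to a human expert, which is how a scheme of this kind can reduce manual checking to a small fraction of the exemplars.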