Development and experimental validation of a machine learning model for the prediction of new antimalarials

Abstract A large set of antimalarial molecules (N ~ 15k) was employed from ChEMBL to build a robust random forest (RF) model for the prediction of antiplasmodial activity. Rather than depending on high throughput screening (HTS) data, molecules tested at multiple doses against blood stages of Plasmo...

Bibliographic Details
Main Authors: Mukul Kore, Dimple Acharya, Lakshya Sharma, Shruthi Sridhar Vembar, Sandeep Sundriyal
Format: Article
Language: English
Published: BMC 2025-01-01
Series: BMC Chemistry
Subjects: Malaria, Machine learning, Random forest, KNIME, Modelling, ChEMBL
Online Access: https://doi.org/10.1186/s13065-025-01395-4
author Mukul Kore
Dimple Acharya
Lakshya Sharma
Shruthi Sridhar Vembar
Sandeep Sundriyal
collection DOAJ
description Abstract A large set of antimalarial molecules (N ~ 15k) was employed from ChEMBL to build a robust random forest (RF) model for the prediction of antiplasmodial activity. Rather than depending on high throughput screening (HTS) data, molecules tested at multiple doses against blood stages of Plasmodium falciparum were used for model development. The open-access and code-free KNIME platform was used to develop a workflow to train the model on 80% of data (N ~ 12k). The hyperparameter values were optimized to achieve the highest predictive accuracy with nine different molecular fingerprints (MFPs), among which Avalon MFPs (referred to as RF-1) provided the best results. RF-1 displayed 91.7% accuracy, 93.5% precision, 88.4% sensitivity and 97.3% area under the Receiver operating characteristic (AUROC) for the remaining 20% test set. The predictive performance of RF-1 was comparable to that of the malaria inhibitor prediction platform (MAIP), a recently reported consensus model based on a large proprietary dataset. However, hits obtained from RF-1 and MAIP from a commercial library did not overlap, suggesting that these two models are complementary. Finally, RF-1 was used to screen small molecules under clinical investigations for repurposing. Six molecules were purchased, out of which two human kinase inhibitors were identified to have single-digit micromolar antiplasmodial activity. One of the hits (compound 1) was a potent inhibitor of β-hematin, suggesting the involvement of parasite hemozoin (Hz) synthesis in the parasiticidal effect. The training and test sets are provided as supplementary information, allowing others to reproduce this work.
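The four test-set figures quoted in the abstract (accuracy, precision, sensitivity and AUROC) follow the standard definitions for a binary active/inactive classifier. The sketch below shows how such figures can be computed from true labels and predicted scores. It is a minimal Python illustration, not the authors' KNIME workflow: the function name `classification_metrics`, the 0.5 decision threshold and the toy labels and scores are all hypothetical and do not come from the paper.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Accuracy, precision, sensitivity and AUROC for binary labels (1 = active)."""
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("y_true and y_score must be non-empty and equally long")

    # Turn scores into hard class calls at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall

    # AUROC as the probability that a randomly chosen active is scored
    # above a randomly chosen inactive; tied scores count as half a win.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one active and one inactive")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    auroc = wins / (len(pos) * len(neg))

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "auroc": auroc,
    }


if __name__ == "__main__":
    # Hypothetical toy data: four actives followed by four inactives.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    for name, value in classification_metrics(labels, scores).items():
        print(f"{name}: {value:.4f}")
```

The pairwise AUROC computation is quadratic in the number of molecules, which is fine for illustration; a rank-based implementation would scale better to a test set of roughly 3k compounds.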
format Article
id doaj-art-876892c3ade24127862f3273bfd1707d
institution Kabale University
issn 2661-801X
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series BMC Chemistry
spelling doaj-art-876892c3ade24127862f3273bfd1707d
2025-02-02T12:06:51Z
eng
BMC
BMC Chemistry
2661-801X
2025-01-01
volume 19, issue 1, pages 1-19
10.1186/s13065-025-01395-4
Mukul Kore (Department of Pharmacy, Birla Institute of Technology and Science Pilani)
Dimple Acharya (Institute of Bioinformatics and Applied Biotechnology, Electronics City Phase I)
Lakshya Sharma (Department of Pharmacy, Birla Institute of Technology and Science Pilani)
Shruthi Sridhar Vembar (Institute of Bioinformatics and Applied Biotechnology, Electronics City Phase I)
Sandeep Sundriyal (Department of Pharmacy, Birla Institute of Technology and Science Pilani)
title Development and experimental validation of a machine learning model for the prediction of new antimalarials
topic Malaria
Machine learning
Random forest
KNIME
Modelling
ChEMBL
url https://doi.org/10.1186/s13065-025-01395-4