kMoL: an open-source machine and federated learning library for drug discovery

Bibliographic Details
Main Authors: Romeo Cozac, Haris Hasic, Jun Jin Choong, Vincent Richard, Loic Beheshti, Cyrille Froehlich, Takuto Koyama, Shigeyuki Matsumoto, Ryosuke Kojima, Hiroaki Iwata, Aki Hasegawa, Takao Otsuka, Yasushi Okuno
Format: Article
Language: English
Published: BMC 2025-02-01
Series: Journal of Cheminformatics
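Subjects: Machine learning; Federated learning; Drug discovery; Deep learning; Graph convolutional networks; Distributed learning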
Online Access: https://doi.org/10.1186/s13321-025-00967-9
author Romeo Cozac
Haris Hasic
Jun Jin Choong
Vincent Richard
Loic Beheshti
Cyrille Froehlich
Takuto Koyama
Shigeyuki Matsumoto
Ryosuke Kojima
Hiroaki Iwata
Aki Hasegawa
Takao Otsuka
Yasushi Okuno
collection DOAJ
description Abstract: Machine learning is quickly becoming integral to drug discovery pipelines, particularly quantitative structure-activity relationship (QSAR) and absorption, distribution, metabolism, and excretion (ADME) tasks. Graph Convolutional Network (GCN) models have proven especially promising due to their inherent ability to model molecular structures using graph-based representations. However, maximizing the potential of such models in practice is challenging, as companies prioritize data privacy and security over collaboration initiatives to improve model performance and robustness. kMoL is an open-source machine learning library with integrated federated learning capabilities developed to address such challenges. Its key features include state-of-the-art model architectures, Bayesian optimization, explainability, and federated learning mechanisms. It demonstrates extensive customization possibilities, advanced security features, straightforward implementation of user-specific models, and high adaptability to custom datasets without additional programming requirements. kMoL is evaluated through locally trained benchmark settings and distributed federated learning experiments using various datasets to assess the features and flexibility of the library, as well as the ability to facilitate fast and practical experimentation. Additionally, results of these experiments provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines. kMoL is available on GitHub at https://github.com/elix-tech/kmol.
Scientific contribution: The primary scientific contribution of this research project is the introduction and evaluation of kMoL, an open-source machine learning library with integrated federated learning capabilities. By demonstrating advanced customization and security capabilities without additional programming requirements, kMoL represents an accessible yet secure open-source platform for collaborative drug discovery projects. Additionally, the experiment results provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines.
format Article
id doaj-art-94e5d65782d34c2092dae0abf0605d39
institution DOAJ
issn 1758-2946
language English
publishDate 2025-02-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj-art-94e5d65782d34c2092dae0abf0605d39
2025-08-20T02:54:37Z
eng
BMC
Journal of Cheminformatics
1758-2946
2025-02-01
17 1 1 15
10.1186/s13321-025-00967-9
kMoL: an open-source machine and federated learning library for drug discovery
Romeo Cozac (Elix, Inc.)
Haris Hasic (Elix, Inc.)
Jun Jin Choong (Elix, Inc.)
Vincent Richard (Elix, Inc.)
Loic Beheshti (Elix, Inc.)
Cyrille Froehlich (Elix, Inc.)
Takuto Koyama (Graduate School of Medicine, Kyoto University)
Shigeyuki Matsumoto (Graduate School of Medicine, Kyoto University)
Ryosuke Kojima (Graduate School of Medicine, Kyoto University)
Hiroaki Iwata (Graduate School of Medicine, Kyoto University)
Aki Hasegawa (Graduate School of Medicine, Kyoto University)
Takao Otsuka (Graduate School of Medicine, Kyoto University)
Yasushi Okuno (Graduate School of Medicine, Kyoto University)
https://doi.org/10.1186/s13321-025-00967-9
Machine learning
Federated learning
Drug discovery
Deep learning
Graph convolutional networks
Distributed learning
title kMoL: an open-source machine and federated learning library for drug discovery
topic Machine learning
Federated learning
Drug discovery
Deep learning
Graph convolutional networks
Distributed learning
url https://doi.org/10.1186/s13321-025-00967-9