Enhancing uncertainty quantification in drug discovery with censored regression labels

Bibliographic Details
Main Authors: Emma Svensson, Hannah Rosa Friesacher, Susanne Winiwarter, Lewis Mervin, Adam Arany, Ola Engkvist
Format: Article
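Language: English
Published: Elsevier 2025-06-01
Series: Artificial Intelligence in the Life Sciences
Subjects: Uncertainty quantification; Censored regression; Temporal evaluation; Distribution shift; Deep learning; Drug discovery
Online Access: http://www.sciencedirect.com/science/article/pii/S2667318525000042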
author Emma Svensson
Hannah Rosa Friesacher
Susanne Winiwarter
Lewis Mervin
Adam Arany
Ola Engkvist
collection DOAJ
description In the early stages of drug discovery, decisions regarding which experiments to pursue can be influenced by computational models for quantitative structure–activity relationships (QSAR). These decisions are critical due to the time-consuming and expensive nature of the experiments. Therefore, it is becoming essential to accurately quantify the uncertainty in machine learning predictions, such that resources can be used optimally and trust in the models improves. While computational methods for QSAR modeling often suffer from limited data and sparse experimental observations, additional information can exist in the form of censored labels that provide thresholds rather than precise values of observations. However, the standard approaches that quantify uncertainty in machine learning cannot fully utilize censored labels. In this work, we adapt ensemble-based, Bayesian, and Gaussian models with tools to learn from censored labels by using the Tobit model from survival analysis. Our results demonstrate that despite the partial information available in censored labels, they are essential to reliably estimate uncertainties in real pharmaceutical settings where approximately one-third or more of experimental labels are censored.
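The idea of learning from censored labels with a Tobit-style likelihood can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a model that predicts a Gaussian mean `mu` and standard deviation `sigma` per compound, and the censoring codes `OBSERVED`, `LEFT`, `RIGHT` and the function names are hypothetical. An exact label contributes the usual Gaussian density, while a censored label contributes only the probability mass on the side of the reported threshold.

```python
import math

# Hypothetical censoring codes for this sketch (not taken from the paper).
OBSERVED, LEFT, RIGHT = 0, -1, 1


def log_normal_cdf(z):
    """Log of the standard normal CDF, floored to avoid log(0) far in the tail."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def tobit_nll(y, mu, sigma, censor):
    """Negative log-likelihood of one label under a Gaussian N(mu, sigma^2).

    y      : measured value, or the reported threshold if the label is censored
    censor : OBSERVED (exact value), LEFT (true value < y), RIGHT (true value > y)
    """
    z = (y - mu) / sigma
    if censor == OBSERVED:
        # Exact label: ordinary Gaussian negative log-density.
        return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    if censor == LEFT:
        # Only known: the true value lies below y, so use P(Y < y) = Phi(z).
        return -log_normal_cdf(z)
    if censor == RIGHT:
        # Only known: the true value lies above y, so use P(Y > y) = Phi(-z).
        return -log_normal_cdf(-z)
    raise ValueError(f"unknown censoring code: {censor}")


def mean_tobit_nll(labels, mus, sigmas, censors):
    """Average loss over a batch given as plain Python sequences."""
    terms = [tobit_nll(y, m, s, c) for y, m, s, c in zip(labels, mus, sigmas, censors)]
    return sum(terms) / len(terms)
```

Under these assumptions, a right-censored label such as "> 5" adds almost no loss when the predicted mean sits well above 5 and a large loss when it sits well below, so the threshold still constrains both the predicted mean and the predicted uncertainty.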
format Article
id doaj-art-2e7739b9e77f42d18d4391a28b7dddfd
institution OA Journals
issn 2667-3185
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Artificial Intelligence in the Life Sciences
timestamp 2025-08-20T02:33:12Z
volume 7
article_number 100128
doi 10.1016/j.ailsci.2025.100128
affiliation Emma Svensson (corresponding author): Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden; ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University Linz, Linz, 4040, Austria
Hannah Rosa Friesacher: Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden; ESAT-STADIUS, KU Leuven, Leuven, 3000, Belgium
Susanne Winiwarter: Drug Metabolism and Pharmacokinetics, Research and Early Development Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, 431 83, Sweden
Lewis Mervin: Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, CB2 0AA, UK
Adam Arany: ESAT-STADIUS, KU Leuven, Leuven, 3000, Belgium
Ola Engkvist: Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 412 96, Sweden
title Enhancing uncertainty quantification in drug discovery with censored regression labels
topic Uncertainty quantification
Censored regression
Temporal evaluation
Distribution shift
Deep learning
Drug discovery
url http://www.sciencedirect.com/science/article/pii/S2667318525000042