One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening

Bibliographic Details
Main Authors: James Wellnitz, Sankalp Jain, Joshua E. Hochuli, Travis Maxfield, Eugene N. Muratov, Alexander Tropsha, Alexey V. Zakharov
Format: Article
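Language: English
Published: BMC 2025-01-01
Series: Journal of Cheminformatics
Subjects: Computer-assisted drug discovery; QSAR modeling; Imbalanced datasets; Virtual screening; Positive predictive value; Hit rate
Online Access: https://doi.org/10.1186/s13321-025-00948-y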
author James Wellnitz
Sankalp Jain
Joshua E. Hochuli
Travis Maxfield
Eugene N. Muratov
Alexander Tropsha
Alexey V. Zakharov
collection DOAJ
description Abstract Traditional best practices for quantitative structure-activity relationship (QSAR) modeling recommend dataset balancing and treat balanced accuracy (BA) as the key objective of model development. This study examines the value of these conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we now recommend models with the highest positive predictive value (PPV), built on imbalanced training sets, as the preferred virtual screening tools. This recommendation stems from practical considerations of how virtual screening results are used in experimental laboratories, where only a small fraction of virtually screened molecules can be tested using standard well plates. As a proof of concept, we developed QSAR models for five expansive datasets with different ratios of active to inactive molecules and compared model performance in virtual screening using BA, PPV, and other metrics. We show that training on imbalanced datasets achieves a hit rate at least 30% higher than training on balanced datasets, and that the PPV metric captured this difference in performance with no parameter tuning. Importantly, hit rates were estimated for top-scoring compounds organized in batches the size of the well plates (for instance, 128 molecules) used in experimental high-throughput screening. Based on these results, we posit that QSAR models trained on imbalanced datasets with the highest PPV should be relied upon to identify and test hit compounds in early drug discovery studies.
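The abstract contrasts three quantities: balanced accuracy (BA), positive predictive value (PPV), and the hit rate among the top-scoring compounds in a plate-sized batch. The Python sketch below shows the standard textbook definitions of these metrics for binary labels (1 = active, 0 = inactive). It is not the authors' code: the function names are hypothetical, the default batch size k = 128 is borrowed from the abstract's plate example, and the toy labels and scores are invented purely for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels where 1 = active, 0 = inactive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """BA = (sensitivity + specificity) / 2, insensitive to the active/inactive ratio."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """PPV (precision) = TP / (TP + FP): share of predicted actives that are truly active."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def hit_rate_top_k(y_true: Sequence[int], scores: Sequence[float], k: int = 128) -> float:
    """Fraction of true actives among the k highest-scoring compounds (one 'plate')."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    return sum(y_true[i] for i in top) / len(top) if top else 0.0


# Toy, invented data: 2 actives among 10 compounds (not from the study).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25, 0.35]

print(balanced_accuracy(y_true, y_pred))   # 0.6875
print(ppv(y_true, y_pred))                 # 0.5
print(hit_rate_top_k(y_true, scores, k=3)) # 2 actives in top 3
```

In this sketch PPV is computed for a thresholded classifier over all compounds, whereas the top-k hit rate only inspects the highest-ranked batch; this record does not specify exactly how the study related the two, so treat the pairing as an illustrative assumption.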
format Article
id doaj-art-aa3da52fd7ce446eb33b88c524bc4f5e
institution Kabale University
issn 1758-2946
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series Journal of Cheminformatics
affiliation James Wellnitz, Joshua E. Hochuli, Travis Maxfield, Eugene N. Muratov, Alexander Tropsha: Division of Chemical Biology and Medicinal Chemistry, Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina
Sankalp Jain, Alexey V. Zakharov: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health
title One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening
topic Computer-assisted drug discovery
QSAR modeling
Imbalanced datasets
Virtual screening
Positive predictive value
Hit rate
url https://doi.org/10.1186/s13321-025-00948-y