Automated post-run analysis of arrayed quantitative PCR amplification curves using machine learning [version 1; peer review: awaiting peer review]

Background The TaqMan Array Card (TAC) is an arrayed, high-throughput qPCR platform that can simultaneously detect multiple targets in a single reaction. However, the manual post-run analysis of TAC data is time consuming and subject to interpretation. We sought to automate the post-run analysis of...

Bibliographic Details
Main Authors: David Garrett Brown, Darwin J. Operario, Lan Wang, Shanrui Wu, Daniel T. Leung, Eric R. Houpt, James A. Platts-Mills, Jie Liu, Ben J. Brintz
Format: Article
Language: English
Published: F1000 Research Ltd 2025-01-01
Series: Gates Open Research
Subjects:
Online Access: https://gatesopenresearch.org/articles/9-1/v1
collection DOAJ
description
Background: The TaqMan Array Card (TAC) is an arrayed, high-throughput qPCR platform that can simultaneously detect multiple targets in a single reaction. However, manual post-run analysis of TAC data is time-consuming and subject to interpretation. We sought to automate the post-run analysis of TAC data using machine learning models.
Methods: We used 165,214 qPCR amplification curves from two studies to train and test two eXtreme Gradient Boosting (XGBoost) models. Previous manual analyses of the amplification curves by experts in qPCR analysis were used as the gold standard. First, a classification model predicted whether amplification occurred; if it did, a second model predicted the cycle threshold (Ct) value. We used 5-fold cross-validation to tune the models and assessed performance using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and mean absolute error (MAE). For external validation, we used 1,472 reactions previously analyzed by 17 laboratory scientists as part of an external quality assessment for a multisite study.
Results: In internal validation, the classification model achieved an accuracy of 0.996, sensitivity of 0.997, specificity of 0.993, PPV of 0.998, and NPV of 0.991. The Ct prediction model achieved an MAE of 0.590. In external validation, the automated analysis achieved an accuracy of 0.997 and an MAE of 0.611, and it was more accurate than the manual analyses by 14 of the 17 laboratory scientists.
Conclusions: We automated the post-run analysis of highly arrayed qPCR data using machine learning models with high accuracy in comparison to a manual gold standard. This approach has the potential to save time and improve reproducibility in laboratories using the TAC platform and other high-throughput qPCR approaches.
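The two-stage design and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `classify` and `regress` callables are hypothetical toy stand-ins for the two XGBoost models, the thresholds are invented for illustration, and computing MAE only over reactions that both the model and the gold standard call positive is an assumption, since the abstract does not say how MAE was restricted.

```python
from typing import Callable, Dict, List, Optional, Sequence

Curve = Sequence[float]  # fluorescence readings, one value per PCR cycle


def two_stage_call(curve: Curve,
                   classify: Callable[[Curve], bool],
                   regress: Callable[[Curve], float]) -> Optional[float]:
    """Stage 1 decides whether amplification occurred; stage 2 runs only if it did."""
    if not classify(curve):
        return None  # negative call: no Ct value is reported
    return regress(curve)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(predicted: Sequence[Optional[float]],
             gold: Sequence[Optional[float]]) -> Dict[str, float]:
    """Compare calls against a gold standard; None means 'no amplification'."""
    tp = fp = tn = fn = 0
    abs_errors: List[float] = []
    for p, g in zip(predicted, gold):
        if p is not None and g is not None:
            tp += 1
            abs_errors.append(abs(p - g))  # assumption: MAE over shared positives
        elif p is not None:
            fp += 1
        elif g is not None:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "mae": sum(abs_errors) / len(abs_errors) if abs_errors else float("nan"),
    }


# Hypothetical toy stand-ins for the trained models (NOT XGBoost):
def toy_classify(curve: Curve) -> bool:
    # Call amplification when the fluorescence range exceeds an invented cutoff.
    return max(curve) - min(curve) > 1.0


def toy_regress(curve: Curve) -> float:
    # Report the first 1-based cycle whose reading reaches an invented threshold.
    return float(next(i for i, v in enumerate(curve, start=1) if v >= 0.5))


curves = [
    [0.0, 0.1, 0.6, 1.5, 2.0],
    [0.0, 0.0, 0.1, 0.1, 0.2],
    [0.0, 0.2, 0.4, 0.9, 1.6],
    [0.1, 0.1, 0.1, 0.2, 0.1],
]
gold_ct = [3.5, None, 4.0, 2.0]  # made-up gold-standard calls for the toy curves
calls = [two_stage_call(c, toy_classify, toy_regress) for c in curves]
metrics = evaluate(calls, gold_ct)
```

Keeping the Ct regressor behind the classifier means a negative call never produces a Ct value, which mirrors the "if so" ordering described in the Methods.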
id doaj-art-5a64703afb7148bd9c758e13c19db059
institution Kabale University
issn 2572-4754
spelling
record id: doaj-art-5a64703afb7148bd9c758e13c19db059
record timestamp: 2025-01-21T01:00:00Z
volume/issue/article string (unsegmented in the record): 917704
author affiliations (listed in author order):
0 David Garrett Brown, University of Utah Department of Internal Medicine, Salt Lake City, Utah, USA
1 Darwin J. Operario, University of Virginia, Charlottesville, Virginia, USA
2 Lan Wang, Qingdao University School of Public Health, Qingdao, Shandong, China
3 Shanrui Wu, Qingdao University School of Public Health, Qingdao, Shandong, China
4 Daniel T. Leung, University of Utah Department of Internal Medicine, Salt Lake City, Utah, USA
5 Eric R. Houpt, University of Virginia, Charlottesville, Virginia, USA
6 James A. Platts-Mills (https://orcid.org/0000-0002-4956-0418), University of Virginia, Charlottesville, Virginia, USA
7 Jie Liu, University of Virginia, Charlottesville, Virginia, USA
8 Ben J. Brintz (https://orcid.org/0000-0003-4695-0290), University of Utah Department of Internal Medicine, Salt Lake City, Utah, USA
topic qPCR; PCR amplification; cycle threshold; machine learning