Automated post-run analysis of arrayed quantitative PCR amplification curves using machine learning [version 1; peer review: awaiting peer review]

Background The TaqMan Array Card (TAC) is an arrayed, high-throughput qPCR platform that can simultaneously detect multiple targets in a single reaction. However, the manual post-run analysis of TAC data is time-consuming and subject to interpretation. We sought to automate the post-run analysis of...

Bibliographic Details
Main Authors:	David Garrett Brown, Darwin J. Operario, Lan Wang, Shanrui Wu, Daniel T. Leung, Eric R. Houpt, James A. Platts-Mills, Jie Liu, Ben J. Brintz
Format:	Article
Language:	English
Published:	F1000 Research Ltd 2025-01-01
Series:	Gates Open Research
Subjects:	qPCR; PCR amplification; cycle threshold; machine learning
Online Access:	https://gatesopenresearch.org/articles/9-1/v1

_version_	1832592859845361664
author	David Garrett Brown Darwin J. Operario Lan Wang Shanrui Wu Daniel T. Leung Eric R. Houpt James A. Platts-Mills Jie Liu Ben J. Brintz
author_sort	David Garrett Brown
collection	DOAJ
description	Background: The TaqMan Array Card (TAC) is an arrayed, high-throughput qPCR platform that can simultaneously detect multiple targets in a single reaction. However, the manual post-run analysis of TAC data is time-consuming and subject to interpretation. We sought to automate the post-run analysis of TAC data using machine learning models. Methods: We used 165,214 qPCR amplification curves from two studies to train and test two eXtreme Gradient Boosting (XGBoost) models. Previous manual analyses of the amplification curves by experts in qPCR analysis were used as the gold standard. First, a classification model predicted whether amplification occurred; if it did, a second model predicted the cycle threshold (Ct) value. We used 5-fold cross-validation to tune the models and assessed performance using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and mean absolute error (MAE). For external validation, we used 1,472 reactions previously analyzed by 17 laboratory scientists as part of an external quality assessment for a multisite study. Results: In internal validation, the classification model achieved an accuracy of 0.996, sensitivity of 0.997, specificity of 0.993, PPV of 0.998, and NPV of 0.991. The Ct prediction model achieved an MAE of 0.590. In external validation, the automated analysis achieved an accuracy of 0.997 and an MAE of 0.611, and it was more accurate than the manual analyses by 14 of the 17 laboratory scientists. Conclusions: We automated the post-run analysis of highly arrayed qPCR data using machine learning models with high accuracy in comparison to a manual gold standard. This approach has the potential to save time and improve reproducibility in laboratories using the TAC platform and other high-throughput qPCR approaches.
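The two-stage design and the reported metrics in the description above can be illustrated with a minimal sketch. This is not the authors' code: the function names `two_stage_call` and `evaluate` are hypothetical, the classifier and Ct regressor are passed in as plain callables standing in for the trained XGBoost models, and computing MAE only over curves that both the gold standard and the model call amplified is an assumption, since the record does not say how MAE was restricted.

```python
from typing import Callable, Dict, List, Optional, Sequence

Curve = Sequence[float]  # fluorescence readings, one per PCR cycle


def two_stage_call(
    curve: Curve,
    is_amplified: Callable[[Curve], bool],  # stand-in for the classification model
    predict_ct: Callable[[Curve], float],   # stand-in for the Ct regression model
) -> Optional[float]:
    """Return a predicted Ct only when stage one calls the curve amplified."""
    return predict_ct(curve) if is_amplified(curve) else None


def evaluate(
    truth: List[Optional[float]], pred: List[Optional[float]]
) -> Dict[str, float]:
    """Compare calls against a gold standard; None means no amplification."""
    tp = fp = tn = fn = 0
    abs_errors: List[float] = []
    for t, p in zip(truth, pred):
        if t is not None and p is not None:
            tp += 1
            abs_errors.append(abs(t - p))  # assumption: MAE over agreed positives
        elif t is None and p is None:
            tn += 1
        elif t is None:
            fp += 1
        else:
            fn += 1

    def ratio(num: float, den: float) -> float:
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mae": ratio(sum(abs_errors), len(abs_errors)),
    }
```

Gating the regressor behind the classifier means a Ct value is never reported for a curve called negative, which mirrors the "if so" in the description.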
format	Article
id	doaj-art-5a64703afb7148bd9c758e13c19db059
institution	Kabale University
issn	2572-4754
language	English
publishDate	2025-01-01
publisher	F1000 Research Ltd
record_format	Article
series	Gates Open Research
spelling	doaj-art-5a64703afb7148bd9c758e13c19db059; 2025-01-21T01:00:00Z; eng; F1000 Research Ltd; Gates Open Research; 2572-4754; 2025-01-01; 917704
David Garrett Brown: University of Utah Department of Internal Medicine, Salt Lake City, Utah, USA
Darwin J. Operario: University of Virginia, Charlottesville, Virginia, USA
Lan Wang: Qingdao University School of Public Health, Qingdao, Shandong, China
Shanrui Wu: Qingdao University School of Public Health, Qingdao, Shandong, China
Daniel T. Leung: University of Utah Department of Internal Medicine, Salt Lake City, Utah, USA
Eric R. Houpt: University of Virginia, Charlottesville, Virginia, USA
James A. Platts-Mills (https://orcid.org/0000-0002-4956-0418): University of Virginia, Charlottesville, Virginia, USA
Jie Liu: University of Virginia, Charlottesville, Virginia, USA
Ben J. Brintz (https://orcid.org/0000-0003-4695-0290): University of Utah Department of Internal Medicine, Salt Lake City, Utah, USA
title	Automated post-run analysis of arrayed quantitative PCR amplification curves using machine learning [version 1; peer review: awaiting peer review]
topic	qPCR; PCR amplification; cycle threshold; machine learning
url	https://gatesopenresearch.org/articles/9-1/v1

Similar Items