TopoQual polishes circular consensus sequencing data and accurately predicts quality scores

Bibliographic Details
Main Authors: Minindu Weerakoon, Sangjin Lee, Emily Mitchell, Haynes Heaton
Format: Article
Language: English
Published: BMC, 2025-01-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-024-06020-0

Abstract

Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high-fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and combining the passes into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling. However, the estimated quality scores are limited in accuracy when it comes to somatic variant calling on single reads.

Results: To address the challenge of inaccurate quality scores for somatic variant calling, we introduce TopoQual, a novel tool designed to enhance the accuracy of base quality predictions. TopoQual leverages techniques including partial order alignment (POA), topologically parallel bases, and deep learning algorithms to polish consensus sequences. Our results demonstrate that TopoQual corrects approximately 31.9% of errors in PacBio consensus sequences. Additionally, it validates base qualities up to Q59, which corresponds to one error in 0.9 million bases. These improvements will significantly enhance the reliability of somatic variant calling using HiFi data.

Conclusion: TopoQual represents a significant advancement in genomics by improving the accuracy of base quality predictions for PacBio HiFi sequencing data. By correcting a substantial proportion of errors and achieving high base quality validation, TopoQual enables confident and accurate somatic variant calling. This tool not only addresses a critical limitation of current HiFi technology but also opens new possibilities for precise genomic analysis in various research and clinical applications.
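
The abstract rests on two general ideas: building a consensus from repeated passes over one circularized molecule, and reporting accuracy on the Phred scale, where Q = -10 log10(p), so Q60 means an error probability of 10^-6. The Python sketch below illustrates both at toy scale: a column-wise majority vote over passes that are assumed to be already aligned, gap-free and of equal length, plus a naive per-base quality from add-one-smoothed disagreement counts. The names `phred` and `toy_consensus` and the smoothing rule are illustrative assumptions; this is neither PacBio's CCS algorithm nor TopoQual's method, which according to the abstract relies on partial order alignment, topologically parallel bases and deep learning.

```python
import math
from collections import Counter


def phred(error_prob: float) -> float:
    """Convert an error probability p to a Phred quality: Q = -10 * log10(p)."""
    return -10.0 * math.log10(error_prob)


def toy_consensus(passes: list[str]) -> tuple[str, list[int]]:
    """Hypothetical toy: majority-vote consensus over repeated passes.

    ASSUMPTION: the passes are already aligned, gap-free and of equal length.
    Real CCS data contain insertions and deletions, which need an alignment
    step (e.g. a partial order alignment) that this sketch omits.

    Returns the consensus string and a naive per-base Phred quality, where the
    chance that the majority base is wrong is estimated with add-one smoothing.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this toy assumes equal-length, pre-aligned passes")

    n = len(passes)
    bases: list[str] = []
    quals: list[int] = []
    for column in zip(*passes):
        # most_common breaks ties in favour of the base seen first.
        base, votes = Counter(column).most_common(1)[0]
        # Smoothed estimate of the probability that the majority call is wrong.
        p_err = (n - votes + 1) / (n + 2)
        bases.append(base)
        quals.append(round(phred(p_err)))
    return "".join(bases), quals


if __name__ == "__main__":
    seq, q = toy_consensus(["ACGTTA", "ACGTTA", "ACCTTA", "ACGTTA", "ACGTAA"])
    print(seq, q)  # prints: ACGTTA [8, 8, 5, 8, 5, 8]
```

With five passes this naive estimate tops out near Q8 even when every pass agrees, far below the qualities HiFi reports. Producing calibrated, per-base estimates in the high-Q range is the problem the abstract says TopoQual addresses.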

ISSN: 1471-2105
Affiliations: Minindu Weerakoon and Haynes Heaton, Auburn University; Sangjin Lee and Emily Mitchell, Wellcome Sanger Institute, Wellcome Genome Campus
Keywords: TopoQual; Deep consensus; PacBio; Circular consensus sequencing; High fidelity; Somatic mutations