TopoQual polishes circular consensus sequencing data and accurately predicts quality scores
Abstract. Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and...
Main Authors: | Minindu Weerakoon, Sangjin Lee, Emily Mitchell, Haynes Heaton |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-01-01 |
Series: | BMC Bioinformatics |
Subjects: | Topoqual; Deep consensus; Pacbio; Circular consensus sequencing; High fidelity; Somatic mutations |
Online Access: | https://doi.org/10.1186/s12859-024-06020-0 |
_version_ | 1832594449683709952 |
---|---|
author | Minindu Weerakoon; Sangjin Lee; Emily Mitchell; Haynes Heaton |
collection | DOAJ |
description | Abstract. Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and combining them into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling. However, there are limitations in the accuracy of the estimated quality scores when it comes to somatic variant calling on single reads. Results: To address the challenge of inaccurate quality scores for somatic variant calling, we introduce TopoQual, a novel tool designed to enhance the accuracy of base quality predictions. TopoQual leverages techniques including partial order alignments (POA), topologically parallel bases, and deep learning algorithms to polish consensus sequences. Our results demonstrate that TopoQual corrects approximately 31.9% of errors in PacBio consensus sequences. Additionally, it validates base qualities up to q59, which corresponds to one error in 0.9 million bases. These improvements will significantly enhance the reliability of somatic variant calling using HiFi data. Conclusion: TopoQual represents a significant advancement in genomics by improving the accuracy of base quality predictions for PacBio HiFi sequencing data. By correcting a substantial proportion of errors and achieving high base quality validation, TopoQual enables confident and accurate somatic variant calling. This tool not only addresses a critical limitation of current HiFi technology but also opens new possibilities for precise genomic analysis in various research and clinical applications. |
format | Article |
id | doaj-art-efd0d97d43724dcd8fa02b6f5127a846 |
institution | Kabale University |
issn | 1471-2105 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj-art-efd0d97d43724dcd8fa02b6f5127a846; 2025-01-19T12:40:58Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-01-01; 26111; 10.1186/s12859-024-06020-0; TopoQual polishes circular consensus sequencing data and accurately predicts quality scores; Minindu Weerakoon (Auburn University); Sangjin Lee (Wellcome Sanger Institute, Wellcome Genome Campus); Emily Mitchell (Wellcome Sanger Institute, Wellcome Genome Campus); Haynes Heaton (Auburn University); https://doi.org/10.1186/s12859-024-06020-0; Topoqual; Deep consensus; Pacbio; Circular consensus sequencing; High fidelity; Somatic mutations |
title | TopoQual polishes circular consensus sequencing data and accurately predicts quality scores |
topic | Topoqual; Deep consensus; Pacbio; Circular consensus sequencing; High fidelity; Somatic mutations |
url | https://doi.org/10.1186/s12859-024-06020-0 |
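The description above quotes base qualities such as q59. Assuming these follow the standard Phred convention, Q = -10 · log10(p), a quality score can be converted to an error probability and back as in the minimal sketch below. The function names are illustrative only and are not part of TopoQual.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def bases_per_error(q: float) -> float:
    """Expected number of bases per single error at quality Q."""
    return 1 / phred_to_error_prob(q)


if __name__ == "__main__":
    # Print the implied error rate for a few illustrative quality values.
    for q in (20, 30, 40, 59):
        p = phred_to_error_prob(q)
        print(f"Q{q}: p = {p:.3g}, about one error per {bases_per_error(q):,.0f} bases")
```

Under this convention, Q59 works out to roughly one error per 0.8 million bases, the same order of magnitude as the figure quoted in the abstract.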