Training data diversity enhances the basecalling of novel RNA modification-induced nanopore sequencing readouts
Abstract Accurately basecalling sequence backbones in the presence of nucleotide modifications remains a substantial challenge in nanopore sequencing bioinformatics. It has been extensively demonstrated that state-of-the-art basecallers are less compatible with modification-induced sequencing signal...
Main Authors: | Ziyuan Wang, Ziyang Liu, Yinshan Fang, Hao Helen Zhang, Xiaoxiao Sun, Ning Hao, Jianwen Que, Hongxu Ding |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2025-01-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-025-55974-z |
_version_ | 1832594541193986048 |
---|---|
author | Ziyuan Wang Ziyang Liu Yinshan Fang Hao Helen Zhang Xiaoxiao Sun Ning Hao Jianwen Que Hongxu Ding |
author_sort | Ziyuan Wang |
collection | DOAJ |
description | Abstract Accurately basecalling sequence backbones in the presence of nucleotide modifications remains a substantial challenge in nanopore sequencing bioinformatics. It has been extensively demonstrated that state-of-the-art basecallers are less compatible with modification-induced sequencing signals. A precise basecalling, on the other hand, serves as the prerequisite for virtually all the downstream analyses. Here, we report that basecallers exposed to diverse training modifications gain the generalizability to analyze novel modifications. With synthesized oligos as the model system, we precisely basecall various out-of-sample RNA modifications. From the representation learning perspective, we attribute this generalizability to basecaller representation space expanded by diverse training modifications. Taken together, we conclude increasing the training data diversity as a paradigm for building modification-tolerant nanopore sequencing basecallers. |
format | Article |
id | doaj-art-36d8cf368ae24c69958ecbc8695a7803 |
institution | Kabale University |
issn | 2041-1723 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Nature Communications |
spelling | doaj-art-36d8cf368ae24c69958ecbc8695a7803; 2025-01-19T12:32:02Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2025-01-01; 16119; 10.1038/s41467-025-55974-z; Training data diversity enhances the basecalling of novel RNA modification-induced nanopore sequencing readouts; Ziyuan Wang (Department of Pharmacy Practice and Science, University of Arizona); Ziyang Liu (Department of Pharmacy Practice and Science, University of Arizona); Yinshan Fang (Columbia Center for Human Development, Department of Medicine, Columbia University Medical Center); Hao Helen Zhang (Statistics and Data Science GIDP, University of Arizona); Xiaoxiao Sun (Statistics and Data Science GIDP, University of Arizona); Ning Hao (Statistics and Data Science GIDP, University of Arizona); Jianwen Que (Columbia Center for Human Development, Department of Medicine, Columbia University Medical Center); Hongxu Ding (Department of Pharmacy Practice and Science, University of Arizona); https://doi.org/10.1038/s41467-025-55974-z |
title | Training data diversity enhances the basecalling of novel RNA modification-induced nanopore sequencing readouts |
url | https://doi.org/10.1038/s41467-025-55974-z |