Computational problems of analysis of short next generation sequencing reads

Bibliographic Details
Main Authors: R. te Boekhorst, F. M. Naumenko, N. G. Orlova, E. R. Galieva, A. M. Spitsina, I. V. Chadaeva, Y. L. Orlov, I. I. Abnizova
Format: Article
Language: English
Published: Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders 2017-02-01
Series: Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding)
Online Access: https://vavilov.elpub.ru/jour/article/view/845
description Short-read next generation sequencing (NGS) has had a significant impact on modern genomics, genetics, cell biology and medicine, especially on metagenomics, comparative genomics, polymorphism detection, mutation screening, transcriptome profiling, methylation profiling, chromatin remodelling and many other applications. However, NGS data are prone to errors that complicate scientific conclusions. NGS technologies shear DNA molecules into a collection of numerous small fragments, called a ‘library’, and then sequence these fragments extensively in parallel. The sequenced, overlapping fragments are called ‘reads’; they are assembled into contiguous strings. The contiguous sequences are in turn assembled into genomes for further analysis. Computational sequencing problems are those arising from the numerical processing of sequenced samples. This processing involves procedures such as quality scoring, mapping/assembly and, surprisingly, error correction of the data. This paper reviews post-processing errors and the computational methods used to discern them. It also includes a sequencing dictionary. We present quality control of raw data and the errors arising at the steps of aligning sequencing reads to a reference genome and of assembly. Finally, this work presents the identification of mutations (“variant calling”) in sequencing data and its quality control.
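The read-to-contig step mentioned in the description can be illustrated with a toy example. The sketch below is not taken from the reviewed paper: it is a minimal greedy overlap merge, assuming error-free reads from a single strand and no repeats, and the function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical. Real assemblers must additionally handle sequencing errors, reverse complements and repetitive regions.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    # Drop exact duplicate reads while keeping the input order.
    contigs = list(dict.fromkeys(reads))
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            # No remaining pair overlaps by at least min_len bases.
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Four overlapping toy reads (made-up data, not from the paper).
reads = ["ATGCGT", "GCGTAC", "TACGTT", "GTTAGC"]
print(greedy_assemble(reads))
```

Greedy merging is chosen here only because it is short enough to read at a glance; it can produce wrong joins when the same overlap occurs in more than one place, which is one reason assembly is listed among the error-prone steps.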
issn 2500-3259
affiliation R. te Boekhorst: University of Hertfordshire
F. M. Naumenko: Novosibirsk State University
N. G. Orlova: Novosibirsk State University; Novosibirsk State University of Architecture and Civil Engineering (Sibstrin)
E. R. Galieva: Novosibirsk State University; Institute of Cytology and Genetics SB RAS
A. M. Spitsina: Novosibirsk State University
I. V. Chadaeva: Novosibirsk State University
Y. L. Orlov: Novosibirsk State University; Institute of Cytology and Genetics SB RAS
I. I. Abnizova: Wellcome Trust Sanger Institute
volume 20
issue 6
pages 746-755
doi 10.18699/VJ16.191
topic next generation sequencing (NGS)
DNA
sequencing technologies
statistical biases
genome polymorphisms
sequencing errors
review