Genome assembly has a major impact on gene content: a comparison of annotation in two Bos taurus assemblies.

Bibliographic Details
Main Authors: Liliana Florea, Alexander Souvorov, Theodore S Kalbfleisch, Steven L Salzberg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0021400&type=printable
collection DOAJ
description Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12-20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6-15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly.
format Article
id doaj-art-eab9a93bfb474ecc9cd8d24af3591aa1
institution DOAJ
issn 1932-6203
language English
publishDate 2011-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-eab9a93bfb474ecc9cd8d24af3591aa1; 2025-08-20T03:09:49Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2011-01-01; vol. 6, no. 6, e21400; 10.1371/journal.pone.0021400
title Genome assembly has a major impact on gene content: a comparison of annotation in two Bos taurus assemblies.
url https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0021400&type=printable