Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions

Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each of the tools had its own adva...

Bibliographic Details
Main Authors: Long Yang, Hwan-Gue Cho
Format: Article
Language: English
Published: BioMed Central 2012-03-01
Series: Genomics & Informatics
Subjects: intron length distributions; intron prediction; plant; three phases
Online Access: http://genominfo.org/upload/pdf/gni-10-58.pdf
author Long Yang
Hwan-Gue Cho
collection DOAJ
description Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so an intron length falls into one of three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), or a multiple of three bases plus two (3n + 2). It is widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that in 8 of the 10 species (80%), the three phases occurred in similar numbers. The percentage of 3n introns was excessive in Ostreococcus lucimarinus (47.7%) and deficient in Ostreococcus tauri (29.1%). This discrepancy could be the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
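The three-phase evaluation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the `tolerance` threshold are hypothetical, and the record does not state what cutoff the authors used to call a genome "excessive" or "deficient". The input is assumed to be a plain list of intron lengths in bases, for example extracted from a genome annotation file.

```python
from collections import Counter


def phase_fractions(intron_lengths):
    """Return the fractions of introns with length 3n, 3n + 1, and 3n + 2."""
    total = len(intron_lengths)
    if total == 0:
        raise ValueError("no intron lengths given")
    # The phase of an intron is its length modulo 3.
    counts = Counter(length % 3 for length in intron_lengths)
    return tuple(counts.get(phase, 0) / total for phase in range(3))


def looks_suspicious(intron_lengths, tolerance=0.04):
    """Flag an annotation if any phase deviates from one third by more than
    `tolerance`. The default of 0.04 is an arbitrary, illustrative value."""
    return any(abs(f - 1 / 3) > tolerance for f in phase_fractions(intron_lengths))


if __name__ == "__main__":
    balanced = [90, 91, 92]        # one intron in each phase
    skewed = [90, 91, 92, 93]      # two of four introns are 3n
    print(phase_fractions(balanced), looks_suspicious(balanced))
    print(phase_fractions(skewed), looks_suspicious(skewed))
```

Under this illustrative threshold, a 3n share of 47.7% or 29.1% (the values reported for the two Ostreococcus genomes) would be flagged, because both differ from one third by more than 0.04. For real genomes with many thousands of introns, a formal goodness-of-fit test would be a reasonable alternative to a fixed tolerance.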
format Article
id doaj-art-712583e29cf34049aca687ad0c4a265a
institution Kabale University
issn 1598-866X
2234-0742
language English
publishDate 2012-03-01
publisher BioMed Central
record_format Article
series Genomics & Informatics
spelling doaj-art-712583e29cf34049aca687ad0c4a265a 2025-02-02T03:47:32Z eng
BioMed Central, Genomics & Informatics, ISSN 1598-866X, 2234-0742
2012-03-01, volume 10, issue 1, pages 58-64, doi:10.5808/GI.2012.10.1.58
Long Yang: Tobacco Laboratory, Shandong Agricultural University, Shandong 271-018, China.
Hwan-Gue Cho: Graphics Application Laboratory, Department of Computer Science and Engineering, Pusan National University, Busan 609-735, Korea.
title Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions
topic intron length distributions
intron prediction
plant
three phases
url http://genominfo.org/upload/pdf/gni-10-58.pdf