Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions
Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each of the tools had its own adva...
Main Authors: | Long Yang, Hwan-Gue Cho |
---|---|
Format: | Article |
Language: | English |
Published: | BioMed Central, 2012-03-01 |
Series: | Genomics & Informatics |
Subjects: | intron length distributions, intron prediction, plant, three phases |
Online Access: | http://genominfo.org/upload/pdf/gni-10-58.pdf |
_version_ | 1832573557187543040 |
---|---|
author | Long Yang; Hwan-Gue Cho |
author_sort | Long Yang |
collection | DOAJ |
description | Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so an intron length falls into one of three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), or a multiple of three bases plus two (3n + 2). It has been widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that 80% (8/10) of the species had similar numbers of introns in the three phases. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could be the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems. |
format | Article |
id | doaj-art-712583e29cf34049aca687ad0c4a265a |
institution | Kabale University |
issn | 1598-866X 2234-0742 |
language | English |
publishDate | 2012-03-01 |
publisher | BioMed Central |
record_format | Article |
series | Genomics & Informatics |
spelling | doaj-art-712583e29cf34049aca687ad0c4a265a; 2025-02-02T03:47:32Z; eng; BioMed Central; Genomics & Informatics; 1598-866X; 2234-0742; 2012-03-01; 10; 1; 58; 64; 10.5808/GI.2012.10.1.58; 34; Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions; Long Yang (Tobacco Laboratory, Shandong Agricultural University, Shandong 271-018, China); Hwan-Gue Cho (Graphics Application Laboratory, Department of Computer Science and Engineering, Pusan National University, Busan 609-735, Korea); http://genominfo.org/upload/pdf/gni-10-58.pdf; intron length distributions; intron prediction; plant; three phases |
title | Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions |
topic | intron length distributions; intron prediction; plant; three phases |
url | http://genominfo.org/upload/pdf/gni-10-58.pdf |
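
The three-phase evaluation described in the abstract can be illustrated with a minimal Python sketch. It assumes intron lengths have already been extracted from an annotation as plain integers. The function names (`phase_counts`, `phase_fractions`, `chi_square_uniform`, `flag_annotation`) are hypothetical, and the chi-square comparison against equal thirds is an illustrative choice made here, not a procedure confirmed by the record.

```python
from collections import Counter


def phase_counts(intron_lengths):
    """Count introns whose length is 3n, 3n + 1, and 3n + 2 (in that order)."""
    counts = Counter(length % 3 for length in intron_lengths)
    result = tuple(counts.get(r, 0) for r in range(3))
    if sum(result) == 0:
        raise ValueError("no intron lengths given")
    return result


def phase_fractions(intron_lengths):
    """Return the fraction of introns in each of the three phases."""
    counts = phase_counts(intron_lengths)
    total = sum(counts)
    return tuple(c / total for c in counts)


def chi_square_uniform(intron_lengths):
    """Pearson chi-square statistic against equal thirds (2 degrees of freedom)."""
    counts = phase_counts(intron_lengths)
    expected = sum(counts) / 3
    return sum((c - expected) ** 2 / expected for c in counts)


def flag_annotation(intron_lengths, critical=5.991):
    """Flag a genome whose phase counts deviate from equal thirds.

    5.991 is the 5% critical value of the chi-square distribution with
    2 degrees of freedom; using it as a cutoff is an assumption of this sketch.
    """
    return chi_square_uniform(intron_lengths) > critical


if __name__ == "__main__":
    lengths = [90, 91, 92, 93, 94, 95]  # made-up lengths, two per phase
    print(phase_fractions(lengths))
    print(flag_annotation(lengths))
```

With genome-scale intron counts, even small deviations from equal thirds exceed a fixed critical value, so in practice the phase fractions themselves (as reported in the abstract) may be the more informative quantity to compare across species.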