Efficient Mining of Interesting Patterns in Large Biological Sequences
Main Authors: | Md. Mamunur Rashid, Md. Rezaul Karim, Byeong-Soo Jeong, Ho-Jin Choi |
---|---|
Format: | Article |
Language: | English |
Published: | BioMed Central, 2012-03-01 |
Series: | Genomics & Informatics |
Subjects: | DNA sequence, index-based method, information gain, pattern mining |
Online Access: | http://genominfo.org/upload/pdf/gni-10-44.pdf |
field | value |
---|---|
author | Md. Mamunur Rashid Md. Rezaul Karim Byeong-Soo Jeong Ho-Jin Choi |
collection | DOAJ |
description | Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time. |
format | Article |
id | doaj-art-9c7280b230d544ffb0f5fd9bd35bb3fa |
institution | Kabale University |
issn | 1598-866X; 2234-0742 |
language | English |
publishDate | 2012-03-01 |
publisher | BioMed Central |
series | Genomics & Informatics |
doi | 10.5808/GI.2012.10.1.44 |
volume | 10 |
issue | 1 |
pages | 44-50 |
affiliation | Md. Mamunur Rashid, Md. Rezaul Karim, Byeong-Soo Jeong: Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea. Ho-Jin Choi: Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea. |
title | Efficient Mining of Interesting Patterns in Large Biological Sequences |
topic | DNA sequence index-based method information gain pattern mining |
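
The abstract says a pattern that is not frequent may still be informative when its actual support exceeds the prior expectation by a large margin. This record does not give the authors' formula (the keywords suggest it is based on information gain), so the Python sketch below only illustrates the general observed-versus-expected idea. It assumes an i.i.d. nucleotide background whose base probabilities are estimated from the sequence itself. The function names and the observed/expected ratio are hypothetical choices made for this illustration. They are not the paper's interestingness measure or its index-based mining method.

```python
from collections import Counter


def count_occurrences(seq: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of `pattern` in `seq`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    pos = seq.find(pattern)
    while pos != -1:
        count += 1
        pos = seq.find(pattern, pos + 1)  # advance by one so overlaps are counted
    return count


def expected_occurrences(seq: str, pattern: str) -> float:
    """Expected count of `pattern` under an assumed i.i.d. background model.

    Base probabilities are estimated from `seq` itself (zero-order model).
    This prior is an illustrative assumption, not the one used in the paper.
    """
    n, k = len(seq), len(pattern)
    if k == 0 or k > n:
        return 0.0
    freqs = Counter(seq)
    prob = 1.0
    for symbol in pattern:
        prob *= freqs[symbol] / n  # probability of this base at one position
    return (n - k + 1) * prob  # number of windows times per-window probability


def surprise_ratio(seq: str, pattern: str) -> float:
    """Hypothetical score: observed count divided by expected count."""
    expected = expected_occurrences(seq, pattern)
    if expected == 0.0:
        return 0.0  # pattern cannot occur (absent base, or longer than seq)
    return count_occurrences(seq, pattern) / expected


if __name__ == "__main__":
    toy = "AAAACCCCGGGGTTTT"  # toy sequence; each base has frequency 0.25
    for p in ("AAA", "ACGT"):
        print(p, count_occurrences(toy, p), expected_occurrences(toy, p), surprise_ratio(toy, p))
```

In the toy sequence, "AAA" occurs only twice. Its expected count is 14 × 0.25³ = 0.21875, so the ratio is about 9.14. A raw frequency threshold would likely discard a pattern that occurs only twice, yet its support is far above the background expectation. A real miner would also need an index to avoid rescanning the sequence for every candidate pattern. This sketch does not attempt that.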