An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases
Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among...
Main Authors: | Md. Rezaul Karim, Md. Mamunur Rashid, Byeong-Soo Jeong, Ho-Jin Choi |
---|---|
Format: | Article |
Language: | English |
Published: | BioMed Central, 2012-03-01 |
Series: | Genomics & Informatics |
Subjects: | DNA sequence; maximal contiguous frequent pattern; pattern mining; suffix tree |
Online Access: | http://genominfo.org/upload/pdf/gni-10-51.pdf |
_version_ | 1832574080222494720 |
---|---|
author | Md. Rezaul Karim; Md. Mamunur Rashid; Byeong-Soo Jeong; Ho-Jin Choi |
author_sort | Md. Rezaul Karim |
collection | DOAJ |
description | Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time. |
format | Article |
id | doaj-art-0b3729aba50b4660a4e010f4d07c09a0 |
institution | Kabale University |
issn | 1598-866X; 2234-0742 |
language | English |
publishDate | 2012-03-01 |
publisher | BioMed Central |
record_format | Article |
series | Genomics & Informatics |
spelling | doaj-art-0b3729aba50b4660a4e010f4d07c09a0; 2025-02-02T01:06:54Z; eng; BioMed Central; Genomics & Informatics; 1598-866X; 2234-0742; 2012-03-01; 101515710.5808/GI.2012.10.1.5131; An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases; Md. Rezaul Karim (0), Md. Mamunur Rashid (1), Byeong-Soo Jeong (2), Ho-Jin Choi (3); (0) Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea; (1) Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea; (2) Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea; (3) Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea; Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.; http://genominfo.org/upload/pdf/gni-10-51.pdf; DNA sequence; maximal contiguous frequent pattern; pattern mining; suffix tree |
title | An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases |
topic | DNA sequence; maximal contiguous frequent pattern; pattern mining; suffix tree |
url | http://genominfo.org/upload/pdf/gni-10-51.pdf |
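The record's abstract refers to maximal contiguous frequent patterns without defining them. The Python sketch below illustrates one common reading of the term, which is an assumption here rather than a definition taken from the article: a pattern is a contiguous substring, its support is the number of sequences that contain it, it is frequent when its support reaches a minimum threshold, and it is maximal when no longer frequent substring contains it. The function name, the toy sequences, and the threshold are hypothetical, and the brute-force enumeration is not the authors' method; it only makes the definition concrete and would not scale to large DNA databases.

```python
from collections import Counter


def maximal_contiguous_frequent_patterns(sequences, min_support):
    """Return {pattern: support} for maximal contiguous frequent patterns.

    Brute-force illustration only (hypothetical helper, not the article's
    algorithm): enumerates every substring of every sequence.
    """
    support = Counter()
    for seq in sequences:
        # Use a set so each substring is counted once per sequence.
        substrings = {
            seq[i:j]
            for i in range(len(seq))
            for j in range(i + 1, len(seq) + 1)
        }
        support.update(substrings)

    frequent = {p: c for p, c in support.items() if c >= min_support}
    alphabet = {ch for seq in sequences for ch in seq}

    maximal = {}
    for pattern, count in frequent.items():
        # Every substring of a frequent pattern is itself frequent, so a
        # pattern is non-maximal exactly when some one-character extension
        # to the left or right is frequent.
        extendable = any(
            ch + pattern in frequent or pattern + ch in frequent
            for ch in alphabet
        )
        if not extendable:
            maximal[pattern] = count
    return maximal


if __name__ == "__main__":
    toy_sequences = ["ACGTAC", "CGTACG", "TTACGT"]  # made-up example data
    print(maximal_contiguous_frequent_patterns(toy_sequences, min_support=3))
```

With the toy input, only the three-letter patterns shared by all three sequences survive, because every shorter shared substring is contained in one of them.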