An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among...
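As a rough illustration of the term used in the summary above, the following Python sketch enumerates maximal contiguous frequent patterns in a toy set of DNA strings by brute force. It is not the authors' method (the record's keywords point to a suffix tree); it only makes the definition concrete. The function name, the toy sequences, and the definitions used here are assumptions: support is taken to be the number of sequences that contain the pattern as a substring, and a frequent pattern is taken to be maximal when no other frequent pattern contains it.

```python
from collections import Counter


def maximal_contiguous_frequent_patterns(sequences, min_support):
    """Naive illustration: enumerate every substring and count its support.

    Assumed definitions (not taken from the paper):
      - support(p) = number of sequences that contain p as a contiguous substring
      - p is frequent if support(p) >= min_support
      - p is maximal if no other frequent pattern contains p
    """
    support = Counter()
    for seq in sequences:
        # A set ensures each sequence contributes at most once per pattern.
        substrings = {
            seq[i:j]
            for i in range(len(seq))
            for j in range(i + 1, len(seq) + 1)
        }
        support.update(substrings)

    frequent = {p for p, count in support.items() if count >= min_support}

    # Keep a pattern only if no other frequent pattern contains it.
    maximal = {
        p for p in frequent
        if not any(p != q and p in q for q in frequent)
    }
    return {p: support[p] for p in sorted(maximal)}


# Hypothetical toy database, chosen only for illustration.
toy_db = ["ACGTAC", "CGTACG", "TTACGT"]
result = maximal_contiguous_frequent_patterns(toy_db, min_support=2)
print(result)
```

No pattern length is specified in advance; the lengths fall out of the support threshold. The cost is that substring enumeration grows quadratically with sequence length, which is the kind of overhead an index structure such as a suffix tree is meant to avoid on large datasets.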

Bibliographic Details
Main Authors: Md. Rezaul Karim, Md. Mamunur Rashid, Byeong-Soo Jeong, Ho-Jin Choi
Format: Article
Language: English
Published: BioMed Central 2012-03-01
Series: Genomics & Informatics
Subjects: DNA sequence; maximal contiguous frequent pattern; pattern mining; suffix tree
Online Access: http://genominfo.org/upload/pdf/gni-10-51.pdf
author Md. Rezaul Karim
Md. Mamunur Rashid
Byeong-Soo Jeong
Ho-Jin Choi
collection DOAJ
description Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
format Article
id doaj-art-0b3729aba50b4660a4e010f4d07c09a0
institution Kabale University
issn 1598-866X
2234-0742
language English
publishDate 2012-03-01
publisher BioMed Central
record_format Article
series Genomics & Informatics
spelling doaj-art-0b3729aba50b4660a4e010f4d07c09a0; 2025-02-02T01:06:54Z; eng; BioMed Central; Genomics & Informatics; ISSN 1598-866X, 2234-0742; 2012-03-01; vol. 10, no. 1, pp. 51-57; doi:10.5808/GI.2012.10.1.51
Md. Rezaul Karim (0), Md. Mamunur Rashid (1), Byeong-Soo Jeong (2), Ho-Jin Choi (3)
0-2: Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
3: Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea.
title An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases
topic DNA sequence
maximal contiguous frequent pattern
pattern mining
suffix tree
url http://genominfo.org/upload/pdf/gni-10-51.pdf