Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms

Bibliographic Details
Main Authors: Bulgan Galbadrakh, Kyung-Eun Lee, Hyun-Seok Park
Format: Article
Language: English
Published: BioMed Central 2012-12-01
Series: Genomics & Informatics
Online Access: http://genominfo.org/upload/pdf/gni-10-266.pdf
collection DOAJ
description Grammatical inference methods are expected to find grammatical structures hidden in biological sequences, and one hopes that the study of such grammars can serve as a tool for theory formation. We therefore developed JSequitur, which automatically generates the grammatical structure of biological sequences within an inference framework of string compression algorithms. Our original motivation was to find grammatical traits of several cancer genes that string compression algorithms can detect. We have not yet found any meaningful traits unique to the cancer genes, but we did observe some interesting relationships among gene length, sequence similarity, the patterns of the generated grammar, and compression rate.
id doaj-art-aeddfdd398e04160b274dca627edc525
institution Kabale University
issn 1598-866X
2234-0742
volume 10
issue 4
pages 266-270
doi 10.5808/GI.2012.10.4.266
affiliation Department of Computer Science, Ewha Womans University, Seoul 120-750, Korea (all three authors)
topic context-free grammar
formal language theory
natural language processing
stochastic modeling