Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins

Bibliographic Details
Main Authors: Hao-Bo Guo, Yue Ma, Gerald A. Tuskan, Xiaohan Yang, Hong Guo
Format: Article
Language: English
Published: Wiley 2018-01-01
Series: International Journal of Genomics
Online Access: http://dx.doi.org/10.1155/2018/9784161
ISSN: 2314-436X, 2314-4378
Affiliations: Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA (Hao-Bo Guo, Yue Ma, Hong Guo); Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 3783, USA (Gerald A. Tuskan, Xiaohan Yang)

Full description
The availability of complete genome sequences makes it important to develop different approaches for classifying large-scale data sets and for making the extraction of biological insights easier. Here, we propose an approach for classifying complete proteomes/protein sets based on their protein distributions in terms of some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein length (L) and protein intrinsic disorder content (ID). The protein distributions based on L and ID are surveyed for the proteomes of representative organisms and for protein sets from the three domains of life. Two-dimensional maps of the protein distribution densities in the LD space defined by ln(L) and ID are then constructed; we designate these maps as fingerprints. The fingerprints of different organisms and protein sets are found to be distinct from one another, so they can be used for comparative studies. As a test case, phylogenetic trees have been constructed from the protein distribution densities in the fingerprints of the organisms' proteomes, without any protein sequence comparison or alignment. The resulting phylogenetic trees are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.
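The abstract describes the workflow only at a high level: place each protein in a 2D space defined by ln(L) and ID, turn a proteome into a density map (its fingerprint), and build a tree from fingerprint similarities without any sequence alignment. The Python sketch below illustrates that idea under loudly stated assumptions. ID is taken to be the fraction of disordered residues in [0, 1]; the bin counts and the ln(L) range are arbitrary; and the Euclidean distance between fingerprints and the UPGMA clustering are stand-ins chosen for illustration, not the distance measure or tree-building method used by the authors. The function names `fingerprint`, `fingerprint_distance`, and `upgma_newick` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Each protein is summarized as (length L in residues, intrinsic disorder content ID).
# Assumption: ID is the fraction of residues predicted disordered, in [0, 1].
Protein = Tuple[int, float]


def fingerprint(proteins: Sequence[Protein],
                n_l_bins: int = 20,
                n_id_bins: int = 10,
                ln_l_range: Tuple[float, float] = (2.0, 10.0)) -> List[float]:
    """Flattened 2D density map of proteins over the (ln(L), ID) plane.

    The bin counts and ln(L) range are hypothetical defaults; values that
    fall outside the grid are clamped into the edge bins.
    """
    if not proteins:
        raise ValueError("need at least one protein")
    lo, hi = ln_l_range
    counts = [0.0] * (n_l_bins * n_id_bins)
    for length, disorder in proteins:
        x = (math.log(length) - lo) / (hi - lo)  # ln(L) rescaled to [0, 1]
        i = min(max(int(x * n_l_bins), 0), n_l_bins - 1)
        j = min(max(int(disorder * n_id_bins), 0), n_id_bins - 1)
        counts[i * n_id_bins + j] += 1
    total = len(proteins)
    return [c / total for c in counts]  # densities sum to 1


def fingerprint_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two fingerprints (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_newick(names: List[str], fps: List[List[float]]) -> str:
    """Cluster fingerprints with UPGMA; return a Newick string without branch lengths."""
    # Live clusters: id -> (Newick label, number of leaves).
    clusters: Dict[int, Tuple[str, int]] = {k: (n, 1) for k, n in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(a, b): fingerprint_distance(fps[a], fps[b])
            for a in clusters for b in clusters if a < b}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        del dist[(a, b)]
        new = next_id
        next_id += 1
        for c in clusters:
            # UPGMA update: leaf-count-weighted mean of the two old distances.
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, new)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[new] = (f"({label_a},{label_b})", size_a + size_b)
    label, _ = next(iter(clusters.values()))
    return label + ";"


if __name__ == "__main__":
    # Toy (L, ID) values invented for illustration only.
    toy = {
        "A": [(120, 0.05), (300, 0.10), (450, 0.08)],
        "B": [(130, 0.06), (310, 0.12), (440, 0.07)],
        "C": [(800, 0.60), (1200, 0.45), (950, 0.55)],
    }
    names = list(toy)
    fps = [fingerprint(toy[n]) for n in names]
    print(upgma_newick(names, fps))  # prints (C,(A,B));
```

With these toy inputs, proteomes A and B occupy mostly the same (ln(L), ID) bins, so they are joined first and the sketch prints `(C,(A,B));`. Real fingerprints would be built from complete proteomes, with ID computed by whatever disorder predictor one chooses; a histogram is used here only because it is the simplest way to estimate a 2D density.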