DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features

In the domain of genome annotation, the identification of DNA-binding proteins is one of the crucial challenges. DNA is considered a blueprint for the cell. It contains all the information necessary for building and maintaining the traits of an organism. It is DNA that makes a living thing a living th...

Bibliographic Details
Main Authors: Omar Barukab, Yaser Daanial Khan, Sher Afzal Khan, Kuo-Chen Chou
Format: Article
Language: English
Published: Wiley 2022-01-01
Series: Applied Bionics and Biomechanics
Online Access: http://dx.doi.org/10.1155/2022/5483115
author Omar Barukab
Yaser Daanial Khan
Sher Afzal Khan
Kuo-Chen Chou
collection DOAJ
description In the domain of genome annotation, the identification of DNA-binding proteins is one of the crucial challenges. DNA is considered a blueprint for the cell: it contains all the information necessary for building and maintaining the traits of an organism. It is DNA that makes a living thing a living thing. Protein interaction with DNA plays an essential role in regulating DNA functions such as DNA repair, transcription, and regulation. Identifying these proteins is therefore crucial for understanding the regulation of genes. Several methods have been developed to identify DNA-protein binding sites based on structures and sequences, but they are costly and time-consuming. Therefore, we propose a methodology named “DNAPred_Prot”, which uses various position- and frequency-dependent features from protein sequences for efficient and effective prediction of DNA-binding proteins. Under 10-fold cross-validation and jackknife testing, accuracies of 94.95% and 95.11% were obtained, respectively. The results of SVM and ANN were also compared with those of a random forest classifier. The robustness of the proposed model was evaluated on the independent dataset PDB186, on which it achieved an accuracy of 91.47%. These results suggest that the proposed methodology performs better than other existing methods for the identification of DNA-binding proteins.
format Article
id doaj-art-9f277db93e004dd0b86985652d3cc1fa
institution Kabale University
issn 1754-2103
language English
publishDate 2022-01-01
publisher Wiley
record_format Article
series Applied Bionics and Biomechanics
spelling doaj-art-9f277db93e004dd0b86985652d3cc1fa
2025-02-03T01:23:14Z
eng
Wiley
Applied Bionics and Biomechanics
1754-2103
2022-01-01
2022
10.1155/2022/5483115
DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features
Omar Barukab (0)
Yaser Daanial Khan (1)
Sher Afzal Khan (2)
Kuo-Chen Chou (3)
(0) Department of Information Technology
(1) Department of Computer Science
(2) Department of Computer Sciences
(3) Gordon Life Science Institute
http://dx.doi.org/10.1155/2022/5483115
title DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features
url http://dx.doi.org/10.1155/2022/5483115