Robust ensemble of handcrafted and learned approaches for DNA-binding proteins

Purpose – Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create an optimal, universal system for DNA-BP class...

Bibliographic Details
Main Authors: Loris Nanni, Sheryl Brahnam
Format: Article
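Language: English
Published: Emerald Publishing 2025-01-01
Series: Applied Computing and Informatics
Subjects: Support vector machines; Convolutional neural networks; Pseudo amino acid composition; Heterogeneous ensembles; Protein representations
Online Access: https://www.emerald.com/insight/content/doi/10.1108/ACI-03-2021-0051/full/pdf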
_version_ 1832583456669827072
author Loris Nanni
Sheryl Brahnam
author_facet Loris Nanni
Sheryl Brahnam
author_sort Loris Nanni
collection DOAJ
description Purpose – Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create an optimal, universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks. Design/methodology/approach – Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. A separate support vector machine (SVM) was trained on each descriptor and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. The network decisions were fused with those of the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system. Findings – The best ensemble proposed here produced comparable, if not superior, classification results in a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system. Originality/value – Most DNA-BP methods proposed in the literature are validated on only one (rarely two) datasets/tasks. In this work, the authors report the performance of their general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the best proposed classifier system demonstrate the power of the approach. These results can now be used for baseline comparisons by other researchers in the field.
format Article
id doaj-art-1ff9d97cf24c4f88b0589bb491ccf62b
institution Kabale University
issn 2634-1964
2210-8327
language English
publishDate 2025-01-01
publisher Emerald Publishing
record_format Article
series Applied Computing and Informatics
spelling doaj-art-1ff9d97cf24c4f88b0589bb491ccf62b; 2025-01-28T12:19:18Z; eng; Emerald Publishing; Applied Computing and Informatics; ISSN 2634-1964, 2210-8327; 2025-01-01; vol. 21, no. 1/2, pp. 37-52; doi:10.1108/ACI-03-2021-0051; Robust ensemble of handcrafted and learned approaches for DNA-binding proteins; Loris Nanni (University of Padua, Padua, Italy); Sheryl Brahnam (Information Technology and Cybersecurity, Missouri State University, Springfield, Missouri, USA)
title Robust ensemble of handcrafted and learned approaches for DNA-binding proteins
title_sort robust ensemble of handcrafted and learned approaches for dna binding proteins
topic Support vector machines
Convolutional neural networks
Pseudo amino acid composition
Heterogeneous ensembles
Protein representations
url https://www.emerald.com/insight/content/doi/10.1108/ACI-03-2021-0051/full/pdf
work_keys_str_mv AT lorisnanni robustensembleofhandcraftedandlearnedapproachesfordnabindingproteins
AT sherylbrahnam robustensembleofhandcraftedandlearnedapproachesfordnabindingproteins
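
The abstract in this record says that CNN decisions were fused with the SVM decisions using the weighted sum rule. The record gives no weights, score ranges, or normalization, so the following is only a minimal sketch of that rule under stated assumptions: the function names, the min-max normalization step, the example scores, the weights, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one classifier's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: return a neutral value for every sample.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(
    score_sets: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Fuse per-classifier scores with the weighted sum rule.

    score_sets[k][i] is classifier k's score for sample i.
    The result is divided by the total weight so it stays in [0, 1].
    """
    if len(score_sets) != len(weights):
        raise ValueError("one weight is required per classifier")
    n_samples = len(score_sets[0])
    if any(len(s) != n_samples for s in score_sets):
        raise ValueError("all classifiers must score the same samples")
    total_weight = sum(weights)
    normalized = [min_max_normalize(s) for s in score_sets]
    return [
        sum(w * s[i] for w, s in zip(weights, normalized)) / total_weight
        for i in range(n_samples)
    ]


# Hypothetical scores for three proteins (not data from the paper).
svm_scores = [-1.2, 0.3, 2.0]   # e.g. signed distances to an SVM margin
cnn_scores = [0.10, 0.80, 0.60]  # e.g. CNN positive-class probabilities

# Illustrative weights only: the CNN counts twice as much as the SVM.
fused = weighted_sum_fusion([svm_scores, cnn_scores], weights=[1.0, 2.0])
labels = [int(f >= 0.5) for f in fused]  # assumed decision threshold
```

Normalizing before summing matters because SVM margins and CNN probabilities live on different scales; without it, the classifier with the larger raw range would dominate the sum regardless of its weight.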