Robust ensemble of handcrafted and learned approaches for DNA-binding proteins
Purpose – Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create an optimal and universal system for DNA-BP class...
Main Authors: | Loris Nanni, Sheryl Brahnam |
---|---|
Format: | Article |
Language: | English |
Published: | Emerald Publishing, 2025-01-01 |
Series: | Applied Computing and Informatics |
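Subjects: | Support vector machines; Convolutional neural networks; Pseudo amino acid composition; Heterogeneous ensembles; Protein representations |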
Online Access: | https://www.emerald.com/insight/content/doi/10.1108/ACI-03-2021-0051/full/pdf |
_version_ | 1832583456669827072 |
---|---|
author | Loris Nanni; Sheryl Brahnam |
collection | DOAJ |
description | Purpose – Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create an optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks. Design/methodology/approach – Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. Separate support vector machines (SVMs) were trained on these descriptors and evaluated. Convolutional neural networks (CNNs) with different parameter settings were fine-tuned on two matrix representations of proteins. The CNN decisions were fused with those of the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system. Findings – The best ensemble proposed here produced comparable, if not superior, classification results in a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system. Originality/value – Most DNA-BP methods proposed in the literature are validated on only one (rarely two) datasets/tasks. In this work, the authors report the performance of their general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field. |
format | Article |
id | doaj-art-1ff9d97cf24c4f88b0589bb491ccf62b |
institution | Kabale University |
issn | 2634-1964 2210-8327 |
language | English |
publishDate | 2025-01-01 |
publisher | Emerald Publishing |
record_format | Article |
series | Applied Computing and Informatics |
spelling | doaj-art-1ff9d97cf24c4f88b0589bb491ccf62b; 2025-01-28T12:19:18Z; eng; Emerald Publishing; Applied Computing and Informatics; ISSN 2634-1964, 2210-8327; 2025-01-01; vol. 21, no. 1/2, pp. 37-52; DOI 10.1108/ACI-03-2021-0051; Robust ensemble of handcrafted and learned approaches for DNA-binding proteins; Loris Nanni (University of Padua, Padua, Italy); Sheryl Brahnam (Information Technology and Cybersecurity, Missouri State University, Springfield, Missouri, USA); https://www.emerald.com/insight/content/doi/10.1108/ACI-03-2021-0051/full/pdf; Support vector machines; Convolutional neural networks; Pseudo amino acid composition; Heterogeneous ensembles; Protein representations |
title | Robust ensemble of handcrafted and learned approaches for DNA-binding proteins |
topic | Support vector machines; Convolutional neural networks; Pseudo amino acid composition; Heterogeneous ensembles; Protein representations |
url | https://www.emerald.com/insight/content/doi/10.1108/ACI-03-2021-0051/full/pdf |
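
The description in this record says that classifier decisions are fused using the weighted sum rule. The sketch below illustrates that rule in general terms only; it is not the authors' implementation. The score values, the weights, the z-score normalization step, and the zero decision threshold are all assumptions made for illustration.

```python
def normalize(scores):
    """Scale one classifier's scores to zero mean and unit variance.

    Normalizing before fusion is an assumption of this sketch; it keeps
    classifiers with different output ranges comparable.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = var ** 0.5 or 1.0  # fall back to 1.0 when all scores are equal
    return [(s - mean) / std for s in scores]


def weighted_sum_fusion(score_lists, weights):
    """Fuse per-classifier scores with the weighted sum rule.

    score_lists: one list of scores per classifier, aligned by sample.
    weights: one weight per classifier (hypothetical values here).
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    normalized = [normalize(s) for s in score_lists]
    n_samples = len(normalized[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalized))
        for i in range(n_samples)
    ]


# Hypothetical scores for four proteins (higher = more likely DNA-binding).
svm_scores = [1.2, -0.4, 0.3, -1.1]  # e.g. SVM decision values
cnn_scores = [0.9, 0.2, 0.6, 0.1]    # e.g. CNN positive-class probabilities

# Hypothetical weights: the CNN counts twice as much as the SVM.
fused = weighted_sum_fusion([svm_scores, cnn_scores], weights=[1.0, 2.0])

# Assumed decision rule: a positive fused score means "DNA-binding".
labels = [1 if f > 0 else 0 for f in fused]
```

In practice the weights would be chosen on validation data rather than fixed by hand, and the fused score could be thresholded or ranked depending on the evaluation measure.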