Integrating Data Mining, Deep Learning, and Gene Ontology Analysis for Gene Expression-Based Disease Diagnosis Systems
The manuscript details the outcomes of a comprehensive study on the application of cluster-bicluster analysis, gene ontology analysis, and convolutional neural network (CNN) for diagnosing cancer and Alzheimer’s disease using gene expression data derived from both DNA microarray experimen...
Main Authors: | Sergii Babichev, Igor Liakh, Jiri Skvor |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
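Subjects: | Gene expression data; gene ontology analysis; clustering; biclustering; convolutional neural network; Bayes optimization |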
Online Access: | https://ieeexplore.ieee.org/document/10857291/ |
_version_ | 1832540507902836736 |
---|---|
author | Sergii Babichev; Igor Liakh; Jiri Skvor |
collection | DOAJ |
description | The manuscript reports a study on applying cluster-bicluster analysis, gene ontology analysis, and a convolutional neural network (CNN) to the diagnosis of cancer and Alzheimer’s disease from gene expression data obtained by both DNA microarray experiments and mRNA sequencing. It outlines a conceptual framework and provides a block diagram of the stepwise procedure for analyzing gene expression data, with the aim of making disease diagnosis more accurate and objective. The methodology begins with gene ontology analysis, then applies the Self-Organizing Tree Algorithm (SOTA) to cluster gene expression profiles, an ensemble algorithm to bicluster the data, and a CNN to classify samples. Bayesian optimization was used to determine the optimal hyperparameters for all models. The simulation results indicate that the proposed approach is highly effective. For the Alzheimer’s data, the number of genes analyzed was reduced from 44,662 to 4,004. Subsequent cluster-bicluster analysis divided these data into two subsets containing 1,158 and 2,846 genes, respectively. Classification accuracy for samples within these subsets reached 89.8% and 91.8%, respectively. For the cancer data, the gene count was reduced from 60,660 to 10,422, with 3,955 and 6,467 genes in the first and second clusters, respectively. The classification accuracy for these subsets was 97.4% and 97.6%, respectively. In the authors' view, implementing this model could significantly improve the efficacy of early diagnosis systems for complex diseases. |
format | Article |
id | doaj-art-0da980f8076e4bd198ca97a01347b587 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | Record: doaj-art-0da980f8076e4bd198ca97a01347b587 (2025-02-05T00:00:58Z); language: eng; publisher: IEEE; series: IEEE Access; ISSN: 2169-3536; published: 2025-01-01; vol. 13, pp. 21265-21278; DOI: 10.1109/ACCESS.2025.3535999; IEEE document: 10857291. Title: Integrating Data Mining, Deep Learning, and Gene Ontology Analysis for Gene Expression-Based Disease Diagnosis Systems. Authors: Sergii Babichev (https://orcid.org/0000-0001-6797-1467); Igor Liakh (https://orcid.org/0000-0001-5417-9403); Jiri Skvor. Affiliations (in source order): Department of Physics, Kherson State University, Kherson, Ukraine; Department of Information Science, Physics and Mathematics Disciplines, Uzhhorod National University, Uzhhorod, Ukraine; Department of Informatics, Jan Evangelista Purkyne University in Usti nad Labem, Usti nad Labem, Czech Republic. Abstract: as given in the description field. URL: https://ieeexplore.ieee.org/document/10857291/. Keywords: Gene expression data; gene ontology analysis; clustering; biclustering; convolutional neural network; Bayes optimization |
title | Integrating Data Mining, Deep Learning, and Gene Ontology Analysis for Gene Expression-Based Disease Diagnosis Systems |
topic | Gene expression data; gene ontology analysis; clustering; biclustering; convolutional neural network; Bayes optimization |
url | https://ieeexplore.ieee.org/document/10857291/ |
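
The gene counts quoted in the description can be checked for internal consistency with a few lines of arithmetic. The sketch below is only an illustration built from the numbers in this record; the dictionary layout and the `summarize` helper are hypothetical names introduced here, not part of the paper or the catalog.

```python
# Gene counts as quoted in the record's description field.
# The structure and names below are illustrative assumptions.
DATASETS = {
    "alzheimer": {"initial": 44_662, "retained": 4_004, "subsets": (1_158, 2_846)},
    "cancer": {"initial": 60_660, "retained": 10_422, "subsets": (3_955, 6_467)},
}


def summarize(counts):
    """Return (subsets_sum_matches, retained_fraction) for one dataset."""
    # The two cluster-bicluster subsets should partition the retained genes.
    subsets_ok = sum(counts["subsets"]) == counts["retained"]
    # Share of the original genes that survive the initial reduction.
    retained_fraction = counts["retained"] / counts["initial"]
    return subsets_ok, retained_fraction


if __name__ == "__main__":
    for name, counts in DATASETS.items():
        ok, fraction = summarize(counts)
        print(f"{name}: subsets sum to retained count: {ok}; "
              f"retained {fraction:.1%} of genes")
```

In both datasets the two subsets add up to the retained gene count, and the reduction step keeps roughly 9% of the genes for the Alzheimer's data and roughly 17% for the cancer data.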