Integrating Data Mining, Deep Learning, and Gene Ontology Analysis for Gene Expression-Based Disease Diagnosis Systems

The manuscript details the outcomes of a comprehensive study on the application of cluster-bicluster analysis, gene ontology analysis, and convolutional neural network (CNN) for diagnosing cancer and Alzheimer’s disease using gene expression data derived from both DNA microarray experimen...

Bibliographic Details
Main Authors: Sergii Babichev, Igor Liakh, Jiri Skvor
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10857291/
_version_ 1832540507902836736
author Sergii Babichev
Igor Liakh
Jiri Skvor
collection DOAJ
description The manuscript details the outcomes of a comprehensive study on the application of cluster-bicluster analysis, gene ontology analysis, and a convolutional neural network (CNN) for diagnosing cancer and Alzheimer’s disease using gene expression data derived from both DNA microarray experiments and mRNA sequencing. It outlines a conceptual framework and provides a block diagram of the stepwise procedure for analyzing gene expression data, aiming to enhance the accuracy and objectivity of disease diagnosis. The research methodology involves initial gene ontology analysis, followed by the application of the Self-Organizing Tree Algorithm (SOTA) for clustering gene expression profiles, an ensemble algorithm for data biclustering, and a CNN for sample classification. Bayesian optimization was employed to determine the optimal hyperparameters for all models. The analysis of simulation results demonstrates the high efficacy of the proposed approach. Specifically, for the Alzheimer’s data, the number of genes analyzed was reduced from 44,662 to 4,004. Subsequent cluster-bicluster analysis divided these data into two subsets containing 1,158 and 2,846 genes, respectively. Classification accuracy for samples within these subsets reached 89.8% and 91.8%, respectively. In the cancer data analysis, the gene count was reduced from 60,660 to 10,422, with 3,955 and 6,467 genes in the first and second clusters, respectively. The classification accuracy for these subsets was 97.4% and 97.6%, respectively. In our view, the implementation of this model promises to significantly improve the efficacy of early diagnosis systems for complex diseases.
format Article
id doaj-art-0da980f8076e4bd198ca97a01347b587
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Record ID: doaj-art-0da980f8076e4bd198ca97a01347b587
Record timestamp: 2025-02-05T00:00:58Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Publication date: 2025-01-01
Volume 13, pages 21265-21278
DOI: 10.1109/ACCESS.2025.3535999
IEEE document number: 10857291
Title: Integrating Data Mining, Deep Learning, and Gene Ontology Analysis for Gene Expression-Based Disease Diagnosis Systems
Authors (in listed order): Sergii Babichev (https://orcid.org/0000-0001-6797-1467); Igor Liakh (https://orcid.org/0000-0001-5417-9403); Jiri Skvor
Affiliations (in listed order): Department of Physics, Kherson State University, Kherson, Ukraine; Department of Information Science, Physics and Mathematics Disciplines, Uzhhorod National University, Uzhhorod, Ukraine; Department of Informatics, Jan Evangelista Purkyne University in Usti nad Labem, Usti nad Labem, Czech Republic
Keywords: Gene expression data; gene ontology analysis; clustering; biclustering; convolutional neural network; Bayes optimization
title Integrating Data Mining, Deep Learning, and Gene Ontology Analysis for Gene Expression-Based Disease Diagnosis Systems
topic Gene expression data
gene ontology analysis
clustering
biclustering
convolutional neural network
Bayes optimization
url https://ieeexplore.ieee.org/document/10857291/
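The gene counts quoted in the description field can be cross-checked with a few lines of arithmetic. The sketch below is not part of the article. It only confirms that the two subsets reported for each dataset add up to the reduced gene count, and computes the fraction of genes retained after filtering. The names `datasets` and `summarize` are illustrative assumptions.

```python
# Gene counts copied from the record's description field.
# The dictionary layout and the function name are illustrative assumptions.
datasets = {
    "alzheimer": {"initial": 44662, "retained": 4004, "subsets": (1158, 2846)},
    "cancer": {"initial": 60660, "retained": 10422, "subsets": (3955, 6467)},
}


def summarize(counts):
    """Return (subsets_sum_matches, fraction_retained) for one dataset."""
    matches = sum(counts["subsets"]) == counts["retained"]
    fraction = counts["retained"] / counts["initial"]
    return matches, fraction


for name, counts in datasets.items():
    matches, fraction = summarize(counts)
    print(f"{name}: subsets sum to retained count: {matches}; "
          f"fraction of genes retained: {fraction:.1%}")
```

For both datasets the subset sizes are consistent with the reduced gene count stated in the description.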