Ontologies in modelling and analysing of big genetic data

Bibliographic Details
Main Authors: N. L. Podkolodnyy, O. A. Podkolodnaya, V. A. Ivanisenko, M. A. Marchenko
Format: Article
Language: English
Published: Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders 2025-01-01
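Series: Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding)
Subjects: ontologies; big data analysis; bioinformatics; systems biology; deep learning; interpretability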
Online Access:https://vavilov.elpub.ru/jour/article/view/4415

Abstract
To systematize and make effective use of the huge volume of experimental data accumulated in bioinformatics and biomedicine, new ontology-based approaches are needed, including automated methods for the semantic integration of heterogeneous experimental data, methods for building large knowledge bases, and self-interpreting deep learning methods for analyzing large heterogeneous data. The article briefly describes the features of the subject area (bioinformatics, systems biology, biomedicine), gives formal definitions of ontologies and knowledge graphs, and presents examples of using ontologies to semantically integrate heterogeneous data, to build large knowledge bases and to interpret the results of deep learning on big data.

As an example of a successful project, the article describes the Gene Ontology knowledge base, which includes not only terminological knowledge and Gene Ontology annotations (GOA) but also causal activity models (GO-CAM). This makes it useful not only for genomic biology but also for systems biology and for interpreting large-scale experimental data. An approach to building large ontologies with design patterns is discussed, using the Ontology of Biological Attributes (OBA) as an example: apart from a small number of high-level concepts, most of its classification is computed automatically by reasoning over previously created reference ontologies.

A major problem of deep learning is the lack of interpretability, since neural networks often function as “black boxes” that cannot explain their decisions. The paper describes approaches to interpreting deep learning models and presents two examples of self-explanatory ontology-based deep learning models:
(1) Deep GONet, which integrates the Gene Ontology into a hierarchical neural network architecture in which each neuron represents a biological function. Experiments on cancer diagnostic datasets show that Deep GONet is easily interpretable and performs well in distinguishing cancerous from non-cancerous samples.
(2) ONN4MST, which uses biome ontologies to trace the microbial sources of samples from poorly studied or previously unknown niches and to detect microbial contaminants. ONN4MST can distinguish samples from ontologically similar biomes, offering a quantitative way to characterize the evolution of the human gut microbial community.
Both models combine high performance with interpretability, making them valuable tools for analyzing and interpreting big data in biology.
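
The abstract states only that Deep GONet builds the Gene Ontology into a hierarchical architecture in which each neuron stands for a biological function. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. Connectivity masks derived from a toy annotation table and a toy is_a hierarchy zero out every weight that does not correspond to an ontology edge, so each hidden unit can be read as the activity of one named term. The gene names, the GO-like term IDs, the tanh activation, the random weights and the hard masking are all hypothetical choices made for illustration; the published model may enforce the ontology structure differently (for example, through a regularization penalty rather than a fixed mask).

```python
"""Minimal sketch of an ontology-structured (GO-masked) neural network.

Hypothetical toy data: gene names, GO-like term IDs and weights are invented
for illustration and are not taken from the Gene Ontology or from Deep GONet.
"""
import math
import random

# Toy annotations (hypothetical): gene -> set of specific terms it is annotated to.
GENE_ANNOTATIONS = {
    "geneA": {"GO:A1"},
    "geneB": {"GO:A1", "GO:A2"},
    "geneC": {"GO:A2"},
}
# Toy is_a edges (hypothetical): child term -> its single parent term.
TERM_PARENTS = {"GO:A1": "GO:ROOT", "GO:A2": "GO:ROOT"}


def build_mask(inputs, outputs, connected):
    """Binary mask[o][i] = 1.0 if input i may feed output unit o, else 0.0."""
    return [[1.0 if connected(i, o) else 0.0 for i in inputs] for o in outputs]


def masked_layer(x, weights, mask, bias):
    """tanh((W * M) x + b): weights outside ontology edges contribute nothing."""
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        s = sum(w * m * xi for w, m, xi in zip(w_row, m_row, x)) + b
        out.append(math.tanh(s))
    return out


def init(n_out, n_in, seed):
    """Random weights in [-1, 1]; a real model would learn these."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


genes = sorted(GENE_ANNOTATIONS)            # input layer: one unit per gene
terms = sorted(TERM_PARENTS)                # hidden layer: one unit per specific term
roots = sorted(set(TERM_PARENTS.values()))  # output layer: parent (root) term

# Layer 1 follows gene -> term annotations; layer 2 follows term -> parent edges.
mask1 = build_mask(genes, terms, lambda g, t: t in GENE_ANNOTATIONS[g])
mask2 = build_mask(terms, roots, lambda t, r: TERM_PARENTS[t] == r)
W1 = init(len(terms), len(genes), seed=0)
W2 = init(len(roots), len(terms), seed=1)


def forward(expression):
    """Return activations keyed by term ID, so every unit has a biological name."""
    h1 = masked_layer(expression, W1, mask1, [0.0] * len(terms))
    h2 = masked_layer(h1, W2, mask2, [0.0] * len(roots))
    return dict(zip(terms + roots, h1 + h2))
```

For example, forward([0.5, 0.2, 0.9]) returns activations keyed by "GO:A1", "GO:A2" and "GO:ROOT". Because geneC is not annotated to GO:A1, changing its expression value leaves the GO:A1 activation unchanged, which is what makes each unit attributable to a single term. Training (backpropagation restricted to the unmasked weights) is omitted from this sketch.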

ISSN: 2500-3259
Record ID: doaj-art-dce063393f4240f9944314fe4a449c9e
Institution: Kabale University
Source: Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding), vol. 28, no. 8, pp. 940-949
DOI: 10.18699/vjgb-24-101
Author affiliations:
N. L. Podkolodnyy: Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Institute of Computational Mathematics and Mathematical Geophysics of the Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University; Kurchatov Genomic Center of ICG SB RAS
O. A. Podkolodnaya: Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences
V. A. Ivanisenko: Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Kurchatov Genomic Center of ICG SB RAS
M. A. Marchenko: Institute of Computational Mathematics and Mathematical Geophysics of the Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University