Use of Graph Database for the Integration of Heterogeneous Biological Data

Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join s...

Bibliographic Details
Main Authors:	Byoung-Ha Yoon, Seon-Kyu Kim, Seon-Young Kim
Format:	Article
Language:	English
Published:	BioMed Central 2017-03-01
Series:	Genomics & Informatics
Subjects:	biological network; data mining; graph database; heterogeneous biological data; Neo4j; query performance
Online Access:	http://genominfo.org/upload/pdf/gni-15-19.pdf

_version_	1832570091617648640
author	Byoung-Ha Yoon Seon-Kyu Kim Seon-Young Kim
author_facet	Byoung-Ha Yoon Seon-Kyu Kim Seon-Young Kim
author_sort	Byoung-Ha Yoon
collection	DOAJ
description	Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
format	Article
id	doaj-art-b1b0d931fe4048cc9bce760b765e3f68
institution	Kabale University
issn	1598-866X 2234-0742
language	English
publishDate	2017-03-01
publisher	BioMed Central
record_format	Article
series	Genomics & Informatics
spelling	doaj-art-b1b0d931fe4048cc9bce760b765e3f68 | 2025-02-02T17:25:26Z | eng | BioMed Central | Genomics & Informatics | 1598-866X | 2234-0742 | 2017-03-01 | 15 | 1 | 19 | 27 | 10.5808/GI.2017.15.1.19 | 202 | Use of Graph Database for the Integration of Heterogeneous Biological Data | Byoung-Ha Yoon 0 | Seon-Kyu Kim 1 | Seon-Young Kim 2 | Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea. | Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea. | Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea. | Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. | http://genominfo.org/upload/pdf/gni-15-19.pdf | biological network | data mining | graph database | heterogeneous biological data | Neo4j | query performance
spellingShingle	Byoung-Ha Yoon Seon-Kyu Kim Seon-Young Kim Use of Graph Database for the Integration of Heterogeneous Biological Data Genomics & Informatics biological network data mining graph database heterogeneous biological data Neo4j query performance
title	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_full	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_fullStr	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_full_unstemmed	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_short	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_sort	use of graph database for the integration of heterogeneous biological data
topic	biological network; data mining; graph database; heterogeneous biological data; Neo4j; query performance
url	http://genominfo.org/upload/pdf/gni-15-19.pdf
work_keys_str_mv	AT byounghayoon useofgraphdatabasefortheintegrationofheterogeneousbiologicaldata AT seonkyukim useofgraphdatabasefortheintegrationofheterogeneousbiologicaldata AT seonyoungkim useofgraphdatabasefortheintegrationofheterogeneousbiologicaldata
