Ad Hoc Data Foraging in a Life Sciences Community Ecosystem Using SoDa

Biologists often set out to find relevant data in an ever-changing landscape of interesting databases. While leading journals publish descriptions of databases, these descriptions are usually not recent, and the lists are not updated frequently enough to discard defunct or poor-quality databases. These indices usually in...


Bibliographic Details
Main Authors: Kallol Naha, Hasan M. Jamil
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/2/621
author Kallol Naha
Hasan M. Jamil
collection DOAJ
description Biologists often set out to find relevant data in an ever-changing landscape of interesting databases. While leading journals publish descriptions of databases, these descriptions are usually not recent, and the lists are not updated frequently enough to discard defunct or poor-quality databases. These indices usually include databases whose authors proactively requested their inclusion. The challenge for individual biologists, then, is to discover, explore, and select databases of interest from a large unorganized collection and to use them effectively in their analysis without too large an investment. Advocacy of the FAIR data principles to improve searching, finding, accessing, and interoperating among these diverse information sources in order to increase usability is proving to be a difficult proposition and, consequently, a large number of data sources are not FAIR-compliant. Since linked open data do not guarantee FAIRness, biologists are now left to individually search for information in open networks. In this paper, we propose SoDa for intelligent data foraging on the internet by biologists. SoDa helps biologists discover resources based on analysis requirements, generate resource access plans, and store cleaned data and knowledge for community use. SoDa includes a natural language-powered resource discovery tool and tools to retrieve data from remote databases, organize and store collected data, query stored data, and seek help from the community when things do not work as anticipated. A secondary search index is also supported so that community members can conveniently find archived information and reuse it. The features supported in SoDa endow biologists with data integration capabilities over arbitrary linked open databases and let them construct powerful computational pipelines using them, capabilities that are not supported in most contemporary biological workflow systems, such as Taverna or Galaxy.
format Article
id doaj-art-5271ef8ec921494fa92967499cacd1f0
institution Kabale University
issn 2076-3417
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-5271ef8ec921494fa92967499cacd1f0
2025-01-24T13:20:08Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-01-01
15 2 621
10.3390/app15020621
Ad Hoc Data Foraging in a Life Sciences Community Ecosystem Using SoDa
Kallol Naha 0
Hasan M. Jamil 1
Department of Computer Science, University of Idaho, 875 Perimeter Drive, Moscow, ID 83844, USA
Department of Computer Science, University of Idaho, 875 Perimeter Drive, Moscow, ID 83844, USA
https://www.mdpi.com/2076-3417/15/2/621
large language model
intelligent user interface
FAIR
wrapper generation
interoperability
ecosystem
title Ad Hoc Data Foraging in a Life Sciences Community Ecosystem Using SoDa
topic large language model
intelligent user interface
FAIR
wrapper generation
interoperability
ecosystem
url https://www.mdpi.com/2076-3417/15/2/621