AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: 2 approved]

Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. The Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a cloud-based platform for democratizing access to large-scale genomics da...

Bibliographic Details
Main Authors: Sean Davis, Marcel Ramos, Michael C. Schatz, Kai Gravel-Pucillo, Levi Waldron, Sehyun Oh, Martin Morgan, Vincent Carey
Format: Article
Language: English
Published: F1000 Research Ltd 2024-10-01
Series: F1000Research
Subjects: Cloud computing, Genomics, Workflows, R/Bioconductor, AnVIL
Online Access: https://f1000research.com/articles/13-1257/v1
author Sean Davis
Marcel Ramos
Michael C. Schatz
Kai Gravel-Pucillo
Levi Waldron
Sehyun Oh
Martin Morgan
Vincent Carey
collection DOAJ
description Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. The Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a cloud-based platform for democratizing access to large-scale genomics data and analysis tools. However, utilizing the full capabilities of AnVIL can be challenging for researchers without extensive bioinformatics expertise, especially for executing complex workflows. We present the AnVILWorkflow R package, which enables the convenient execution of bioinformatics workflows hosted on AnVIL directly from an R environment. AnVILWorkflow simplifies the setup of the cloud computing environment, input data formatting, workflow submission, and retrieval of results through intuitive functions. We demonstrate the utility of AnVILWorkflow for three use cases: bulk RNA-seq analysis with Salmon, metagenomics analysis with bioBakery, and digital pathology image processing with PathML. The key features of AnVILWorkflow include user-friendly browsing of available data and workflows, seamless integration of R and non-R tools within a reproducible analysis pipeline, and accessibility to scalable computing resources without direct management overhead. AnVILWorkflow lowers the barrier to utilizing AnVIL’s resources, especially for exploratory analyses or bulk processing with established workflows. This empowers a broader community of researchers to leverage the latest genomics tools and datasets using familiar R syntax. This package is distributed through the Bioconductor project (https://bioconductor.org/packages/AnVILWorkflow), and the source code is available through GitHub (https://github.com/shbrief/AnVILWorkflow).
format Article
id doaj-art-2a37f19a01324f0aae90b6956d83cab8
institution OA Journals
issn 2046-1402
language English
publishDate 2024-10-01
publisher F1000 Research Ltd
record_format Article
series F1000Research
spelling doaj-art-2a37f19a01324f0aae90b6956d83cab8
2025-08-20T02:34:20Z
eng
F1000 Research Ltd
F1000Research
2046-1402
2024-10-01
13
10.12688/f1000research.155449.1
170635
AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: 2 approved]
Sean Davis, Departments of Biomedical Informatics and Medicine, University of Colorado Anschutz School of Medicine, Denver, Colorado, USA
Marcel Ramos, Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA
Michael C. Schatz, Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
Kai Gravel-Pucillo, Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA
Levi Waldron (https://orcid.org/0000-0003-2725-0694), Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA
Sehyun Oh, Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA
Martin Morgan, Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
Vincent Carey (https://orcid.org/0000-0003-4046-0063), Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
https://f1000research.com/articles/13-1257/v1
title AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: 2 approved]
topic Cloud computing
Genomics
Workflows
R/Bioconductor
AnVIL
eng
url https://f1000research.com/articles/13-1257/v1