AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: 2 approved]
Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. The Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a cloud-based platform for democratizing access to large-scale genomics da...
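| Main Authors: | Sean Davis, Marcel Ramos, Michael C. Schatz, Kai Gravel-Pucillo, Levi Waldron, Sehyun Oh, Martin Morgan, Vincent Carey |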
|---|---|
| Format: | Article |
| Language: | English |
| Published: | F1000 Research Ltd, 2024-10-01 |
| Series: | F1000Research |
| Subjects: | Cloud computing; Genomics; Workflows; R/Bioconductor; AnVIL |
| Online Access: | https://f1000research.com/articles/13-1257/v1 |
| _version_ | 1850124333989494784 |
|---|---|
| author | Sean Davis; Marcel Ramos; Michael C. Schatz; Kai Gravel-Pucillo; Levi Waldron; Sehyun Oh; Martin Morgan; Vincent Carey |
| author_facet | Sean Davis; Marcel Ramos; Michael C. Schatz; Kai Gravel-Pucillo; Levi Waldron; Sehyun Oh; Martin Morgan; Vincent Carey |
| author_sort | Sean Davis |
| collection | DOAJ |
| description | Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. The Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a cloud-based platform for democratizing access to large-scale genomics data and analysis tools. However, utilizing the full capabilities of AnVIL can be challenging for researchers without extensive bioinformatics expertise, especially for executing complex workflows. We present the AnVILWorkflow R package, which enables the convenient execution of bioinformatics workflows hosted on AnVIL directly from an R environment. AnVILWorkflow simplifies the setup of the cloud computing environment, input data formatting, workflow submission, and retrieval of results through intuitive functions. We demonstrate the utility of AnVILWorkflow for three use cases: bulk RNA-seq analysis with Salmon, metagenomics analysis with bioBakery, and digital pathology image processing with PathML. The key features of AnVILWorkflow include user-friendly browsing of available data and workflows, seamless integration of R and non-R tools within a reproducible analysis pipeline, and accessibility to scalable computing resources without direct management overhead. AnVILWorkflow lowers the barrier to utilizing AnVIL’s resources, especially for exploratory analyses or bulk processing with established workflows. This empowers a broader community of researchers to leverage the latest genomics tools and datasets using familiar R syntax. This package is distributed through the Bioconductor project (https://bioconductor.org/packages/AnVILWorkflow), and the source code is available through GitHub (https://github.com/shbrief/AnVILWorkflow). |
| format | Article |
| id | doaj-art-2a37f19a01324f0aae90b6956d83cab8 |
| institution | OA Journals |
| issn | 2046-1402 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | F1000 Research Ltd |
| record_format | Article |
| series | F1000Research |
| spelling | doaj-art-2a37f19a01324f0aae90b6956d83cab8; 2025-08-20T02:34:20Z; eng; F1000 Research Ltd; F1000Research; 2046-1402; 2024-10-01; 13; 10.12688/f1000research.155449.1; 170635; AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: 2 approved]. Authors: Sean Davis [0]; Marcel Ramos [1]; Michael C. Schatz [2]; Kai Gravel-Pucillo [3]; Levi Waldron [4] (https://orcid.org/0000-0003-2725-0694); Sehyun Oh [5]; Martin Morgan [6]; Vincent Carey [7] (https://orcid.org/0000-0003-4046-0063). Affiliations: [0] Departments of Biomedical Informatics and Medicine, University of Colorado Anschutz School of Medicine, Denver, Colorado, USA; [1] Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA; [2] Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA; [3] Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA; [4] Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA; [5] Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA; [6] Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA; [7] Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA. Abstract: Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. The Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a cloud-based platform for democratizing access to large-scale genomics data and analysis tools. However, utilizing the full capabilities of AnVIL can be challenging for researchers without extensive bioinformatics expertise, especially for executing complex workflows. We present the AnVILWorkflow R package, which enables the convenient execution of bioinformatics workflows hosted on AnVIL directly from an R environment. AnVILWorkflow simplifies the setup of the cloud computing environment, input data formatting, workflow submission, and retrieval of results through intuitive functions. We demonstrate the utility of AnVILWorkflow for three use cases: bulk RNA-seq analysis with Salmon, metagenomics analysis with bioBakery, and digital pathology image processing with PathML. The key features of AnVILWorkflow include user-friendly browsing of available data and workflows, seamless integration of R and non-R tools within a reproducible analysis pipeline, and accessibility to scalable computing resources without direct management overhead. AnVILWorkflow lowers the barrier to utilizing AnVIL’s resources, especially for exploratory analyses or bulk processing with established workflows. This empowers a broader community of researchers to leverage the latest genomics tools and datasets using familiar R syntax. This package is distributed through the Bioconductor project (https://bioconductor.org/packages/AnVILWorkflow), and the source code is available through GitHub (https://github.com/shbrief/AnVILWorkflow). URL: https://f1000research.com/articles/13-1257/v1. Keywords: Cloud computing; Genomics; Workflows; R/Bioconductor; AnVIL; eng |
| spellingShingle | Sean Davis; Marcel Ramos; Michael C. Schatz; Kai Gravel-Pucillo; Levi Waldron; Sehyun Oh; Martin Morgan; Vincent Carey; AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: 2 approved]; F1000Research; Cloud computing; Genomics; Workflows; R/Bioconductor; AnVIL; eng |
| title | AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: 2 approved] |
| title_sort | anvilworkflow a runnable workflow package for cloud implemented bioinformatics analysis pipelines version 1 peer review 2 approved |
| topic | Cloud computing; Genomics; Workflows; R/Bioconductor; AnVIL; eng |
| url | https://f1000research.com/articles/13-1257/v1 |