Global biogeography of N<sub>2</sub>-fixing microbes: <i>nifH</i> amplicon database and analytics workflow


Bibliographic Details
Main Authors: M. Morando, J. D. Magasin, S. Cheung, M. M. Mills, J. P. Zehr, K. A. Turk-Kubo
Format: Article
Language: English
Published: Copernicus Publications 2025-02-01
Series: Earth System Science Data
Online Access: https://essd.copernicus.org/articles/17/393/2025/essd-17-393-2025.pdf
author M. Morando
J. D. Magasin
S. Cheung
M. M. Mills
J. P. Zehr
K. A. Turk-Kubo
collection DOAJ
description <p>Marine dinitrogen (<span class="inline-formula">N<sub>2</sub></span>) fixation is a globally significant biogeochemical process carried out by a specialized group of prokaryotes (diazotrophs), yet our understanding of their ecology is constantly evolving. Although marine <span class="inline-formula">N<sub>2</sub></span> fixation is often ascribed to cyanobacterial diazotrophs, indirect evidence suggests that non-cyanobacterial diazotrophs (NCDs) might also be important. One widely used approach for understanding diazotroph diversity and biogeography is polymerase chain reaction (PCR) amplification of a portion of the <i>nifH</i> gene, which encodes a structural component of the <span class="inline-formula">N<sub>2</sub></span>-fixing enzyme complex, nitrogenase. An array of bioinformatic tools exists to process <i>nifH</i> amplicon data; however, the lack of standardized practices has hindered cross-study comparisons. This has led to a missed opportunity to more thoroughly assess diazotroph diversity and biogeography, as well as their potential contributions to the marine N cycle. To address these knowledge gaps, a bioinformatic workflow was designed that standardizes the processing of <i>nifH</i> amplicon datasets originating from high-throughput sequencing (HTS). Multiple datasets are efficiently and consistently processed with a specialized DADA2 pipeline to identify amplicon sequence variants (ASVs). A series of customizable post-pipeline stages then detect and discard spurious <i>nifH</i> sequences and annotate the subsequent quality-filtered <i>nifH</i> ASVs using multiple reference databases and classification approaches. This newly developed workflow was used to reprocess nearly all publicly available <i>nifH</i> amplicon HTS datasets from marine studies and to generate a comprehensive <i>nifH</i> ASV database containing 9383 ASVs aggregated from 21 studies that represent the diazotrophic populations in the global ocean. 
For each sample, the database includes physical and chemical metadata obtained from the Simons Collaborative Marine Atlas Project (CMAP). Here we demonstrate the utility of this database for revealing global biogeographical patterns of prominent diazotroph groups and highlight the influence of sea surface temperature. The workflow and <i>nifH</i> ASV database provide a robust framework for studying marine <span class="inline-formula">N<sub>2</sub></span> fixation and diazotrophic diversity captured by <i>nifH</i> amplicon HTS. Future datasets that target understudied ocean regions can be added easily, and users can tune parameters and studies included for their specific focus. The workflow and database are available, respectively, on GitHub (<span class="uri">https://github.com/jdmagasin/nifH-ASV-workflow</span>, last access: 21 January 2025; Morando et al., 2024c) and Figshare (<a href="https://doi.org/10.6084/m9.figshare.23795943.v2">https://doi.org/10.6084/m9.figshare.23795943.v2</a>; Morando et al., 2024b).</p>
id doaj-art-cfc45d5ac0404077836186dd2740413d
institution Kabale University
issn 1866-3508
1866-3516
spelling doaj-art-cfc45d5ac0404077836186dd2740413d; 2025-02-04T08:25:27Z; eng; Copernicus Publications; Earth System Science Data, vol. 17, pp. 393-422 (2025-02-01); ISSN 1866-3508, 1866-3516; DOI 10.5194/essd-17-393-2025
Author affiliations:
M. Morando: Ocean Sciences Department, University of California, Santa Cruz, Santa Cruz, CA 95064, United States
J. D. Magasin: Ocean Sciences Department, University of California, Santa Cruz, Santa Cruz, CA 95064, United States
S. Cheung: Ocean Sciences Department, University of California, Santa Cruz, Santa Cruz, CA 95064, United States; Institute of Marine Biology and Center of Excellence for the Oceans, National Taiwan Ocean University, Keelung 20224, Taiwan
M. M. Mills: Earth System Science, Stanford University, Stanford, CA 94305, United States
J. P. Zehr: Ocean Sciences Department, University of California, Santa Cruz, Santa Cruz, CA 95064, United States
K. A. Turk-Kubo: Ocean Sciences Department, University of California, Santa Cruz, Santa Cruz, CA 95064, United States
title Global biogeography of N<sub>2</sub>-fixing microbes: <i>nifH</i> amplicon database and analytics workflow