Global biogeography of N<sub>2</sub>-fixing microbes: <i>nifH</i> amplicon database and analytics workflow

Bibliographic Details
Main Authors: M. Morando, J. D. Magasin, S. Cheung, M. M. Mills, J. P. Zehr, K. A. Turk-Kubo
Format: Article
Language: English
Published: Copernicus Publications 2025-02-01
Series: Earth System Science Data
Online Access: https://essd.copernicus.org/articles/17/393/2025/essd-17-393-2025.pdf
Description
Summary: <p>Marine dinitrogen (<span class="inline-formula">N<sub>2</sub></span>) fixation is a globally significant biogeochemical process carried out by a specialized group of prokaryotes (diazotrophs), yet our understanding of their ecology is constantly evolving. Although marine <span class="inline-formula">N<sub>2</sub></span> fixation is often ascribed to cyanobacterial diazotrophs, indirect evidence suggests that non-cyanobacterial diazotrophs (NCDs) might also be important. One widely used approach for understanding diazotroph diversity and biogeography is polymerase chain reaction (PCR) amplification of a portion of the <i>nifH</i> gene, which encodes a structural component of the <span class="inline-formula">N<sub>2</sub></span>-fixing enzyme complex, nitrogenase. An array of bioinformatic tools exists to process <i>nifH</i> amplicon data; however, the lack of standardized practices has hindered cross-study comparisons. This has led to a missed opportunity to more thoroughly assess diazotroph diversity and biogeography, as well as their potential contributions to the marine N cycle. To address these knowledge gaps, a bioinformatic workflow was designed that standardizes the processing of <i>nifH</i> amplicon datasets originating from high-throughput sequencing (HTS). Multiple datasets are efficiently and consistently processed with a specialized DADA2 pipeline to identify amplicon sequence variants (ASVs). A series of customizable post-pipeline stages then detect and discard spurious <i>nifH</i> sequences and annotate the subsequent quality-filtered <i>nifH</i> ASVs using multiple reference databases and classification approaches. This newly developed workflow was used to reprocess nearly all publicly available <i>nifH</i> amplicon HTS datasets from marine studies and to generate a comprehensive <i>nifH</i> ASV database containing 9383 ASVs aggregated from 21 studies that represent the diazotrophic populations in the global ocean. For each sample, the database includes physical and chemical metadata obtained from the Simons Collaborative Marine Atlas Project (CMAP). Here we demonstrate the utility of this database for revealing global biogeographical patterns of prominent diazotroph groups and highlight the influence of sea surface temperature. The workflow and <i>nifH</i> ASV database provide a robust framework for studying marine <span class="inline-formula">N<sub>2</sub></span> fixation and diazotrophic diversity captured by <i>nifH</i> amplicon HTS. Future datasets that target understudied ocean regions can be added easily, and users can tune parameters and studies included for their specific focus. The workflow and database are available, respectively, on GitHub (<span class="uri">https://github.com/jdmagasin/nifH-ASV-workflow</span>, last access: 21 January 2025; Morando et al., 2024c) and Figshare (<a href="https://doi.org/10.6084/m9.figshare.23795943.v2">https://doi.org/10.6084/m9.figshare.23795943.v2</a>; Morando et al., 2024b).</p>
ISSN: 1866-3508, 1866-3516