Compositional data analysis enables statistical rigor in comparative glycomics

Bibliographic Details
Main Authors: Alexander R. Bennett, Jon Lundstrøm, Sayantani Chatterjee, Morten Thaysen-Andersen, Daniel Bojar
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-56249-3
collection DOAJ
description Abstract: Comparative glycomics data are compositional data, where measured glycans are parts of a whole, indicated by relative abundances. Applying traditional statistical analyses to these data often results in misleading conclusions, such as spurious “decreases” of glycans when other structures increase in abundance, or high false-positive rates for differential abundance. Our work introduces a compositional data analysis framework, tailored to comparative glycomics, to account for these data dependencies. We employ center log-ratio and additive log-ratio transformations, augmented with a scale uncertainty/information model, to introduce a statistically robust and sensitive data analysis pipeline. Applied to comparative glycomics datasets, including known glycan concentrations in defined mixtures, this approach controls false-positive rates and results in reproducible biological findings. Additionally, we present specialized analysis modalities: alpha- and beta-diversity analyze glycan distributions within and between samples, while cross-class glycan correlations shed light on previously undetected interdependencies. These approaches reveal insights into glycome variations that are critical to understanding roles of glycans in health and disease.
id doaj-art-456154471ba749b8a91fd10e01b07a69
institution Kabale University
issn 2041-1723
record_timestamp 2025-01-19T12:30:14Z
volume 16
issue 1
pages 1-15
doi 10.1038/s41467-025-56249-3
author_affiliation Alexander R. Bennett: Department of Medical Biochemistry, Institute of Biomedicine, University of Gothenburg
Jon Lundstrøm: Department of Chemistry and Molecular Biology, University of Gothenburg
Sayantani Chatterjee: School of Natural Sciences, Faculty of Science and Engineering, Macquarie University
Morten Thaysen-Andersen: School of Natural Sciences, Faculty of Science and Engineering, Macquarie University
Daniel Bojar: Department of Chemistry and Molecular Biology, University of Gothenburg