Clustering Large-Scale Biomedical Data to Model Dynamic Accumulation Processes in Disease Progression and Anti-Microbial Resistance Evolution


Bibliographic Details
Main Authors: Kazeem A. Dauda, Olav N. L. Aga, Iain G. Johnston
Format: Article
Language: English
Published: IEEE 2025-01-01
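Series: IEEE Access
Subjects: Accumulation modeling; anti-microbial resistance; big data; clustering; genomic data; Markov model
Online Access: https://ieeexplore.ieee.org/document/10835078/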
collection DOAJ
description Accumulation modelling uses machine learning to discover the dynamics by which systems acquire discrete features over time. Many systems of biomedical interest show such dynamics: from bacteria acquiring resistances to sets of drugs, to patients acquiring symptoms during the course of progressive disease. Existing approaches for accumulation modelling are typically limited either in the number of features they consider or their ability to characterise interactions between these features – a limitation for the large-scale genetic and/or phenotypic datasets often found in modern biomedical applications. Here, we demonstrate how clustering can make such large-scale datasets tractable for powerful accumulation modelling approaches. Clustering resolves issues of sparsity and high dimensionality in datasets, but complicates the interpretation of the inferred dynamics, especially if observations are not independent. Focussing on hypercubic hidden Markov models (HyperHMM), we introduce several approaches for interpreting, estimating, and bounding the results of the dynamics in these cases and discuss how biomedical insight could be gained from such analyses. We demonstrate this ‘Cluster-based HyperHMM’ (CHyperHMM) pipeline for synthetic data, clinical data on disease progression in severe malaria, and genomic data for anti-microbial resistance evolution in Klebsiella pneumoniae, reflecting two global health threats.
id doaj-art-6ef2590847b549d580022359ea1e2170
institution Kabale University
issn 2169-3536
doi 10.1109/ACCESS.2025.3527715
volume 13
pages 13816-13831
author_orcid Kazeem A. Dauda: https://orcid.org/0000-0002-4392-8592
Iain G. Johnston: https://orcid.org/0000-0001-8559-3519
author_affiliation Kazeem A. Dauda: Department of Mathematics, University of Bergen, Bergen, Norway
Olav N. L. Aga: Department of Clinical Science and Computational Biology Unit, University of Bergen, Bergen, Norway
Iain G. Johnston: Department of Mathematics and Computational Biology Unit, University of Bergen, Bergen, Norway
topic Accumulation modeling
anti-microbial resistance
big data
clustering
genomic data
Markov model