Clustering Large-Scale Biomedical Data to Model Dynamic Accumulation Processes in Disease Progression and Anti-Microbial Resistance Evolution

Bibliographic Details
Main Authors:	Kazeem A. Dauda, Olav N. L. Aga, Iain G. Johnston
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Accumulation modeling; anti-microbial resistance; big data; clustering; genomic data; Markov model
Online Access:	https://ieeexplore.ieee.org/document/10835078/

author	Kazeem A. Dauda, Olav N. L. Aga, Iain G. Johnston
collection	DOAJ
description	Accumulation modelling uses machine learning to discover the dynamics by which systems acquire discrete features over time. Many systems of biomedical interest show such dynamics: from bacteria acquiring resistances to sets of drugs, to patients acquiring symptoms during the course of progressive disease. Existing approaches for accumulation modelling are typically limited either in the number of features they consider or their ability to characterise interactions between these features – a limitation for the large-scale genetic and/or phenotypic datasets often found in modern biomedical applications. Here, we demonstrate how clustering can make such large-scale datasets tractable for powerful accumulation modelling approaches. Clustering resolves issues of sparsity and high dimensionality in datasets, but complicates the interpretation of the inferred dynamics, especially if observations are not independent. Focussing on hypercubic hidden Markov models (HyperHMM), we introduce several approaches for interpreting, estimating, and bounding the results of the dynamics in these cases and discuss how biomedical insight could be gained from such analyses. We demonstrate this ‘Cluster-based HyperHMM’ (CHyperHMM) pipeline for synthetic data, clinical data on disease progression in severe malaria, and genomic data for anti-microbial resistance evolution in Klebsiella pneumoniae, reflecting two global health threats.
format	Article
id	doaj-art-6ef2590847b549d580022359ea1e2170
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-6ef2590847b549d580022359ea1e2170; 2025-01-25T00:01:29Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 13816-13831; 10.1109/ACCESS.2025.3527715; 10835078; Kazeem A. Dauda (https://orcid.org/0000-0002-4392-8592), Olav N. L. Aga, Iain G. Johnston (https://orcid.org/0000-0001-8559-3519); Department of Mathematics, University of Bergen, Bergen, Norway; Department of Clinical Science and Computational Biology Unit, University of Bergen, Bergen, Norway; Department of Mathematics and Computational Biology Unit, University of Bergen, Bergen, Norway; https://ieeexplore.ieee.org/document/10835078/
title	Clustering Large-Scale Biomedical Data to Model Dynamic Accumulation Processes in Disease Progression and Anti-Microbial Resistance Evolution
topic	Accumulation modeling; anti-microbial resistance; big data; clustering; genomic data; Markov model
url	https://ieeexplore.ieee.org/document/10835078/

Similar Items