Clustering Large-Scale Biomedical Data to Model Dynamic Accumulation Processes in Disease Progression and Anti-Microbial Resistance Evolution
Accumulation modelling uses machine learning to discover the dynamics by which systems acquire discrete features over time. Many systems of biomedical interest show such dynamics: from bacteria acquiring resistances to sets of drugs, to patients acquiring symptoms during the course of progressive di...
Main Authors: | Kazeem A. Dauda, Olav N. L. Aga, Iain G. Johnston |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
Subjects: | Accumulation modeling; anti-microbial resistance; big data; clustering; genomic data; Markov model |
Online Access: | https://ieeexplore.ieee.org/document/10835078/ |
_version_ | 1832586863982936064 |
---|---|
author | Kazeem A. Dauda; Olav N. L. Aga; Iain G. Johnston |
collection | DOAJ |
description | Accumulation modelling uses machine learning to discover the dynamics by which systems acquire discrete features over time. Many systems of biomedical interest show such dynamics: from bacteria acquiring resistances to sets of drugs, to patients acquiring symptoms during the course of progressive disease. Existing approaches for accumulation modelling are typically limited either in the number of features they consider or their ability to characterise interactions between these features – a limitation for the large-scale genetic and/or phenotypic datasets often found in modern biomedical applications. Here, we demonstrate how clustering can make such large-scale datasets tractable for powerful accumulation modelling approaches. Clustering resolves issues of sparsity and high dimensionality in datasets, but complicates the interpretation of the inferred dynamics, especially if observations are not independent. Focussing on hypercubic hidden Markov models (HyperHMM), we introduce several approaches for interpreting, estimating, and bounding the results of the dynamics in these cases and discuss how biomedical insight could be gained from such analyses. We demonstrate this ‘Cluster-based HyperHMM’ (CHyperHMM) pipeline for synthetic data, clinical data on disease progression in severe malaria, and genomic data for anti-microbial resistance evolution in Klebsiella pneumoniae, reflecting two global health threats. |
format | Article |
id | doaj-art-6ef2590847b549d580022359ea1e2170 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-6ef2590847b549d580022359ea1e2170; 2025-01-25T00:01:29Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 13816-13831; DOI 10.1109/ACCESS.2025.3527715; document no. 10835078. Authors: Kazeem A. Dauda (https://orcid.org/0000-0002-4392-8592), Department of Mathematics, University of Bergen, Bergen, Norway; Olav N. L. Aga, Department of Clinical Science and Computational Biology Unit, University of Bergen, Bergen, Norway; Iain G. Johnston (https://orcid.org/0000-0001-8559-3519), Department of Mathematics and Computational Biology Unit, University of Bergen, Bergen, Norway. |
title | Clustering Large-Scale Biomedical Data to Model Dynamic Accumulation Processes in Disease Progression and Anti-Microbial Resistance Evolution |
topic | Accumulation modeling; anti-microbial resistance; big data; clustering; genomic data; Markov model |
url | https://ieeexplore.ieee.org/document/10835078/ |