Identification of Clusters in a Population With Obesity Using Machine Learning: Secondary Analysis of The Maastricht Study

Background: Modern lifestyle risk factors, like physical inactivity and poor nutrition, contribute to rising rates of obesity and chronic diseases like type 2 diabetes and heart disease. Personalized interventions, in particular, have been shown to be effective for long-term behav...

Bibliographic Details
Main Authors:	Maik JM Beuken, Melanie Kleynen, Susy Braun, Kees Van Berkel, Carla van der Kallen, Annemarie Koster, Hans Bosma, Tos TJM Berendschot, Alfons JHM Houben, Nicole Dukers-Muijrers, Joop P van den Bergh, Abraham A Kroon, Iris M Kanera
Format:	Article
Language:	English
Published:	JMIR Publications 2025-02-01
Series:	JMIR Medical Informatics
Online Access:	https://medinform.jmir.org/2025/1/e64479

_version_	1832096593532157952
author	Maik JM Beuken, Melanie Kleynen, Susy Braun, Kees Van Berkel, Carla van der Kallen, Annemarie Koster, Hans Bosma, Tos TJM Berendschot, Alfons JHM Houben, Nicole Dukers-Muijrers, Joop P van den Bergh, Abraham A Kroon, Iris M Kanera
author_sort	Maik JM Beuken
collection	DOAJ
description	Background: Modern lifestyle risk factors, like physical inactivity and poor nutrition, contribute to rising rates of obesity and chronic diseases like type 2 diabetes and heart disease. Personalized interventions, in particular, have been shown to be effective for long-term behavior change. Machine learning can be used to uncover insights without predefined hypotheses, revealing complex relationships and distinct population clusters. New data-driven approaches, such as the factor probabilistic distance clustering algorithm, provide opportunities to identify potentially meaningful clusters within large and complex datasets. Objective: This study aimed to identify potential clusters and relevant variables among individuals with obesity using a data-driven and hypothesis-free machine learning approach. Methods: We used cross-sectional data from individuals with abdominal obesity from The Maastricht Study. Data (2971 variables) included demographics, lifestyle, biomedical aspects, advanced phenotyping, and social factors (cohort 2010). The factor probabilistic distance clustering algorithm was applied to detect clusters within this high-dimensional data. To identify a subset of distinct, minimally redundant, predictive variables, we used the statistically equivalent signature algorithm. To describe the clusters, we applied measures of central tendency and variability, and we assessed the distinctiveness of the clusters on the variables that emerged, using the F test for continuous variables and the chi-square test for categorical variables at a significance level of α=.001. Results: We identified 3 distinct clusters (including 4128/9188, 44.93% of all data points) among individuals with obesity (n=4128). The most significant continuous variable for distinguishing cluster 1 (n=1458) from clusters 2 and 3 combined (n=2670) was lower energy intake (mean 1684, SD 393 kcal/day vs mean 2358, SD 635 kcal/day; P<.001). The most significant categorical variable was occupation (P<.001). A significantly higher proportion (1236/1458, 84.77%) in cluster 1 did not work, compared with clusters 2 and 3 combined (1486/2670, 55.66%; P<.001). For cluster 2 (n=1521), the most significant continuous variable was higher energy intake (mean 2755, SD 506.2 kcal/day vs mean 1749, SD 375 kcal/day; P<.001). The most significant categorical variable was sex (P<.001). A significantly higher proportion (997/1521, 65.55%) in cluster 2 were male, compared with the other 2 clusters (885/2607, 33.95%; P<.001). For cluster 3 (n=1149), the most significant continuous variable was overall higher cognitive functioning (mean 0.2349, SD 0.5702 vs mean –0.3088, SD 0.7212; P<.001), and educational level was the most significant categorical variable (P<.001). A significantly higher proportion (475/1149, 41.34%) in cluster 3 received higher vocational or university education, compared with clusters 1 and 2 combined (729/2979, 24.47%; P<.001). Conclusions: This study demonstrates that a hypothesis-free and fully data-driven approach can be used to identify distinguishable participant clusters in large and complex datasets and find relevant variables that differ within populations with obesity.
format	Article
id	doaj-art-78afe985e59047308e4bcd8b7411601e
institution	Kabale University
issn	2291-9694
language	English
publishDate	2025-02-01
publisher	JMIR Publications
record_format	Article
series	JMIR Medical Informatics
spelling	doaj-art-78afe985e59047308e4bcd8b7411601e; 2025-02-05T13:31:50Z; eng; JMIR Publications; JMIR Medical Informatics; 2291-9694; 2025-02-01; 13; e64479; 10.2196/64479; Identification of Clusters in a Population With Obesity Using Machine Learning: Secondary Analysis of The Maastricht Study; Maik JM Beuken (https://orcid.org/0000-0002-8356-2716); Melanie Kleynen (https://orcid.org/0000-0002-6543-6994); Susy Braun (https://orcid.org/0000-0002-3037-3428); Kees Van Berkel (https://orcid.org/0000-0003-0139-9298); Carla van der Kallen (https://orcid.org/0000-0003-1468-8793); Annemarie Koster (https://orcid.org/0000-0003-1583-7391); Hans Bosma (https://orcid.org/0000-0003-4333-4564); Tos TJM Berendschot (https://orcid.org/0000-0002-8101-939X); Alfons JHM Houben (https://orcid.org/0000-0002-1747-8452); Nicole Dukers-Muijrers (https://orcid.org/0000-0003-4896-758X); Joop P van den Bergh (https://orcid.org/0000-0003-3984-2232); Abraham A Kroon (https://orcid.org/0000-0001-7750-8249); Iris M Kanera (https://orcid.org/0000-0001-6863-2096); https://medinform.jmir.org/2025/1/e64479
title	Identification of Clusters in a Population With Obesity Using Machine Learning: Secondary Analysis of The Maastricht Study
url	https://medinform.jmir.org/2025/1/e64479

Similar Items