Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study

Summary: Background: Type 2 diabetes mellitus (T2DM) is a significant global public health concern that has grown steadily over the past few decades. This study aimed to predict the incidence of T2DM within 5 years and the risk of mortality following the onset of T2DM. Data from three ind...

Bibliographic Details
Main Authors: Hayeon Lee, Seung Ha Hwang, Seoyoung Park, Yunjeong Choi, Sooji Lee, Jaeyu Park, Yejun Son, Hyeon Jin Kim, Soeun Kim, Jiyeon Oh, Lee Smith, Damiano Pizzol, Sang Youl Rhee, Hyunji Sang, Jinseok Lee, Dong Keon Yon
Format: Article
Language: English
Published: Elsevier 2025-02-01
Series: EClinicalMedicine
Subjects: Diabetes mellitus; Japan; Machine learning; Mortality; South Korea; United Kingdom
Online Access: http://www.sciencedirect.com/science/article/pii/S258953702500001X
_version_ 1832595339494817792
author Hayeon Lee
Seung Ha Hwang
Seoyoung Park
Yunjeong Choi
Sooji Lee
Jaeyu Park
Yejun Son
Hyeon Jin Kim
Soeun Kim
Jiyeon Oh
Lee Smith
Damiano Pizzol
Sang Youl Rhee
Hyunji Sang
Jinseok Lee
Dong Keon Yon
author_sort Hayeon Lee
collection DOAJ
description Summary: Background: Type 2 diabetes mellitus (T2DM) is a significant global public health concern that has grown steadily over the past few decades. This study aimed to predict the incidence of T2DM within 5 years and the risk of mortality following the onset of T2DM, using data from three independent cohorts worldwide. Methods: We used data from three independent, large-scale, general population-based cohort studies from around the world. The Korean cohort (NHIS-NSC cohort; discovery cohort; n = 973,303), conducted between 1 January 2002 and 31 December 2013, was used for training and internal validation, whereas the Japanese cohort (JMDC cohort; validation cohort A; n = 12,143,715) and the UK cohort (UK Biobank; validation cohort B; n = 416,656) were used for external validation. We employed various machine learning (ML)-based models, using 18 features, to predict the incidence of T2DM within 5 years of regular health checkups and calculated SHapley Additive exPlanations (SHAP) values. To assess the robustness of our ML-based prediction model, we investigated the association between the model probability, divided into tertiles, and the risk of mortality following the onset of T2DM. Findings: In the discovery cohort, the ensemble model using voting with logistic regression and adaptive boosting achieved a balanced accuracy of 72.6% and an area under the receiver operating characteristic curve (AUROC) of 0.792. The SHAP value analysis of our proposed model revealed that age was the most important predictor of incident T2DM, followed by fasting blood glucose, hemoglobin, γ-glutamyl transferase level, and body mass index. The model probability was associated with an increased risk of mortality (T1: adjusted hazard ratio, 2.82 [95% CI, 2.01–3.94]; T2: 3.89 [2.74–5.53]; and T3: 7.73 [5.37–11.12]). Similar patterns and trends were observed in the validation cohorts (T1: 1.74 [1.49–2.03], T2: 1.97 [1.69–2.30], and T3: 3.31 [2.82–3.38] in validation cohort A; T1: 1.33 [1.03–1.71], T2: 1.54 [1.21–1.96], and T3: 1.73 [1.36–2.20] in validation cohort B). Interpretation: This study derived and validated an ML-based model to predict the incidence of T2DM within 5 years across three countries (South Korea, Japan, and the UK) and showed that the model probability is associated with an increased risk of mortality. Funding: Institute of Information & Communications Technology Planning & Evaluation, South Korea.
format Article
id doaj-art-938ac8969ae147b68f4708221d4a51d5
institution Kabale University
issn 2589-5370
language English
publishDate 2025-02-01
publisher Elsevier
record_format Article
series EClinicalMedicine
spelling doaj-art-938ac8969ae147b68f4708221d4a51d5 2025-01-19T06:26:31Z eng Elsevier EClinicalMedicine 2589-5370 2025-02-01 80 103069
Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study
Hayeon Lee: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Biomedical Engineering, Kyung Hee University, Yongin, South Korea; Department of Electronics and Information Convergence Engineering, Kyung Hee University, Yongin, South Korea
Seung Ha Hwang: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Biomedical Engineering, Kyung Hee University, Yongin, South Korea; Department of Electronics and Information Convergence Engineering, Kyung Hee University, Yongin, South Korea
Seoyoung Park: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Precision Medicine, Kyung Hee University College of Medicine, Seoul, South Korea
Yunjeong Choi: Department of Biomedical Engineering, Kyung Hee University, Yongin, South Korea
Sooji Lee: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
Jaeyu Park: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Regulatory Science, Kyung Hee University, Seoul, South Korea
Yejun Son: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Precision Medicine, Kyung Hee University College of Medicine, Seoul, South Korea
Hyeon Jin Kim: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Regulatory Science, Kyung Hee University, Seoul, South Korea
Soeun Kim: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Precision Medicine, Kyung Hee University College of Medicine, Seoul, South Korea
Jiyeon Oh: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
Lee Smith: Centre for Health Performance and Wellbeing, Anglia Ruskin University, Cambridge, UK
Damiano Pizzol: Health Unit Eni, Maputo, Mozambique; Health Unit, Eni, San Donato Milanese, Italy
Sang Youl Rhee: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Precision Medicine, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Regulatory Science, Kyung Hee University, Seoul, South Korea; Department of Endocrinology and Metabolism, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, South Korea
Hyunji Sang: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Endocrinology and Metabolism, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, South Korea; Corresponding author. Department of Endocrinology and Metabolism, Kyung Hee University College of Medicine, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, South Korea.
Jinseok Lee: Department of Biomedical Engineering, Kyung Hee University, Yongin, South Korea; Department of Electronics and Information Convergence Engineering, Kyung Hee University, Yongin, South Korea; Corresponding author. Department of Biomedical Engineering, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, Yongin, 17104, South Korea.
Dong Keon Yon: Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Biomedical Engineering, Kyung Hee University, Yongin, South Korea; Department of Precision Medicine, Kyung Hee University College of Medicine, Seoul, South Korea; Department of Regulatory Science, Kyung Hee University, Seoul, South Korea; Department of Pediatrics, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea; Corresponding author. Department of Pediatrics, Kyung Hee University College of Medicine, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, South Korea.
http://www.sciencedirect.com/science/article/pii/S258953702500001X
Diabetes mellitus; Japan; Machine learning; Mortality; South Korea; United Kingdom
title Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study
topic Diabetes mellitus
Japan
Machine learning
Mortality
South Korea
United Kingdom
url http://www.sciencedirect.com/science/article/pii/S258953702500001X