Comparison of Machine Learning Methods and Conventional Logistic Regressions for Predicting Gestational Diabetes Using Routine Clinical Data: A Retrospective Cohort Study

Background. Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over co...

Bibliographic Details
Main Authors:	Yunzhen Ye, Yu Xiong, Qiongjie Zhou, Jiangnan Wu, Xiaotian Li, Xirong Xiao
Format:	Article
Language:	English
Published:	Wiley 2020-01-01
Series:	Journal of Diabetes Research
Online Access:	http://dx.doi.org/10.1155/2020/4168340

collection	DOAJ
description	Background. Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression. Objective. This study used machine learning methods to predict GDM and compared their performance with that of logistic regression. Methods. We performed a retrospective, observational study of women who attended their routine first hospital visit during early pregnancy and underwent Down syndrome screening at 16-20 gestational weeks in a tertiary maternity hospital in China between January 1, 2013 and December 31, 2017. A total of 22,242 singleton pregnancies were included, and 3,182 (14.31%) women developed GDM. Candidate predictors included maternal demographic characteristics and medical history (maternal factors) and laboratory values in early pregnancy. The models were derived from the first 70% of the data and validated on the next 30%. The same candidate variables were used to train the machine learning models and the traditional logistic regression models. Eight common machine learning methods (GBDT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest) and two common regressions (stepwise logistic regression and logistic regression with restricted cubic splines, RCS) were implemented to predict the occurrence of GDM. Models were compared on discrimination and calibration metrics. Results. In the validation dataset, the machine learning and logistic regression models showed moderate discrimination (AUC 0.59-0.74). Overall, the GBDT model performed best among the machine learning methods (AUC 0.74, 95% CI 0.71-0.76), with negligible differences between them. Fasting blood glucose, HbA1c, triglycerides, and BMI contributed strongly to GDM. In the GBDT model, a cutoff of 0.3 on the predicted value had a negative predictive value of 74.1% (95% CI 69.5%-78.2%) and a sensitivity of 90% (95% CI 88.0%-91.7%), and a cutoff of 0.7 had a positive predictive value of 93.2% (95% CI 88.2%-96.1%) and a specificity of 99% (95% CI 98.2%-99.4%). Conclusion. In this study, several machine learning methods did not outperform logistic regression in predicting GDM. We developed a model with cutoff points for risk stratification of GDM.
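The abstract reports discrimination (AUC) and, at two cutoffs on the predicted value, sensitivity, specificity, and predictive values. The following is a minimal sketch of how such quantities are commonly computed, not the authors' code. The function names, the toy labels and scores, and the convention that a case counts as predicted positive when its score is greater than or equal to the cutoff are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def cutoff_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                   cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV at a given cutoff.

    Assumption: a case is 'predicted positive' when y_prob >= cutoff.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = developed GDM, 0 = did not.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.2, 0.8, 0.4, 0.25, 0.2, 0.1, 0.1, 0.05]

if __name__ == "__main__":
    print("AUC:", round(auc(toy_labels, toy_scores), 3))
    for cutoff in (0.3, 0.7):
        print(cutoff, cutoff_metrics(toy_labels, toy_scores, cutoff))
```

A low cutoff favors sensitivity and a high cutoff favors specificity, which is why the abstract pairs the 0.3 cutoff with sensitivity and negative predictive value and the 0.7 cutoff with specificity and positive predictive value.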
id	doaj-art-7a8acf2248a94c0fabb7bcff52bb9c71
institution	Kabale University
issn	2314-6745 2314-6753
spelling	doaj-art-7a8acf2248a94c0fabb7bcff52bb9c71; 2025-02-03T05:51:11Z; eng; Wiley; Journal of Diabetes Research; 2314-6745; 2314-6753; 2020-01-01; 2020; 10.1155/2020/4168340; 4168340. Author affiliation (all six authors): Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China.
