Development and validation of an interpretable machine learning model for predicting hyperuricemia risk: Based on environmental chemical exposure

Bibliographic Details
Main Authors: Xiaochuan Lu, Huawei Kou, Cong Li, Runqing Zhan, Rongrong Guo, Shengnan Liu, Peixuan Shen, Meiyue Shen, Tingwei Du, Jiaqi Lu, Xiaoli Shen
Format: Article
Language: English
Published: Elsevier 2025-07-01
Series: Ecotoxicology and Environmental Safety
Online Access: http://www.sciencedirect.com/science/article/pii/S0147651325007286
Description
Summary: Hyperuricemia is a global health concern, with environmental chemicals as risk factors. This study used data on multiple environmental chemical exposures from the 2011–2012 cycle of the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable machine learning model for hyperuricemia risk prediction. The least absolute shrinkage and selection operator (LASSO) regression method was employed to select relevant variables. The dataset was split into training (80 %) and test (20 %) sets, and six machine learning models were constructed: Random Forest (RF), Gaussian Naive Bayes (GNB), Light Gradient Boosting (LGB), Extreme Gradient Boosting (XGB), Adaptive Boosting Classifier (AB), and Support Vector Machine (SVM). Our study identified a hyperuricemia prevalence of 20.58 % in the 2011–2012 NHANES cycle, which was consistent with previous studies. The XGB model exhibited optimal performance, achieving the highest AUC (0.806; 95 % CI: 0.768–0.845), balanced accuracy (0.762; 95 % CI: 0.721–0.802), and F1 score (0.585; 95 % CI: 0.535–0.635), as well as the lowest Brier score (0.133; 95 % CI: 0.122–0.144). Estimated glomerular filtration rate (eGFR), body mass index (BMI), cobalt (Co), mono-(2-ethyl)-hexyl phthalate (MEHP), mono-(3-carboxypropyl) phthalate (MCPP), mono-(2-ethyl-5-hydroxyhexyl) phthalate (MEHHP), and 2-hydroxynaphthalene (OHNa2) were identified as the key factors contributing to the predictive model. The results of Shapley additive explanations and partial dependence plots indicated that hyperuricemia was positively associated with MCPP, MEHHP, and OHNa2, and negatively associated with Co and MEHP. This study is the first to predict the risk of hyperuricemia based on multiple environmental chemical exposures using a machine learning model.
ISSN: 0147-6513
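
The workflow outlined in the summary (LASSO variable selection, an 80/20 train/test split, a boosted-tree classifier, and evaluation by AUC, balanced accuracy, F1 score, and Brier score) can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn is available, uses synthetic data from `make_classification` in place of the NHANES variables, and substitutes scikit-learn's `GradientBoostingClassifier` for XGB. The sample size, feature counts, 0.5 decision threshold, and all hyperparameters are illustrative assumptions, so the resulting numbers bear no relation to those reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.metrics import (balanced_accuracy_score, brier_score_loss,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the exposure data (assumption): 30 candidate
# predictors, roughly 20% positive class to mimic the reported prevalence.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)

# 80/20 split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# LASSO-based variable selection, fitted on the training set only.
# Predictors whose LASSO coefficient shrinks to (near) zero are dropped.
selector = make_pipeline(
    StandardScaler(),
    SelectFromModel(LassoCV(cv=5, random_state=0)),
)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Gradient-boosted trees as a stand-in for XGB (assumption).
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train_sel, y_train)

# Predicted probabilities feed AUC and Brier score; thresholded labels
# feed balanced accuracy and F1.
prob = model.predict_proba(X_test_sel)[:, 1]
pred = (prob >= 0.5).astype(int)

metrics = {
    "auc": roc_auc_score(y_test, prob),
    "balanced_accuracy": balanced_accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "brier": brier_score_loss(y_test, prob),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Fitting the selector on the training set alone keeps test-set information out of the variable selection step. Confidence intervals like those quoted in the summary could be approximated by bootstrapping the test set, which this sketch omits.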