Development and validation of questionnaire-based machine-learning models to predict early natural menopause: a national cross-sectional study

Bibliographic Details
Main Authors: Chunmiao Zhou, Ziwei Xie, Qi Wang, Zhongxuan Wang, Bo Xie, Yehuan Yang, Li Yang, Ting Guo, Ruimin Zheng, Yingying Qin, Dongshan Zhu
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: npj Women's Health
Online Access: https://doi.org/10.1038/s44294-025-00098-4
Description
Summary: This epidemiological survey recruited 18,015 postmenopausal women aged 36–60 in 13 cities across 12 provinces in China. Ten machine learning algorithms were evaluated, and the optimal model was selected by area under the curve (AUC). The Boruta algorithm identified 70 predictive factors, and the XGBoost model performed best, achieving an AUC of 0.745 in the test set, a precision of 0.84, a recall of 0.78, and an F1 score of 0.81. A simplified model using the top 20 factors achieved an AUC of 0.731. External validation using the China Health and Retirement Longitudinal Study (CHARLS) dataset achieved an AUC of 0.68, demonstrating moderate predictive performance. Shapley Additive Explanations (SHAP) showed that important predictors included age, income, region, height, number of siblings, and breastfeeding duration. The developed model provides an effective, non-invasive method for predicting early menopause based on questionnaire data, facilitating early identification of women at risk.
ISSN: 2948-1716
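
Note: The summary reports precision, recall, and F1 score together. Because F1 is the harmonic mean of precision and recall, the three figures can be checked against one another. The snippet below is a minimal sketch of that check, not code from the article; the function name `f1_score` and the variable names are illustrative assumptions, and only the values 0.84 and 0.78 come from the summary.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the summary for the XGBoost model on the test set.
reported_precision = 0.84
reported_recall = 0.78

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.81
```

The result agrees with the F1 score of 0.81 stated in the summary.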