Development and validation of questionnaire-based machine-learning models to predict early natural menopause: a national cross-sectional study
| Main Authors: | , , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | npj Women's Health |
| Online Access: | https://doi.org/10.1038/s44294-025-00098-4 |
| Summary: | This epidemiological survey recruited 18,015 postmenopausal women aged 36–60 in 13 cities across 12 provinces in China. Ten machine learning algorithms were evaluated, and the optimal model was selected by area under the curve (AUC). The Boruta algorithm identified 70 predictive factors. The XGBoost model performed best, achieving an AUC of 0.745 in the test set, with a precision of 0.84, a recall of 0.78, and an F1 score of 0.81. A simplified model using the top 20 factors achieved an AUC of 0.731. External validation on the China Health and Retirement Longitudinal Study (CHARLS) dataset yielded an AUC of 0.68, indicating moderate predictive performance. Shapley Additive Explanations (SHAP) showed that important predictors included age, income, region, height, number of siblings, and breastfeeding duration. The developed model provides an effective, non-invasive method for predicting early menopause from questionnaire data, facilitating early identification of women at risk. |
| ISSN: | 2948-1716 |
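
The summary reports precision, recall, and F1 together for the XGBoost model. As a quick consistency check, and not a reproduction of the authors' pipeline, the sketch below recomputes F1 as the harmonic mean of the reported precision (0.84) and recall (0.78). The function name `f1_from_precision_recall` is hypothetical and introduced only for illustration.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Both metrics are zero, so F1 is conventionally taken as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the summary for the XGBoost model on the test set.
reported_precision = 0.84
reported_recall = 0.78

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 recomputed from reported precision and recall: {f1:.2f}")
```

Rounded to two decimals, the recomputed value agrees with the F1 score of 0.81 stated in the summary.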