Development, validation, and clinical application of a machine learning model for risk stratification and management of cervical cancer screening based on full-genotyping hrHPV test (SMART-HPV): a modelling study
Main Authors: | |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-02-01 |
Series: | The Lancet Regional Health. Western Pacific |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666606525000173 |
Summary: | Background: High-risk human papillomavirus (hrHPV) full genotyping facilitates risk stratification and improves efficiency in cervical cancer screening, and has been widely verified and adopted in various screening settings. We aimed to develop a cervical cancer predictive model that can guide referrals for colposcopy using hrHPV full genotyping data in a setting where the screening rate is low. Methods: We developed, compared, and validated four machine learning models (eXtreme gradient boosting [XGBoost], support vector machine [SVM], random forest [RF], and naïve Bayes [NB]) for cervical cancer prediction, using data from a national cervical cancer screening project conducted in 267 healthcare centers in China. Cervical intraepithelial neoplasia grade 2 or worse (CIN2+) and CIN3+ were the primary and secondary outcomes, respectively. Discrimination was evaluated across various screening settings in China using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, area under the precision–recall curve (AUPRC), and accuracy. Calibration and clinical utility were assessed with the Brier score, calibration curves, and decision curve analysis (DCA). Findings: 1,112,846 women were recruited, of whom 599,043 were included in the analysis based on hrHPV full genotyping. Of these, 254,434 (median age 48 years, IQR 42–54), 297,479 (49, 43–55), 38,500 (37, 32–44), 1950 (38, 33–46), 1590 (53, 47–58), 779 (38, 31–49), and 4311 (40, 33–50) were in the development, temporal validation, and external validation 1–5 datasets, respectively. The final simplified clinical risk prediction model includes hrHPV, number of HPV genotypes, cervical cytology, HPV16, HPV18, age, HPV52, HPV39, and gynecological examination. The final optimal XGBoost model for predicting CIN2+ showed good discrimination (AUROC, maximum 0.989 [0.987–0.992]; minimum 0.781 [0.74–0.819]) and calibration (Brier score, maximum 0.118 [0.099–0.137]) in the five external validation sets. DCA showed that when the clinical decision threshold probability was less than 0.80, the optimal XGBoost model for predicting CIN2+ provided a superior standardized net benefit. The optimal XGBoost model obtained similar results in predicting CIN3+. Interpretation: We developed a cervical cancer screening risk prediction model that employs hrHPV full genotyping and simple test results to achieve risk prediction and stratified management for colposcopy referrals. This predictive tool is particularly suitable for settings with low screening rates. Funding: National Natural Science Foundation of China; Major Scientific Research Program for Young and Middle-aged Health Professionals of Fujian Province, China; Fujian Province Central Government-Guided Local Science and Technology Development Project; Fujian Province's Third Batch of Flexible Introduction of High-Level Medical Talent Teams; Fujian Provincial Natural Science Foundation of China; Fujian Provincial Science and Technology Innovation Joint Fund. |
---|---|
ISSN: | 2666-6065 |
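
Note on the calibration and clinical-utility measures named in the Summary: the record does not say how the study computed them, so the following is only a minimal sketch of the standard textbook definitions of the Brier score and of (standardized) net benefit as used in decision curve analysis. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of referring everyone whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def standardized_net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                             threshold: float) -> float:
    """Net benefit divided by outcome prevalence, so the maximum is 1."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_prob, threshold) / prevalence


# Hypothetical toy data: 1 = outcome present (e.g. CIN2+), 0 = absent.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
risks = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]

print(brier_score(labels, risks))
print(net_benefit(labels, risks, 0.5))
print(standardized_net_benefit(labels, risks, 0.5))
```

A decision curve is obtained by evaluating `standardized_net_benefit` over a grid of thresholds and comparing the result with the "refer all" and "refer none" strategies. A lower Brier score indicates better-calibrated and more accurate risk estimates.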