Predicting cardiovascular risk with hybrid ensemble learning and explainable AI

Bibliographic Details
Main Authors: Pooja Shah, Madhu Shukla, Neel H. Dholakia, Himanshu Gupta
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-01650-7
Description
Summary: Cardiovascular diseases (CVDs) remain one of the leading causes of death globally, underscoring the importance of early and accurate risk prediction for effective preventive measures and therapeutic approaches. This study proposes an innovative hybrid ensemble learning framework that combines state-of-the-art machine learning models with explainable AI approaches for cardiovascular disease risk prediction. Using a range of publicly accessible datasets, the proposed framework integrates Gradient Boosting, CatBoost, and Neural Networks in a stacked ensemble architecture, yielding more robust predictive performance than the constituent models. Visualisation techniques such as SHAP values, t-SNE, and PCA projections allow the study to explore the multidimensional relationships among key risk factors, including systolic/diastolic blood pressure, BMI, and cholesterol-glucose ratio, alongside various lifestyle parameters. The authors further strengthen model interpretability through explainable AI methods so that clinicians can see how each feature contributes to the predictions. On the test dataset, the hybrid model achieved an AUC-ROC of 0.82, with Precision of 81%, Recall of 83%, and F1-Score of 82%, and its confusion matrices show well-balanced classification of both positive and negative cases. The results highlight the potential of ensemble learning for complex medical prediction problems and the need for interpretable models to ensure the trustworthiness of AI systems in healthcare settings. These findings point toward better CVD risk prediction models that could give healthcare stakeholders interpretable means to target treatments.
ISSN: 2045-2322