A machine learning framework for short-term prediction of chronic obstructive pulmonary disease exacerbations using personal air quality monitors and lifestyle data

Abstract: Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous disease with a variety of symptoms, including persistent coughing and mucus production, shortness of breath, wheezing, and chest tightness. As the disease advances, exacerbations, i.e. acute worsening of respiratory symptoms, m...

Bibliographic Details
Main Authors: M. Atzeni, G. Cappon, J. K. Quint, F. Kelly, B. Barratt, M. Vettoretti
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-85089-2
author M. Atzeni
G. Cappon
J. K. Quint
F. Kelly
B. Barratt
M. Vettoretti
collection DOAJ
description Abstract: Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous disease with a variety of symptoms, including persistent coughing and mucus production, shortness of breath, wheezing, and chest tightness. As the disease advances, exacerbations, i.e. acute worsening of respiratory symptoms, may increase in frequency, leading to potentially life-threatening complications. Exposure to air pollutants may trigger COPD exacerbations. Existing predictive models for COPD exacerbations, while promising, may be constrained by their reliance on fixed air quality sensor data that may not fully capture individuals’ dynamic exposure to air pollution. To address this, we designed a machine learning (ML) framework that leverages data from personal air quality monitors, health records, lifestyle, and living condition information to build models that perform short-term prediction of COPD exacerbations. The framework employs (i) k-means clustering to uncover potentially distinct patient sub-types, (ii) supervised ML techniques (Logistic Regression, Random Forest, and eXtreme Gradient Boosting) to train and test predictive models for each patient sub-type, and (iii) an explainable artificial intelligence technique (SHAP) to interpret the final models. The framework was tested on data collected from 101 COPD patients monitored for up to 6 months, with exacerbations occurring in 10.7% of total samples. Two patient sub-types were identified, characterised by different disease severity. The best performing models were Random Forest in cluster 1, with an area under the receiver operating characteristic curve (AUC) of 0.90 and an area under the precision/recall curve (AUPRC) of 0.7, and Random Forest in cluster 2, with an AUC of 0.82 and an AUPRC of 0.56. The model interpretability analysis identified previous symptoms and cumulative pollutant exposure as key predictors of exacerbations.
The results of our study lay the groundwork for a predictive framework for COPD exacerbations, with particular attention to the potential influence of environmental features. The SHAP analysis revealed that the contribution of environmental features is not uniform across all subjects. For instance, cumulative exposure to pollutants demonstrated greater predictive power in cluster 1. The SHAP analysis also showed that, overall, clinical factors and individual symptomatology play the most significant role in determining exacerbation risk in this setup.
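The abstract reports both AUC and AUPRC for each cluster. As a reading aid, the following minimal Python sketch shows one common way to compute the two metrics from predicted scores. It is not the authors' code: the abstract does not say how the curves were integrated, the function names `roc_auc` and `average_precision` are hypothetical, and the toy labels and scores are invented for illustration and are not study data. AUPRC is approximated here by average precision, a standard step-wise estimate.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision/recall curve.

    Assumes scores are not tied; ties would make the result depend on
    the sort order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


# Toy data (invented): 1 = exacerbation, 0 = no exacerbation.
labels = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.10, 0.30, 0.80, 0.20, 0.60, 0.70, 0.05, 0.40, 0.15, 0.50]

prevalence = sum(labels) / len(labels)
print(f"AUC        = {roc_auc(labels, scores):.3f}")            # 0.952
print(f"AUPRC (AP) = {average_precision(labels, scores):.3f}")  # 0.917
print(f"prevalence = {prevalence:.3f}")                         # 0.300
```

A classifier with no skill has an expected AUC of about 0.5 but an expected AUPRC close to the positive-class prevalence, so AUPRC values are best read against the class balance. The abstract gives an overall exacerbation rate of 10.7% of samples; it does not give the per-cluster rates, so the reported AUPRC values of 0.7 and 0.56 can only be compared with that overall figure.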
format Article
id doaj-art-515f39e6cb864b9fabedb9116007dd50
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-515f39e6cb864b9fabedb9116007dd50 2025-01-19T12:24:15Z eng Nature Portfolio Scientific Reports 2045-2322 2025-01-01 15 1 1 15 10.1038/s41598-024-85089-2
M. Atzeni: Department of Information Engineering, University of Padova
G. Cappon: Department of Information Engineering, University of Padova
J. K. Quint: School of Public Health, Imperial College London
F. Kelly: Environmental Research Group, MRC Centre for Environment and Health, Imperial College London
B. Barratt: Environmental Research Group, MRC Centre for Environment and Health, Imperial College London
M. Vettoretti: Department of Information Engineering, University of Padova
title A machine learning framework for short-term prediction of chronic obstructive pulmonary disease exacerbations using personal air quality monitors and lifestyle data
url https://doi.org/10.1038/s41598-024-85089-2