A Comparison of Machine Learning-Based Approaches in Estimating Surface PM<sub>2.5</sub> Concentrations Focusing on Artificial Neural Networks and High Pollution Events


Bibliographic Details
Main Authors:	Shijin Wei, Kyle Shores, Yangyang Xu
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Atmosphere
Subjects:	machine learning; air quality; artificial neural network; MERRA-2 reanalysis; high pollution events
Online Access:	https://www.mdpi.com/2073-4433/16/1/48

author	Shijin Wei, Kyle Shores, Yangyang Xu
collection	DOAJ
description	Surface PM<sub>2.5</sub> concentrations have significant implications for human health, necessitating accurate estimations. This study compares various machine learning models, including linear models, tree-based algorithms, and artificial neural networks (ANNs), for estimating PM<sub>2.5</sub> concentrations using the MERRA-2 dataset from 2012 to 2023. Mutual information and Spearman cross-feature correlation scores are used for feature selection. Model performance is evaluated using metrics including normalized Nash–Sutcliffe efficiency (NNSE), root mean standard deviation ratio (RSR), and mean percentage error (MPE). Our results show that ANNs outperform linear and tree models, particularly in estimating daily PM<sub>2.5</sub> concentrations of 35–1000 µg/m<sup>3</sup>. ANNs improve NNSE by 119% and 46%, RSR by 40% and 24%, and MPE by 44% and 30% relative to linear and tree models, respectively, indicating ANNs’ superior estimation performance during high pollution days. A feature sensitivity analysis used to interpret the models suggests that the total extinction AOD at 550 nm and surface CO concentrations are the most important features in the Western and Eastern U.S., respectively. The findings suggest that even the simplest NNs provide better air quality estimates, especially during high pollution events, which is beneficial for long-term exposure analysis. Future research should explore more sophisticated NN architectures with spatial and temporal variations in PM<sub>2.5</sub> to improve model performance.
format	Article
id	doaj-art-d8f2c54cd3a64ef19f7af1d0679f6fef
institution	Kabale University
issn	2073-4433
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Atmosphere
spelling	doaj-art-d8f2c54cd3a64ef19f7af1d0679f6fef; 2025-01-24T13:21:50Z; eng; MDPI AG; Atmosphere; ISSN 2073-4433; 2025-01-01; vol. 16, no. 1, art. 48; DOI 10.3390/atmos16010048; Shijin Wei (Department of Atmospheric Sciences, College of Arts and Sciences, Texas A&M University, College Station, TX 77840, USA); Kyle Shores (The National Center for Atmospheric Research, Boulder, CO 80305, USA); Yangyang Xu (Department of Atmospheric Sciences, College of Arts and Sciences, Texas A&M University, College Station, TX 77840, USA); https://www.mdpi.com/2073-4433/16/1/48
title	A Comparison of Machine Learning-Based Approaches in Estimating Surface PM<sub>2.5</sub> Concentrations Focusing on Artificial Neural Networks and High Pollution Events
topic	machine learning; air quality; artificial neural network; MERRA-2 reanalysis; high pollution events
url	https://www.mdpi.com/2073-4433/16/1/48
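
The three evaluation metrics named in the description (NNSE, RSR, MPE) can be illustrated with a minimal sketch. The formulas below are the commonly used textbook definitions and are an assumption here; the record does not state the exact definitions or sign conventions the authors used. In particular, MPE is taken as the mean of (observed - predicted) / observed, expressed in percent, and the function names and sample values are hypothetical.

```python
import math


def _sums(obs, pred):
    """Return (sum of squared errors, sum of squared deviations of obs from its mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    sse, sst = _sums(obs, pred)
    return 1.0 - sse / sst


def nnse(obs, pred):
    """Normalized NSE, which maps NSE from (-inf, 1] onto (0, 1]."""
    return 1.0 / (2.0 - nse(obs, pred))


def rsr(obs, pred):
    """RMSE divided by the standard deviation of the observations (lower is better)."""
    sse, sst = _sums(obs, pred)
    return math.sqrt(sse) / math.sqrt(sst)


def mpe(obs, pred):
    """Mean percentage error in percent (assumed sign: observed minus predicted)."""
    return 100.0 / len(obs) * sum((o - p) / o for o, p in zip(obs, pred))


if __name__ == "__main__":
    # Hypothetical daily PM2.5 values in µg/m^3, for illustration only.
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    print("NNSE:", nnse(observed, predicted))
    print("RSR: ", rsr(observed, predicted))
    print("MPE: ", mpe(observed, predicted))
```

Under these definitions RSR squared equals 1 - NSE, so NNSE and RSR rank models the same way on a given dataset, while MPE adds information about signed relative bias.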


Similar Items