The Influence of Data Length on the Performance of Artificial Intelligence Models in Predicting Air Pollution

Air pollution is one of humanity's most critical environmental issues and is considered contentious in several countries worldwide. As a result, accurate prediction is critical in human health management and government decision-making for environmental management. In this study, three artificia...

Bibliographic Details
Main Authors: Mohamed Khalid AlOmar, Faidhalrahman Khaleel, Abdulwahab Abdulrazaaq AlSaadi, Mohammed Majeed Hameed, Mohammed Abdulhakim AlSaadi, Nadhir Al-Ansari
Format: Article
Language: English
Published: Wiley 2022-01-01
Series: Advances in Meteorology
Online Access: http://dx.doi.org/10.1155/2022/5346647
collection DOAJ
description Air pollution is one of humanity's most critical environmental issues and a contentious one in several countries worldwide. Accurate prediction is therefore critical for human health management and for government decision-making in environmental management. In this study, three artificial intelligence (AI) approaches, namely the group method of data handling neural network (GMDHNN), the extreme learning machine (ELM), and the gradient boosting regression (GBR) tree, are used to predict the hourly concentration of PM2.5 at a Dorset station located in Canada. The investigation quantifies the effect of data length on AI modeling performance. Accordingly, nine ratios (50/50, 55/45, 60/40, 65/35, 70/30, 75/25, 80/20, 85/15, and 90/10) are used to split the data into training and testing datasets for assessing the performance of the applied models. The results showed that the data division significantly affected the models' predictive capacity, and the 60/40 ratio was found to be more suitable than the other ratios for developing predictive models. Furthermore, the ELM model provided more precise predictions of PM2.5 concentrations than the other models. A vital feature of the ELM model is its ability to adapt to changes in the training-to-testing data ratio. In summary, the results demonstrate an efficient method for selecting the optimal dataset ratio and the best AI model, which would be helpful in designing accurate models for other environmental problems.
id doaj-art-a6158c5545f04092a39f9e002445559a
institution Kabale University
issn 1687-9317
spelling doaj-art-a6158c5545f04092a39f9e002445559a 2025-02-03T01:00:45Z
affiliations (in author order) Mohamed Khalid AlOmar: Department of Civil Engineering
Faidhalrahman Khaleel: Department of Civil Engineering
Abdulwahab Abdulrazaaq AlSaadi: Department of Computer Engineering Technics
Mohammed Majeed Hameed: Department of Civil Engineering
Mohammed Abdulhakim AlSaadi: Natural and Medical Sciences Research Center
Nadhir Al-Ansari: Civil Engineering Department
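The split-ratio experiment described in the abstract can be illustrated with a minimal sketch. Everything in it beyond the nine ratios is an assumption made for illustration: the series is synthetic (not the Dorset station data), the split is taken chronologically (the record does not say how the data were divided), and a simple lag-1 linear regression stands in for the GMDHNN, ELM, and GBR models, which are not reproduced here. The function names are hypothetical.

```python
import math
import random

# The nine training/testing ratios listed in the abstract (training percent).
TRAIN_PERCENTS = [50, 55, 60, 65, 70, 75, 80, 85, 90]


def split_series(values, train_percent):
    """Split a series into a leading training part and a trailing testing part.

    Assumption: a chronological split; the source does not state the scheme.
    """
    cut = len(values) * train_percent // 100
    return values[:cut], values[cut:]


def fit_lag1(train):
    """Least-squares fit of y[t] = a + b * y[t-1] (a stand-in model only)."""
    x, y = train[:-1], train[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    a = mean_y - b * mean_x
    return a, b


def rmse_lag1(params, test):
    """Root-mean-square error of one-step-ahead predictions on the test part."""
    a, b = params
    errors = [(a + b * prev - cur) ** 2 for prev, cur in zip(test[:-1], test[1:])]
    return math.sqrt(sum(errors) / len(errors))


def synthetic_hourly_series(n_hours=2000, seed=0):
    """Hypothetical hourly series with a daily cycle and noise (not real data)."""
    rng = random.Random(seed)
    series = []
    level = 10.0
    for hour in range(n_hours):
        daily = 3.0 * math.sin(2 * math.pi * hour / 24)
        level = 0.9 * level + 0.1 * 10.0 + rng.gauss(0, 1)
        series.append(max(0.0, level + daily))
    return series


def evaluate_ratios(values):
    """Fit on the training part and score on the testing part for each ratio."""
    results = {}
    for pct in TRAIN_PERCENTS:
        train, test = split_series(values, pct)
        results[pct] = rmse_lag1(fit_lag1(train), test)
    return results


if __name__ == "__main__":
    scores = evaluate_ratios(synthetic_hourly_series())
    for pct, score in scores.items():
        print(f"{pct}/{100 - pct}: test RMSE = {score:.3f}")
```

The scores printed by this sketch say nothing about the paper's findings; it only shows the shape of the comparison, in which one model family is refit at each ratio and scored on the held-out part.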