An optimized ensemble ML-WQI model for reliable water quality prediction by minimizing the eclipsing and ambiguity issues

Bibliographic Details
Main Authors: Ashifur Rahman, M. M. Mahbubul Syeed, Md. Rajaul Karim, Kaniz Fatema, Razib Hayat Khan, Mohammad Faisal Uddin
Format: Article
Language: English
Published: SpringerOpen 2025-04-01
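Series: Applied Water Science
Subjects: Water quality (WQ); Water quality index (WQI); Ensemble model; Water quality prediction; Eclipsing; Ambiguity
Online Access: https://doi.org/10.1007/s13201-025-02450-0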
author Ashifur Rahman
M. M. Mahbubul Syeed
Md. Rajaul Karim
Kaniz Fatema
Razib Hayat Khan
Mohammad Faisal Uddin
collection DOAJ
description Abstract: Monitoring water quality is essential for sustaining ecosystems and the many forms of life on Earth. Water quality index (WQI) models are the most widely adopted approach to water quality monitoring. However, they have been widely criticized for unreliable and inconsistent results, often caused by eclipsing and ambiguity issues. To address these issues, data-driven approaches that integrate machine learning or deep learning (ML/DL) techniques have recently been applied to develop improved WQI models. Although these models perform better than conventional ones, recent studies report that they often produce inconsistent results because of data variability and outliers. The purpose of this research is to define a robust and reliable ensemble ML-WQI model that is optimized to attenuate the effects of data variability, eclipsing, and ambiguity for accurate water quality prediction. To define the ensemble model, eight prominent ML regression models are evaluated to select the best-performing base estimators and the meta-learner. The Irish WQI dataset used in the study includes 29,159 samples spanning over 15 years. Each sample records 11 water quality parameters and the corresponding WQI value and classification, calculated using three traditional WQI models, namely CCME, Brown, and SRDD. Performance is evaluated using mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), R-squared ($R^2$), fivefold cross-validation, and a comparison with existing ML models. In addition, resilience to eclipsing, ambiguity, and outliers is quantitatively assessed using the WQI classification data. The findings show that the ensemble ML-WQI model with linear regression (LR), random forest (RF), and extreme gradient boosting (XGB) as base estimators and a decision tree (DT) as the meta-learner achieves high classification accuracy, with MAE, MSE, RMSE, and $R^2$ scores of 0.01, 0.001, 0.0034, and 1.00, respectively. This performance is better than that of existing regression-based ML-WQI models. In addition, the model shows greater resilience to outliers by classifying all WQIs close to the general trend of water quality. The model has a much lower eclipsing effect (23.9%) than CCME (50.50%), Brown (32.20%), and SRDD (77.20%). With respect to the ambiguity issue, the model demonstrates greater stability than traditional WQI models. Therefore, the proposed ensemble model is robust to the inherent variability of water quality data when predicting a reliable WQI classification. This data-driven, autonomous, cost-effective, and easy-to-comprehend ML-WQI model should provide strong support to researchers in building a comprehensive water quality monitoring and management system.
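The ensemble layout named in the abstract (LR, RF, and a gradient-boosting model as base estimators, with a decision tree as the meta-learner) can be sketched as a stacked regressor. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic (11 features standing in for the 11 water quality parameters), scikit-learn's `GradientBoostingRegressor` is swapped in for XGB to avoid an extra dependency, and every hyperparameter value is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 11 features mimic the 11 parameters,
# but the values have nothing to do with the Irish WQI dataset.
X, y = make_regression(n_samples=400, n_features=11, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base estimators; GradientBoostingRegressor is a stand-in for XGB (assumption).
base_estimators = [
    ("lr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
    ("gb", GradientBoostingRegressor(n_estimators=30, random_state=0)),
]

# The decision tree meta-learner is trained on out-of-fold predictions of the
# base estimators; cv=5 controls those internal folds. max_depth is hypothetical.
ensemble = StackingRegressor(
    estimators=base_estimators,
    final_estimator=DecisionTreeRegressor(max_depth=5, random_state=0),
    cv=5,
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

# The four regression metrics listed in the abstract.
mse = mean_squared_error(y_test, y_pred)
metrics = {
    "MAE": mean_absolute_error(y_test, y_pred),
    "MSE": mse,
    "RMSE": float(np.sqrt(mse)),
    "R2": r2_score(y_test, y_pred),
}

# Separate fivefold cross-validation of the whole stacked model.
cv_scores = cross_val_score(ensemble, X, y, cv=5, scoring="r2")

print(metrics)
print("fivefold R2 scores:", cv_scores)
```

Note that the `cv=5` argument of `StackingRegressor` only governs how the meta-learner's training inputs are produced; the fivefold evaluation is the separate `cross_val_score` call.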
format Article
id doaj-art-358b9531acf14a41a1c0c28a72deb2b2
institution Kabale University
issn 2190-5487
2190-5495
language English
publishDate 2025-04-01
publisher SpringerOpen
record_format Article
series Applied Water Science
spelling Ashifur Rahman, RIoT Research Center, Independent University
M. M. Mahbubul Syeed, Department of Computer Science and Engineering, Independent University
Md. Rajaul Karim, RIoT Research Center, Independent University
Kaniz Fatema, Department of Computer Science and Engineering, Independent University
Razib Hayat Khan, Department of Computer Science and Engineering, Independent University
Mohammad Faisal Uddin, Department of Computer Science and Engineering, Independent University
title An optimized ensemble ML-WQI model for reliable water quality prediction by minimizing the eclipsing and ambiguity issues
topic Water quality (WQ)
Water quality index (WQI)
Ensemble model
Water quality prediction
Eclipsing
Ambiguity
url https://doi.org/10.1007/s13201-025-02450-0