An optimized ensemble ML-WQI model for reliable water quality prediction by minimizing the eclipsing and ambiguity issues
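| Main Authors: | Ashifur Rahman, M. M. Mahbubul Syeed, Md. Rajaul Karim, Kaniz Fatema, Razib Hayat Khan, Mohammad Faisal Uddin |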
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-04-01 |
| Series: | Applied Water Science |
| Online Access: | https://doi.org/10.1007/s13201-025-02450-0 |
| Field | Value |
|---|---|
| author | Ashifur Rahman, M. M. Mahbubul Syeed, Md. Rajaul Karim, Kaniz Fatema, Razib Hayat Khan, Mohammad Faisal Uddin |
| collection | DOAJ |
| description | Monitoring water quality is essential for sustaining ecosystems and the many forms of life on Earth. Water quality index (WQI) models are a widely adopted approach to water quality monitoring. However, they have received much criticism for unreliable and inconsistent results, often caused by eclipsing and ambiguity issues. To address these problems, data-driven approaches that integrate machine learning or deep learning (ML/DL) techniques have recently been applied to develop improved WQI models. Although these models outperform conventional ones, recent studies report that they often produce inconsistent results because of data variability and outliers. The purpose of this research is to define a robust and reliable ensemble ML-WQI model, optimized to attenuate the effects of data variability, eclipsing, and ambiguity, for accurate water quality prediction. To build the ensemble, eight prominent regression ML models are evaluated to select the best-performing base estimators and meta-learner. The Irish WQI dataset used in the study includes 29,159 samples spanning over 15 years. Each sample records 11 water quality parameters together with the WQI score and classification calculated by three traditional WQI models: CCME, Brown, and SRDD. Performance is evaluated using mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and R-squared ($R^2$), together with fivefold cross-validation and a comparison with existing ML models. In addition, resilience to eclipsing, ambiguity, and outliers is assessed quantitatively using the WQI classification data. The findings reveal that the ensemble ML-WQI model with linear regression (LR), random forest (RF), and extreme gradient boosting (XGB) as base estimators and a decision tree (DT) as the meta-learner achieves high accuracy, with MAE, MSE, RMSE, and $R^2$ scores of 0.01, 0.001, 0.0034, and 1.00, respectively. This performance is better than that of existing regression-based ML-WQI models. The model also shows greater resilience to outliers, classifying all WQIs close to the general trend of water quality. Its eclipsing effect is low (23.9%) compared with CCME (50.50%), Brown (32.20%), and SRDD (77.20%). Regarding ambiguity, the model is more stable than the traditional WQI models. The proposed ensemble model is therefore robust to the inherent variability of water quality data and predicts reliable WQI classifications. This data-driven, autonomous, cost-effective, and easy-to-understand ML-WQI model should provide strong support to researchers building comprehensive water quality monitoring and management systems. |
| id | doaj-art-358b9531acf14a41a1c0c28a72deb2b2 |
| institution | Kabale University |
| issn | 2190-5487, 2190-5495 |
| affiliation | Ashifur Rahman and Md. Rajaul Karim: RIoT Research Center, Independent University; M. M. Mahbubul Syeed, Kaniz Fatema, Razib Hayat Khan, and Mohammad Faisal Uddin: Department of Computer Science and Engineering, Independent University |
| title | An optimized ensemble ML-WQI model for reliable water quality prediction by minimizing the eclipsing and ambiguity issues |
| topic | Water quality (WQ); Water quality index (WQI); Ensemble model; Water quality prediction; Eclipsing; Ambiguity |
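
The abstract describes a stacked ensemble in which LR, RF, and XGB serve as base estimators feeding a DT meta-learner, evaluated with fivefold cross-validation. A minimal sketch of that arrangement with scikit-learn's `StackingRegressor` follows. It assumes synthetic data from `make_regression` (11 features, echoing the 11 parameters) in place of the Irish dataset, swaps in scikit-learn's `GradientBoostingRegressor` for XGBoost so only one library is needed, and uses arbitrarily chosen hyperparameters. The helper name `build_stacked_wqi_regressor` is hypothetical; this is not the authors' code.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def build_stacked_wqi_regressor(seed: int = 0) -> StackingRegressor:
    """Hypothetical helper: LR + RF + boosted trees as base estimators, DT as meta-learner."""
    base_estimators = [
        ("lr", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=seed)),
        # Assumption: stand-in for XGBoost so the sketch depends only on scikit-learn.
        ("gb", GradientBoostingRegressor(random_state=seed)),
    ]
    return StackingRegressor(
        estimators=base_estimators,
        final_estimator=DecisionTreeRegressor(max_depth=5, random_state=seed),
        cv=5,  # meta-learner is trained on out-of-fold base predictions
    )


# Synthetic stand-in for 11 water quality parameters; NOT the study's dataset.
X, y = make_regression(n_samples=300, n_features=11, noise=0.1, random_state=0)

model = build_stacked_wqi_regressor()
# Outer fivefold cross-validation of the whole stacked model.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"fivefold R^2 per fold: {scores.round(3)}")
```

Setting `cv=5` inside `StackingRegressor` means the decision tree learns from base-model predictions made on rows those models did not see during fitting, which keeps the meta-learner from simply trusting overfit base outputs.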
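
The four error measures named in the abstract (MSE, MAE, RMSE, and $R^2$) can be computed directly from their definitions. The plain-Python sketch below assumes a hypothetical helper, `regression_metrics`, and uses made-up WQI scores for illustration only. By definition RMSE is the square root of MSE, so the two always move together.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MSE, MAE, RMSE and R^2 for paired observations and predictions."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)            # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)                       # RMSE is the square root of MSE
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    r2 = 1.0 - sse / sst if sst > 0 else float("nan")   # undefined for constant targets
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


# Illustrative WQI scores (made-up values, not from the study).
observed = [70.0, 55.0, 80.0, 40.0]
predicted = [68.0, 57.0, 79.0, 43.0]
print(regression_metrics(observed, predicted))
```

For these values the errors are 2, -2, 1, and -3, giving an MSE of 4.5 and an MAE of 2.0.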
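
Eclipsing occurs when an aggregated index rates water as acceptable even though one or more individual parameters are poor. The abstract does not say how its eclipsing percentages were computed, so the sketch below uses a hypothetical rule of its own. A sample counts as eclipsed when the aggregate of its sub-indices reaches an assumed "acceptable" cut-off while at least one sub-index falls below an assumed "poor" cut-off. The thresholds, sub-index values, and function names are illustrative assumptions, not the paper's procedure.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical cut-offs on a 0-100 sub-index scale (not taken from the paper).
ACCEPTABLE = 65.0
POOR = 45.0

Aggregator = Callable[[Sequence[float]], float]


def is_eclipsed(sub_indices: Sequence[float], aggregate: Aggregator = mean) -> bool:
    """True when the aggregate looks acceptable although some parameter is poor."""
    return aggregate(sub_indices) >= ACCEPTABLE and min(sub_indices) < POOR


def eclipsing_rate(samples: List[Sequence[float]], aggregate: Aggregator = mean) -> float:
    """Fraction of samples flagged as eclipsed under the hypothetical rule above."""
    if not samples:
        raise ValueError("need at least one sample")
    flagged = sum(1 for s in samples if is_eclipsed(s, aggregate))
    return flagged / len(samples)


# Made-up sub-index vectors for four parameters per sample.
samples = [
    [95, 95, 95, 10],  # one failing parameter hidden by the mean (73.75)
    [80, 75, 70, 72],  # genuinely acceptable
    [40, 42, 38, 50],  # genuinely poor
    [90, 88, 30, 85],  # mean 73.25 despite a sub-index of 30
]
print(eclipsing_rate(samples))
```

Here two of the four samples (the first and the fourth) are flagged, so `eclipsing_rate(samples)` returns 0.5. Passing `aggregate=min` (a minimum-operator index) removes eclipsing by construction under this toy rule, at the cost of ignoring every parameter except the worst one.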