Comparative analysis of correlation and causality inference in water quality problems with emphasis on TDS Karkheh River in Iran

Bibliographic Details
Main Authors: Reza Shakeri, Hossein Amini, Farshid Fakheri, Man Yue Lam, Banafsheh Zahraie
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: Water quality; Machine learning; Causality inference; Correlation; River; TDS
Online Access: https://doi.org/10.1038/s41598-025-85908-0
collection DOAJ
description Abstract Water quality management is a critical aspect of environmental sustainability, particularly in arid and semi-arid regions such as Iran, where water scarcity is compounded by quality degradation. This study examines the causal relationships influencing water quality, focusing on Total Dissolved Solids (TDS) as a primary indicator in the Karkheh River, southwest Iran. Using a dataset spanning 50 years (1968–2018), the research applies Machine Learning (ML) techniques to examine correlations and infer causality among multiple parameters, including flow rate (Q), sodium (Na⁺), magnesium (Mg²⁺), calcium (Ca²⁺), chloride (Cl⁻), sulfate (SO₄²⁻), bicarbonate (HCO₃⁻), and pH. Causation is modeled with the "backdoor linear regression" approach, which provides a stable and interpretable framework for causal inference by relying on explicit assumptions. Predictive modeling, paired with interpretability modeling to make the predictive model transparent, was used to show the difference between correlation and causation. Predictive modeling does not capture causality among the variables: it indicated that Mg does not contribute to the target (TDS), whereas the causal analysis reveals that TDS is predominantly positively influenced by Mg, Na, Cl, Ca, and SO₄, with HCO₃ and pH exerting negative (inverse) effects. Unlike correlations, causal relationships are directional and often unequal in strength, and they highlight Mg as a critical driver of TDS levels. This novel application of ML-based causal inference in water quality research provides a cost-effective and time-efficient alternative to traditional experimental methods. The results underscore the potential of ML-driven causal analysis to guide water resource management and policy-making. By identifying the key drivers of TDS, this study proposes targeted interventions to mitigate water quality deterioration. Moreover, the insights gained lay the foundation for developing early warning systems that support proactive and sustainable water quality management in similar hydrological contexts.
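The contrast the abstract draws between a correlational estimate and a backdoor-adjusted linear regression can be illustrated with a minimal Python sketch. It is not the authors' code and uses no Karkheh River data: the data are synthetic, the coefficients are invented, and the names `ols` and `simulate` are hypothetical helpers. Here `z` stands in for a confounder (for example, flow rate), `x` for an ion concentration, and `y` for TDS. The sketch assumes `z` is the only backdoor path between `x` and `y`.

```python
import random

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [v - f * w for v, w in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def simulate(n=5000, seed=0):
    """Synthetic data (assumed structure): z -> x, z -> y, x -> y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]              # confounder
    x = [-1.5 * zi + rng.gauss(0, 1) for zi in z]        # treatment
    # The true causal effect of x on y is set to 2.0 by construction.
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return z, x, y

z, x, y = simulate()

# Naive fit: regress y on x only, ignoring the confounder.
naive = ols([[1.0, xi] for xi in x], y)[1]

# Backdoor adjustment: include the confounder z as a regressor.
adjusted = ols([[1.0, xi, zi] for xi, zi in zip(x, z)], y)[1]

print(f"naive slope: {naive:.2f}, backdoor-adjusted slope: {adjusted:.2f}")
```

In this toy setup the naive slope is pulled well below the true effect because the confounder moves `x` and `y` in opposite directions, while the adjusted slope recovers a value close to the 2.0 built into the simulation. This is one way a variable can look unimportant in a purely predictive or correlational view yet carry a substantial causal effect.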
format Article
id doaj-art-424f11d4a9ee42ab82b8116c79346d93
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-424f11d4a9ee42ab82b8116c79346d93; 2025-01-26T12:31:52Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 15; 1; 1; 17; 10.1038/s41598-025-85908-0
Comparative analysis of correlation and causality inference in water quality problems with emphasis on TDS Karkheh River in Iran
Reza Shakeri (0): School of Civil Engineering, College of Engineering, University of Tehran
Hossein Amini (1): School of Engineering, Cardiff University
Farshid Fakheri (2): Department of Civil and Environmental Engineering, Amirkabir University of Technology
Man Yue Lam (3): School of Engineering, Cardiff University
Banafsheh Zahraie (4): School of Civil Engineering, College of Engineering, University of Tehran
https://doi.org/10.1038/s41598-025-85908-0
Water quality; Machine learning; Causality inference; Correlation; River; TDS
topic Water quality
Machine learning
Causality inference
Correlation
River
TDS
url https://doi.org/10.1038/s41598-025-85908-0