Spatiotemporal water quality data reconstruction: A tensor factorization framework
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-12-01 |
| Series: | Ecological Informatics |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1574954125002924 |
| Summary: | Automatic high-frequency monitoring (AHFM) of water quality parameters has gained growing attention for managing eutrophic lakes. However, missing data in water quality datasets remain a persistent challenge, often compromising the reliability of mathematical models and statistical analyses. Traditional imputation methods fail to adequately capture the complex spatiotemporal dependencies among water quality variables, so this study proposes a novel nonnegative tensor factorization (NTF) model that reconstructs missing values by modeling variable-site-time triad interactions. Previous findings indicate that incorporating bias schemes into NTF architectures substantially reduces underfitting risks. Leveraging this insight, we develop and rigorously evaluate seven distinct biased NTF variants. Their diversified bias term designs not only enhance individual model performance but also enable highly effective ensemble learning through complementary strengths. To validate the proposed models, we conduct comprehensive experiments using real-world AHFM data from Lake Dianchi, China, under various missing data scenarios (missing ratios of 20–80% and missing gaps of 1–4 weeks). The key water quality parameters include chlorophyll-a concentration, water temperature, pH, dissolved oxygen, electrical conductivity, turbidity, chemical oxygen demand, ammonia, total phosphorus, and total nitrogen. The results demonstrate the superiority of the seven biased NTF models, which achieve optimal performance with a root mean squared error (RMSE) of 0.2796 ± 0.0041, mean absolute error (MAE) of 0.1611 ± 0.0034, and Nash-Sutcliffe efficiency (NSE) of 0.9704 ± 0.0009 across all missingness scenarios. Compared to state-of-the-art models, these methods yield consistent improvements of 3.42%–30.74% in RMSE, 2.30%–30.38% in MAE, and 0.20%–3.22% in NSE. Notably, an ensemble of the seven models further elevates imputation accuracy, attaining an RMSE of 0.2409 ± 0.0018, MAE of 0.1384 ± 0.0012, and NSE of 0.9768 ± 0.0009. These findings underscore the potential of bias-enhanced NTF frameworks as a robust tool for analyzing high-dimensional monitoring data. |
|---|---|
| ISSN: | 1574-9541 |
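
The summary reports imputation accuracy as RMSE, MAE, and NSE. As a minimal sketch of how these three scores are commonly computed for a reconstructed variable × site × time tensor, the Python below evaluates them on held-out entries only. The function name `imputation_scores`, the boolean `held_out` mask, and the toy data are illustrative assumptions; the record does not describe the authors' actual evaluation code or any data normalization they applied.

```python
import numpy as np

def imputation_scores(truth, estimate, held_out):
    """Score reconstructed values on held-out entries only.

    truth, estimate: arrays of the same shape (e.g. variable x site x time).
    held_out: boolean array of that shape, True where a value was hidden
    from the model and later reconstructed (hypothetical convention).
    """
    y = np.asarray(truth, dtype=float)[held_out]
    y_hat = np.asarray(estimate, dtype=float)[held_out]
    err = y - y_hat
    rmse = float(np.sqrt(np.mean(err ** 2)))   # root mean squared error
    mae = float(np.mean(np.abs(err)))          # mean absolute error
    # Nash-Sutcliffe efficiency: 1 minus squared error relative to the
    # variance of the observations around their mean (1.0 is a perfect fit).
    nse = float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2))
    return {"rmse": rmse, "mae": mae, "nse": nse}

# Toy example: 3 variables x 4 sites x 50 time steps, about 20% held out.
rng = np.random.default_rng(0)
truth = rng.random((3, 4, 50))
held_out = rng.random(truth.shape) < 0.2
estimate = truth + 0.05 * rng.standard_normal(truth.shape)
scores = imputation_scores(truth, estimate, held_out)
```

Lower RMSE and MAE indicate better reconstruction, while NSE approaches 1 as the estimates approach the observations. This is consistent with the summary, where the ensemble has both lower errors and a higher NSE than the individual models.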