Identifying human activities causing water pollution based on microbial community sequencing and source classifier machine learning

Identifying and differentiating human activities is crucial for effectively preventing the threats posed by environmental pollution to aquatic ecosystems and human health. Machine learning (ML) is a powerful analytical tool for tracking human impacts on river ecosystems based on high-throughput dataset...

Bibliographic Details
Main Authors: Zhangmu Jing, Yi Zhang, Xiaoling Liu, Qingqian Li, Yanji Hao, Yeqing Li, Hongjie Gao
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Environment International
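Subjects: Source classifier machine learning; Microbial communities; 16S rRNA sequencing data; Human activities; Pollution source tracing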
Online Access: http://www.sciencedirect.com/science/article/pii/S0160412024008274
_version_ 1832589999110881280
author Zhangmu Jing
Yi Zhang
Xiaoling Liu
Qingqian Li
Yanji Hao
Yeqing Li
Hongjie Gao
author_sort Zhangmu Jing
collection DOAJ
description Identifying and differentiating human activities is crucial for effectively preventing the threats that environmental pollution poses to aquatic ecosystems and human health. Machine learning (ML) is a powerful analytical tool for tracking human impacts on river ecosystems based on high-throughput datasets. This study employed an ML framework and 16S rRNA sequencing data to reveal microbial dynamics and trace human activities across China. The results revealed that microbial assembly was dominated mainly by deterministic factors (environmental factors and interactions between species), and that the metacommunity partition was significantly associated with human activities in both water and sediment (Chi-square test, water: P = 1.93 × 10⁻²²; Chi-square test, sediment: P = 6.00 × 10⁻⁶). Human activities increased the vulnerability of interspecific occurrence networks and the influence of environmental factors on OTU similarity and phylogenetic distance. Combining microbiological indices (MBIs), microbial relative abundance (MRA), and environmental and geographical indices (EGIs), the source classifier machine learning (SCML) algorithm was used to categorize five human activities (i.e., low human-impact, agricultural inputs, domestic inputs, industrial inputs, and dam construction). The optimal SCML configuration (MBIs + MRA + EGIs) exhibited strong performance, with a TestW R² of 0.882 and a TestS R² of 0.924. This study provides valuable insights for improving ecosystem management, supporting sustainable water resource management, and advancing pollution mitigation efforts.
format Article
id doaj-art-add67d3f65084b6f89fa12b5407a2e0f
institution Kabale University
issn 0160-4120
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Environment International
spelling doaj-art-add67d3f65084b6f89fa12b5407a2e0f
2025-01-24T04:44:11Z
eng
Elsevier
Environment International
0160-4120
2025-01-01
195
109240
Identifying human activities causing water pollution based on microbial community sequencing and source classifier machine learning
Zhangmu Jing: State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Science, Beijing 100012, China; State Environmental Protection Key Laboratory of Estuarine and Coastal Environment, Chinese Research Academy of Environmental Science, Beijing 100012, China; State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China; School of Civil and Environmental Engineering, Nanyang Technological University, 639798, Singapore
Yi Zhang: State Key Joint Laboratory of Environment Simulation and Pollution Control (SKLESPC), School of Environment, Tsinghua University, Beijing 100084, China
Xiaoling Liu: State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Science, Beijing 100012, China; State Environmental Protection Key Laboratory of Estuarine and Coastal Environment, Chinese Research Academy of Environmental Science, Beijing 100012, China
Qingqian Li: State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Science, Beijing 100012, China; State Environmental Protection Key Laboratory of Estuarine and Coastal Environment, Chinese Research Academy of Environmental Science, Beijing 100012, China
Yanji Hao: State Key Laboratory of Heavy Oil Processing, Beijing Key Laboratory of Biogas Upgrading Utilization, College of New Energy and Materials, China University of Petroleum Beijing (CUPB), Beijing, 102249, China
Yeqing Li: State Key Laboratory of Heavy Oil Processing, Beijing Key Laboratory of Biogas Upgrading Utilization, College of New Energy and Materials, China University of Petroleum Beijing (CUPB), Beijing, 102249, China
Hongjie Gao (corresponding author): State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Science, Beijing 100012, China; State Environmental Protection Key Laboratory of Estuarine and Coastal Environment, Chinese Research Academy of Environmental Science, Beijing 100012, China; State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China
http://www.sciencedirect.com/science/article/pii/S0160412024008274
Source classifier machine learning
Microbial communities
16S rRNA sequencing data
Human activities
Pollution source tracing
title Identifying human activities causing water pollution based on microbial community sequencing and source classifier machine learning
topic Source classifier machine learning
Microbial communities
16S rRNA sequencing data
Human activities
Pollution source tracing
url http://www.sciencedirect.com/science/article/pii/S0160412024008274