Integrating Genetic Algorithm and Geographically Weighted Approaches into Machine Learning Improves Soil pH Prediction in China

Accurate soil pH prediction is critical for soil management and ecological environmental protection. Machine learning (ML) models have been widely applied in the field of soil pH prediction. However, when using these models, the spatial heterogeneity of the relationship between soil and environmenta...

Bibliographic Details
Main Authors: Wantao Zhang, Jingyi Ji, Binbin Li, Xiao Deng, Mingxiang Xu
Format: Article
Language: English
Published: MDPI AG 2025-03-01
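Series: Remote Sensing
Subjects: soil pH; geographically weighted machine learning; genetic algorithm; uncertainty; digital soil mapping
Online Access: https://www.mdpi.com/2072-4292/17/6/1086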
author Wantao Zhang
Jingyi Ji
Binbin Li
Xiao Deng
Mingxiang Xu
collection DOAJ
description Accurate soil pH prediction is critical for soil management and ecological environmental protection. Machine learning (ML) models have been widely applied to soil pH prediction. However, these models often do not fully account for the spatial heterogeneity of the relationship between soil and environmental variables, which limits their predictive capability, especially in large-scale regions with complex soil landscapes. To address these challenges, this study collected soil pH data from 4335 soil surface points (0–20 cm) obtained from the China Soil System Survey, combined with multi-source environmental covariates. This study integrates Geographically Weighted Regression (GWR) with three ML models (Random Forest, Cubist, and XGBoost) and develops three geographically weighted machine learning models optimized by Genetic Algorithms to improve the prediction of soil pH values. Compared to GWR and traditional ML models, the R² of the geographically weighted random forest (GWRF), geographically weighted Cubist (GWCubist), and geographically weighted extreme gradient boosting (GWXGBoost) models increased by 1.98% to 14.29%, while the RMSE decreased by 1.81% to 11.98%. Among the three models, the GWRF model performed the best and effectively reduced uncertainty in soil pH mapping. Mean Annual Precipitation and the Normalized Difference Vegetation Index are two key environmental variables influencing the prediction of soil pH, and they have a significant negative impact on the spatial distribution of soil pH. These findings provide a scientific basis for effective soil health management and the implementation of large-scale soil modeling programs.
format Article
id doaj-art-3bec9f9cf7cd4acdaf270af00a2e4fec
institution Kabale University
issn 2072-4292
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-3bec9f9cf7cd4acdaf270af00a2e4fec; 2025-08-20T03:44:03Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2025-03-01; vol. 17, no. 6, art. 1086; DOI 10.3390/rs17061086
Wantao Zhang (0), Jingyi Ji (1), Binbin Li (2), Xiao Deng (3), Mingxiang Xu (4)
(0) College of Soil and Water Conservation Science and Engineering, Northwest A&F University, Yangling 712100, China
(1) State Key Laboratory of Soil Erosion and Dryland Farming on the Loess Plateau, The Research Center of Soil and Water Conservation and Ecological Environment, Chinese Academy of Sciences and Ministry of Education, Yangling 712100, China
(2) College of Soil and Water Conservation Science and Engineering, Northwest A&F University, Yangling 712100, China
(3) State Key Laboratory of Soil Erosion and Dryland Farming on the Loess Plateau, The Research Center of Soil and Water Conservation and Ecological Environment, Chinese Academy of Sciences and Ministry of Education, Yangling 712100, China
(4) College of Soil and Water Conservation Science and Engineering, Northwest A&F University, Yangling 712100, China
title Integrating Genetic Algorithm and Geographically Weighted Approaches into Machine Learning Improves Soil pH Prediction in China
topic soil pH
geographically weighted machine learning
genetic algorithm
uncertainty
digital soil mapping
url https://www.mdpi.com/2072-4292/17/6/1086