Evaluation of linear, nonlinear and ensemble machine learning models for landslide susceptibility assessment in southwest China

Machine learning models are gradually replacing traditional techniques used for landslide susceptibility assessment. This study aims to comprehensively compare multiple models, including linear, nonlinear, and ensemble models, based on 5281 historical landslides in southwest China, the area most sev...

Bibliographic Details
Main Authors: Bingwei Wang, Qigen Lin, Tong Jiang, Huaxiang Yin, Jian Zhou, Jinhao Sun, Dongfang Wang, Ran Dai
Format: Article
Language: English
Published: Taylor & Francis Group 2023-12-01
Series: Geocarto International
Subjects:
Online Access: https://www.tandfonline.com/doi/10.1080/10106049.2022.2152493
collection DOAJ
description Machine learning models are gradually replacing traditional techniques for landslide susceptibility assessment. This study comprehensively compares linear, nonlinear, and ensemble models using 5281 historical landslides in southwest China, the area most severely affected by landslide disasters. The linear models are represented by logistic regression (LR); the nonlinear models by support vector machine (SVM), artificial neural network (ANN) and classification 5.0 decision tree (C5.0 DT); and the ensemble models by random forest (RF) and categorical boosting (Catboost). The correlation coefficient, variance inflation factor (VIF), and relative importance analysis were used to select the dominant landslide conditioning factors. Multiple statistical indicators (e.g. area under the receiver operating characteristic curve (AUC) and Kappa), cross-validation and qualitative methods were used to evaluate the models’ performance. The findings are: (1) Regarding predictive performance, the ensemble models Catboost (AUC = 0.823 and Kappa = 0.593) and RF (AUC = 0.821 and Kappa = 0.582) performed best, followed by the nonlinear models SVM (AUC = 0.775 and Kappa = 0.520), ANN (AUC = 0.770 and Kappa = 0.486) and C5.0 DT (AUC = 0.751 and Kappa = 0.497), while the linear model LR (AUC = 0.756 and Kappa = 0.456) had more limited performance. Ensemble models that use a tree as their base classifier show considerable potential for landslide susceptibility studies. (2) Regarding model robustness, the three types of models performed relatively similarly in nonspatial cross-validation (CV), while in spatial cross-validation (SPCV) the linear model LR (median AUC = 0.714) achieved better results than the ensemble and nonlinear models. This implies that when landslides are not homogeneously distributed, linear models may be the most robust. It is advisable to consider evaluation metrics from several perspectives and to combine them with specialist qualitative geomorphological knowledge when determining the best model. (3) The Gini index-based RF model suggests that road density was the dominant factor in the frequency of landslides in the study area.
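The evaluation ideas named in the description (AUC, Kappa, and spatial versus nonspatial cross-validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `auc`, `cohen_kappa` and `spatial_folds`, the grid-cell blocking scheme, and the toy inputs are all assumptions made for illustration, and the record does not say how the study built its spatial folds.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count pairwise wins of positives over negatives; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement between reference labels and predictions.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    p_e = sum(
        (sum(t == label for t in y_true) / n) * (sum(p == label for p in y_pred) / n)
        for label in (0, 1)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


def spatial_folds(points, cell_size, k):
    """Assign (x, y) points to k folds by grid cell (hypothetical blocking scheme).

    Points in the same cell always share a fold, so nearby samples are not
    split between training and test sets as they can be in random CV.
    """
    def cell(pt):
        return (int(pt[0] // cell_size), int(pt[1] // cell_size))

    cells = sorted({cell(pt) for pt in points})
    cell_to_fold = {c: i % k for i, c in enumerate(cells)}
    return [cell_to_fold[cell(pt)] for pt in points]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]                      # 1 = landslide, 0 = non-landslide
    print(auc(labels, [0.9, 0.4, 0.6, 0.1]))   # ranking quality of the scores
    print(cohen_kappa(labels, [1, 1, 0, 1]))   # agreement beyond chance
    print(spatial_folds([(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.2, 1.9)], 1.0, 2))
```

In this sketch, random cross-validation would shuffle samples into folds regardless of location, whereas `spatial_folds` keeps each grid cell intact. That is one simple way to mimic the harder spatial setting in which the description reports LR as the most robust model.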
id doaj-art-06052c24e50546edb2ce66353cd1ce03
institution Kabale University
issn 1010-6049
1752-0762
spelling doaj-art-06052c24e50546edb2ce66353cd1ce03; 2025-02-05T08:30:30Z; eng; Taylor & Francis Group; Geocarto International; ISSN 1010-6049, 1752-0762; 2023-12-01; vol. 38, no. 1; DOI 10.1080/10106049.2022.2152493
Author affiliation (all eight authors): Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Institute for Disaster Risk Management/School of Geographical Sciences, Nanjing University of Information Science & Technology, Nanjing, China
topic Evaluation of machine learning models
cross-validation
landslide susceptibility assessment
southwest China