
2023 Impact Factor: 4.6 (Q1)
2023 CiteScore: 7.8 (Q1)
Zhang Ao, Zhao Xin-wen, Zhao Xing-yuezi, Zheng Xiao-zhan, Zeng Min, Huang Xuan, Wu Pan, Jiang Tuo, Wang Shi-chang, He Jun, Li Yi-yong. 2024. Comparative study of different machine learning models in landslide susceptibility assessment: A case study of Conghua District, Guangzhou, China. China Geology, 7(1), 104‒115. doi: 10.31035/cg2023056.

Comparative study of different machine learning models in landslide susceptibility assessment: A case study of Conghua District, Guangzhou, China

  • Machine learning is currently a major research focus in landslide prediction. To clarify and evaluate how different machine learning models differ in their characteristics and predictive performance, Conghua District, the part of Guangzhou most prone to landslide disasters, was selected for landslide susceptibility evaluation. The evaluation factors were selected using correlation analysis and the variance inflation factor (VIF) method. Landslide models were constructed with four machine learning methods: Logistic Regression (LR), Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGB). The models were compared and evaluated using statistical indices and receiver operating characteristic (ROC) curves. The results showed that the LR, RF, SVM, and XGB models all predict landslide susceptibility well, with area under the curve (AUC) values of 0.752, 0.965, 0.996, and 0.998, respectively. The XGB model had the highest predictive ability, followed by the RF, SVM, and LR models. The frequency ratio (FR) accuracy of the LR, RF, SVM, and XGB models was 0.775, 0.842, 0.759, and 0.822, respectively. The RF and XGB models outperformed the LR and SVM models, indicating that ensemble algorithms predict better than single classification algorithms in regional landslide classification problems.
  • Landslides are among the most frequent geological hazards in China. They often cause serious damage to natural resources, the ecological environment, and infrastructure, and they pose a serious threat to lives and property (Guzzetti F et al., 2012). Against the backdrop of global climate change, extreme weather events in China have become more frequent. This has further increased landslide risk in already fragile geological environments, making landslide prevention and mitigation an urgent task. Landslide susceptibility evaluation is an important basis for this work. Landslide susceptibility is the likelihood that landslides occur in an area under given geological and environmental conditions, and it focuses on the spatial probability of where landslides will occur (Reichenbach P et al., 2018).

    The most critical step in landslide susceptibility assessment is establishing an appropriate assessment model. A good susceptibility assessment model can fully capture the mapping between landslides and their basic environmental factors, building a nonlinear function from those factors to the spatial probability of landslides (Huang FM et al., 2022; Dou J et al., 2019; Tsangaratos P et al., 2017; Jia YF et al., 2023; Xiong XH et al., 2022). The landslide susceptibility evaluation models commonly used in China and elsewhere fall into two groups: non-deterministic models and deterministic models (Xia H et al., 2018). Previous studies have widely applied traditional non-deterministic models based on statistical analysis. These include information quantity models (Liu HH, 2012; Wang T et al., 2021), weights-of-evidence models (Li JL et al., 2016), the analytic hierarchy process (Yang DH et al., 2015), and fuzzy comprehensive evaluation methods (Chen W et al., 2021).

    With the development of data mining and artificial intelligence, researchers in China and abroad have gradually applied machine learning algorithms to landslide susceptibility evaluation. Examples include Logistic Regression (LR) (Zhang J et al., 2016), Decision Tree (DT) (Paraskevas T et al., 2016), Support Vector Machine (SVM) (Chen W et al., 2016), and Artificial Neural Network (ANN) (Dou J et al., 2015).
    
    Lee S et al. (2001) used the LR method and a probability method to generate landslide susceptibility zoning maps, and the results agreed with landslide survey data. Feng HJ et al. (2016) compared LR, Information Quantity (IQ), and ANN for landslide susceptibility evaluation in Chun’an County, Zhejiang Province, and concluded that the ANN model was superior to the other two. Zêzere JL et al. (2017) and Huang Y et al. (2018) compared machine learning models for predicting landslide susceptibility and concluded that the SVM model gives relatively stable predictions with high recognition accuracy. Building on the traditional weighting system, Kanungo DP et al. (2006) evaluated landslide susceptibility with three models: an ANN model, a fuzzy weighting model, and a hybrid neural network and fuzzy weighting model. The hybrid model was the most accurate.

    To further verify the generalization performance of machine learning models in landslide susceptibility evaluation, and to promote their use in the landslide field, four typical machine learning models were selected to evaluate landslide susceptibility: Logistic Regression (LR), Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGB). Statistical indices and receiver operating characteristic (ROC) curves were used to verify the predictive performance of each model.

    Conghua District of Guangzhou is located in central Guangdong Province, at 113°17′‒114°04′E and 23°22′‒23°56′N. It has a total area of 1974.5 km² and is the highest-altitude district in Guangzhou. The study area lies in the transition zone between the Pearl River Delta and the mountains of northern Guangdong, and the terrain slopes from north to south. The highest point is Liangkou Tiantangding, at an altitude of 1210 m. The lowest point is Taiping Village, Taiping Town, at an altitude of 16 m (Fig. 1). Low mountains and hills dominate the landforms, with steep terrain and well-developed valleys. The study area has a humid monsoon climate of the north subtropical zone, which is mild with abundant rainfall. The mean annual temperature is 21‒22°C, and the mean annual rainfall is 1907 mm.

    Figure  1.  Location of the study area (a) and geographical location of the study area and historical landslide distribution map (b).

    The main exposed strata in the area are:
    
    - Proterozoic gneiss complex
    - Devonian siltstone
    - Carboniferous limestone and dolomite
    - Triassic quartz sandstone
    - Jurassic–Cretaceous granite and granodiorite
    - Paleogene sandstone and glutenite
    - Quaternary clay
    
    These complex and variable geological and geomorphic conditions favor landslide development. According to statistics, 1231 landslides occurred in Guangzhou from 2013 to 2020. Of these, 365 (29.7%) occurred in Conghua District, one of the areas with the most landslides. Landslides in the area are mainly distributed in middle and low mountains, hills, and river banks with strong topographic incision.

    Landslide susceptibility studies must consider the relationships among historical landslide events, geological environmental factors, and spatial location (Pham BT et al., 2015, 2016). The initial data in this study therefore include four sources:
    
    (1) Detailed geological disaster survey data for Conghua District, used to locate historical landslide points. These data come from the “Guangzhou multi-factor urban geological survey” and “Guangzhou 1∶50000 geological disaster detailed survey” projects.
    (2) A digital elevation model (DEM) with a spatial resolution of 30 m, taken from the free public basic data of the Geographic Data Cloud (https://www.gscloud.cn). It was used to extract topographic and geomorphic factors such as elevation, slope, and aspect.
    (3) Landsat-8 remote sensing imagery with a spatial resolution of 30 m, also from the Geographic Data Cloud. It was used to derive the normalized difference vegetation index (NDVI).
    (4) Meteorological station observations from the National Earth System Science Data Center, National Science and Technology Infrastructure of China (http://www.geodata.cn). These were used to derive the annual average rainfall factor.
    
    Given the size of the study area and the scale of the landslides, 30 m×30 m grid cells were used as the basic unit for landslide susceptibility assessment. All initial data were spatially analyzed to extract the evaluation factors. After rasterization, all factors were unified to a spatial resolution of 30 m.

    In this landslide susceptibility assessment, the predictive performance of different machine learning landslide models is compared in order to draw a more accurate landslide susceptibility zoning map of the study area. The research process has six main steps (Fig. 2):
    
    (1) Data acquisition: a geospatial database of basic environmental factors is established based on the landslide frequency ratio.
    (2) Feature extraction: a feature selection method screens suitable susceptibility evaluation indicators.
    (3) Data set division and feature engineering: the data are divided into training and test sets and standardized.
    (4) Machine learning: landslide models are built from the training set with different algorithms.
    (5) Model evaluation: the test set is used to verify and compare the accuracy of the landslide models.
    (6) Mapping: landslide susceptibility maps are drawn.

    Figure  2.  Flowchart of the machine learning model for landslide susceptibility evaluation.

    Previous studies have shown that locations whose geological and topographical conditions resemble those of historical landslide points are more prone to landslides (Miao WD et al., 2003). The frequency ratio (FR) method reflects the nonlinear relationship between the basic environment and landslide susceptibility well and has been widely used in landslide research (Chen W et al., 2021; Hu T et al., 2020). This paper therefore uses the statistically based FR method as a quantitative model. For each interval of each factor, it takes the proportion of landslide units that fall in the interval and divides it by the proportion of all grid units in the study area that fall in it. This quantifies the contribution of each factor interval to landslide susceptibility (Equation 1).

    FR=\frac{{n}_{i}/N}{{s}_{i}/S} (1)

    where {n}_{i} is the number of landslide grid cells in the i-th interval of a given factor; N is the total number of landslide grid cells in the study area; {s}_{i} is the number of grid cells in the i-th interval of that factor; S is the total number of grid cells in the study area; and FR is the frequency ratio of the interval.
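    Equation 1 can be checked directly against Table 2. The Python sketch below uses the standard library only, and the function name `frequency_ratio` is illustrative rather than taken from the study. It recomputes the FR of the five elevation intervals from their landslide and grid-cell counts. Because the elevation intervals cover the whole study area, the sums reproduce N = 365 and S = 2209297.

```python
def frequency_ratio(landslide_counts, cell_counts):
    """Frequency ratio (Equation 1) for each interval of one factor.

    landslide_counts[i] = n_i, landslide cells in interval i
    cell_counts[i]      = s_i, all grid cells in interval i
    """
    n_total = sum(landslide_counts)  # N, total landslide cells
    s_total = sum(cell_counts)       # S, total grid cells
    return [
        (n_i / n_total) / (s_i / s_total) if s_i else 0.0
        for n_i, s_i in zip(landslide_counts, cell_counts)
    ]


# Elevation intervals from Table 2: 8–139, 139–294, 294–453, 453–681, 681–1167 m
elevation_landslides = [120, 114, 79, 32, 20]
elevation_cells = [914164, 529250, 434705, 230855, 100323]

for fr in frequency_ratio(elevation_landslides, elevation_cells):
    print(f"{fr:.3f}")  # prints 0.795, 1.304, 1.100, 0.839, 1.207 (one per line)
```

    The same function applies to any factor. An interval with no landslide cells, such as the red sandy conglomerates in Table 2, simply gets FR = 0.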

    Landslide susceptibility assessment determines whether a unit is a landslide unit. There are only two answers, “yes” and “no”, so it is a binary classification problem. LR is a typical machine learning classification model that is simple and efficient, which makes it a powerful tool for binary classification. It is built on the natural logarithm of the odds, known as the logit (Peng CYJ et al., 2002), and was introduced and widely used in the late 1960s and early 1970s (Cabrera AF, 1994). LR is well suited to describing and testing the nonlinear relationship between the categorical outcome variable (landslide or non-landslide) and the predictor variables (landslide influencing factors). The logit form of LR is Equation 2:

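    Y=f\left(P\right)=\mathrm{ln}\left(\frac{P}{1-P}\right)={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\cdots +{\beta }_{n}{X}_{n} (2)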

    The probability of a landslide event (P) can therefore be determined by Equation 3:

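    P=P\left(Y|X\right)=\frac{{e}^{{\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\cdots +{\beta }_{n}{X}_{n}}}{1+{e}^{{\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\cdots +{\beta }_{n}{X}_{n}}} (3)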

    where Y is the outcome variable (landslide or non-landslide); X=\left({X}_{1},{X}_{2},\cdots ,{X}_{n}\right) represents the environmental factors affecting landslides; n is the number of environmental factors; {\beta }_{0} is the intercept; and {\beta }_{1},{\beta }_{2},\cdots ,{\beta }_{n} are the regression coefficients.
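    Equations 2 and 3 are two views of the same model. The logit of P is linear in the factors, and P itself is the logistic function of that linear predictor. The minimal sketch below evaluates both for one grid cell. The coefficients and factor values are hypothetical; in practice the β values are estimated from the training set, typically by maximum likelihood.

```python
import math


def linear_predictor(beta0, betas, x):
    """Right-hand side of Equation 2: beta0 + beta1*X1 + ... + betan*Xn."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def landslide_probability(beta0, betas, x):
    """Equation 3, P = e^z / (1 + e^z), evaluated in a numerically stable way."""
    z = linear_predictor(beta0, betas, x)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Equation 2: ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and FR-coded factor values (not fitted to the study data)
beta0, betas = -1.0, [0.5, 2.0]
x = [1.0, 0.5]
p = landslide_probability(beta0, betas, x)
print(round(p, 4))         # 0.6225
print(round(logit(p), 6))  # 0.5, i.e. the linear predictor is recovered
```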

    RF is a typical ensemble learning method. Following Breiman’s idea and Ho’s description, it builds multiple decision trees from different data subsets and combines their votes into the final output of the random forest. Randomly sampling both samples and features makes the model more accurate and stable than a single decision tree. Many studies have shown that random forests are highly tolerant of outliers and noise.

    RF uses a bagging (bootstrap aggregating) algorithm. Samples are drawn randomly with replacement to construct each data subset. m feature attributes are then randomly selected from all M feature attributes (m<M) to build a weak decision tree. Repeating this n times yields n weak decision trees {y}_{1}\left(X\right),{y}_{2}\left(X\right),\cdots ,{y}_{n}\left(X\right), which together form the random forest model (Equation 4):

    Y\left(x\right)=\mathrm{arg}\underset{Z}{\mathrm{max}}\sum\nolimits_{i=1}^{n}I\left({y}_{i}\left(X\right)=Z\right) (4)

    where Y\left(x\right) is the RF model; {y}_{i}\left(X\right) is a single weak decision tree model; Z is the output variable (class label); I is the indicator function; and n is the number of weak decision trees.
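    The two random ingredients of RF and the vote of Equation 4 can be sketched with the standard library alone. In this illustration the trees themselves are omitted. The subset size m = 3 is an arbitrary assumption, and the tree outputs passed to the vote are hypothetical. Many RF implementations, following Breiman, redraw the feature subset at every split rather than once per tree as described above.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bagging subset)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, m, rng):
    """Pick m of the M = n_features factors without replacement (m < M)."""
    return sorted(rng.sample(range(n_features), m))


def majority_vote(tree_predictions):
    """Equation 4: the class Z with the most votes, argmax_Z sum_i I(y_i(X) = Z)."""
    votes = Counter(tree_predictions)
    return max(votes, key=votes.get)


rng = random.Random(42)
rows = bootstrap_indices(955, rng)           # one bootstrap sample of a 955-row training set
features = random_feature_subset(7, 3, rng)  # 3 of the 7 evaluation factors (m = 3 assumed)
print(len(rows), features)

# Hypothetical outputs of five weak trees for one grid cell (1 = landslide)
print(majority_vote([1, 0, 1, 1, 0]))  # 1
```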

    SVM is a machine learning method based on the principle of structural risk minimization. A nonlinear mapping projects data that are not linearly separable into a high-dimensional feature space. In that space, SVM finds the optimal separating hyperplane and uses it to classify positive and negative samples efficiently. Because this hyperplane maximizes the margin between the classes, SVM is relatively robust. The hyperplane is given by Equation 5:

    {w}^{\mathrm{T}}x+b=0 (5)

    where w is the normal vector, x is the feature vector of a sample point, and b is the bias (a constant). Optimal values of w and b give the optimal hyperplane, which maximizes the margin between positive and negative samples. This optimization problem is solved by introducing slack variables {\xi }_{i} and a penalty factor C, which gives the optimal hyperplane (Equations 6, 7):

    \mathrm{min}\frac{1}{2}{\|w\|}^{2}+C\sum\nolimits_{i=1}^{n}{\xi }_{i} (6)
    \mathrm{s}.\mathrm{t}.\;{y}_{i}\left({w}^{\mathrm{T}}{x}_{i}+b\right)\geqslant 1-{\xi }_{i},\;{\xi }_{i}\geqslant 0 (7)

    Introducing Lagrange multipliers converts this problem into its dual, from which the optimal classification decision function is obtained (Equation 8):

    f\left(x\right)=\mathrm{sign}\left[\sum\nolimits_{i=1}^{n}{\alpha }_{i}{y}_{i}\kappa \left({x}_{i},x\right)+b\right] (8)

    where {\alpha }_{i} are the Lagrange multipliers, {y}_{i} are the class labels, and \kappa \left({x}_{i},{x}_{j}\right)=\varphi \left({x}_{i}\right)\cdot \varphi \left({x}_{j}\right) is the kernel function, which implicitly maps the input space to the feature space. The linear (LN), polynomial (PL), radial basis function (RBF), and sigmoid (SIG) kernels are commonly used in SVM.
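    The decision function in Equation 8 needs only the support vectors, their labels and multipliers, the bias, and a kernel. The sketch below evaluates it with an RBF kernel, κ(a, b) = exp(−γ‖a − b‖²). The support vectors, α_i, b, and γ = 0.5 are hypothetical values chosen for illustration. They are not the output of training on the study data.

```python
import functools
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def linear_kernel(a, b):
    """Linear kernel: the plain dot product a . b."""
    return sum(ai * bi for ai, bi in zip(a, b))


def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """Equation 8: sign(sum_i alpha_i * y_i * k(x_i, x) + b), labels y_i in {-1, +1}.
    A score of exactly 0 is assigned to +1 by convention."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Hypothetical support vectors, multipliers, and bias (not trained on the study data)
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, 1]
alphas = [1.0, 1.0]
rbf = functools.partial(rbf_kernel, gamma=0.5)

print(svm_decision((1.8, 2.1), svs, ys, alphas, 0.0, rbf))   # 1, near the +1 vector
print(svm_decision((0.2, -0.1), svs, ys, alphas, 0.0, rbf))  # -1, near the -1 vector
```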

    XGB is one of the more recently proposed algorithms and among the most powerful ensemble learning methods, but it has so far rarely been used in landslide susceptibility evaluation. It is faster than traditional gradient boosting trees and other gradient boosting ensemble algorithms, and it is considered a high-performance learner for classification and regression. XGB builds trees greedily and iteratively, with each new tree focusing on the samples the previous trees misclassified. Training proceeds along the direction of steepest descent of the loss function. A second-order Taylor expansion is used to optimize the loss function. A regularization term is also added to control model complexity and prevent overfitting.

    XGB builds the optimal model by minimizing an objective function made up of the training loss plus a regularization term that penalizes model complexity. Following the idea of structural risk minimization, this objective is used to optimize the model iteratively. The objective function is Equation 9:

    {obj}^{\left(t\right)}=\sum\nolimits_{i=1}^{n}L\left({y}_{i},{\widehat{y}}_{i}^{\left(t\right)}\right)+\sum\nolimits_{k=1}^{t}\mathrm{\Omega }\left({f}_{k}\right) (9)

    where {\widehat{y}}_{i}^{\left(t\right)} is the model prediction for the i-th sample in round t; {y}_{i} is the true value; L\left({y}_{i},{\widehat{y}}_{i}^{\left(t\right)}\right) is the prediction error in round t; n is the total number of samples; and \mathrm{\Omega }\left({f}_{k}\right) is the regularization term of the k-th tree, which represents its complexity and is defined as Equation 10:

    \mathrm{\Omega }\left({f}_{k}\right)=\gamma T+\frac{1}{2}\lambda {\|w\|}^{2} (10)

    where T is the number of leaf nodes; w is the vector of leaf weights and \|w\| is its norm; \gamma is the penalty on each additional leaf, which controls how difficult it is to split a node; and \lambda is the regularization coefficient on the leaf weights.

    The objective function is optimized with a second-order Taylor expansion. Let {g}_{i}={\partial }_{{\widehat{y}}^{\left(t-1\right)}}L\left({y}_{i},{\widehat{y}}_{i}^{\left(t-1\right)}\right) and {h}_{i}={\partial }_{{\widehat{y}}^{\left(t-1\right)}}^{2}L\left({y}_{i},{\widehat{y}}_{i}^{\left(t-1\right)}\right) be the first and second derivatives of the loss. Substituting Equation 10 into the expansion of Equation 9 and dropping constant terms gives Equation 11:

    {obj}^{\left(t\right)}=\sum\nolimits_{j=1}^{T}\left[\left(\sum _{i\in {I}_{j}}{g}_{i}\right){w}_{j}+\frac{1}{2}\left(\sum _{i\in {I}_{j}}{h}_{i}+\lambda \right){w}_{j}^{2}\right]+\gamma T (11)

    where {I}_{j}=\left\{i|q\left({x}_{i}\right)=j\right\} is the sample set of the j-th leaf node, and q\left({x}_{i}\right) is the tree structure that maps sample {x}_{i} to a leaf. Let {G}_{j}=\sum _{i\in {I}_{j}}{g}_{i} and {H}_{j}=\sum _{i\in {I}_{j}}{h}_{i}. These quantities are fixed for a given tree structure, so Equation 11 becomes a quadratic function of one variable in each leaf weight {w}_{j}. Minimizing Equation 11 gives the optimal leaf weights and the optimal objective value (Equations 12, 13):

    {w}_{j}^{*}=-\frac{{G}_{j}}{{H}_{j}+\lambda } (12)
    {obj}^{*}=-\frac{1}{2}\sum\nolimits_{j=1}^{T}\frac{{G}_{j}^{2}}{{H}_{j}+\lambda }+\gamma T (13)
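    Equations 12 and 13 can be evaluated once g_i and h_i are known. The text does not state the loss function, so the sketch below assumes the binary logistic loss. Under that loss, g_i = p_i − y_i and h_i = p_i(1 − p_i), where p_i is the predicted landslide probability. `split_gain` is the reduction in Equation 13 when one leaf is split into two, which is how XGBoost scores candidate splits. The sample labels and the λ and γ values are hypothetical.

```python
import math


def logistic_grad_hess(y, margin):
    """g_i and h_i of the logistic loss w.r.t. the previous-round raw score."""
    p = 1.0 / (1.0 + math.exp(-margin))  # predicted probability
    return p - y, p * (1.0 - p)


def leaf_stats(ys, margins):
    """G_j and H_j: sums of g_i and h_i over the samples I_j in one leaf."""
    G = H = 0.0
    for y, m in zip(ys, margins):
        g, h = logistic_grad_hess(y, m)
        G += g
        H += h
    return G, H


def optimal_weight(G, H, lam):
    """Equation 12: w_j* = -G_j / (H_j + lambda)."""
    return -G / (H + lam)


def optimal_objective(leaves, lam, gamma):
    """Equation 13: -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * len(leaves)


def split_gain(left, right, lam, gamma):
    """Decrease of Equation 13 when one leaf is split into left and right leaves."""
    (GL, HL), (GR, HR) = left, right
    parent = (GL + GR) ** 2 / (HL + HR + lam)
    return 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - parent) - gamma


# Hypothetical leaf: two landslide samples and one non-landslide, starting from raw score 0
G, H = leaf_stats([1, 1, 0], [0.0, 0.0, 0.0])
print(G, H)                                     # -0.5 0.75
print(round(optimal_weight(G, H, lam=1.0), 4))  # 0.2857
left, right = leaf_stats([1, 1], [0.0, 0.0]), leaf_stats([0], [0.0])
print(split_gain(left, right, lam=1.0, gamma=0.0) > 0)  # True
```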

    To verify the performance and generalization ability of the models, their fit must be evaluated on different data sets (Tien Bui D et al., 2016). Results on the training set show how well a landslide model fits the training data. Results on the test set show its predictive ability (Tien Bui D et al., 2012). In this paper, the four landslide models are evaluated and compared using statistical indicators and ROC curves.

    Many statistical indices can be used to verify model performance (Bennett ND et al., 2013). In this paper, positive predictive value, negative predictive value, sensitivity, specificity, accuracy, root mean square error, and other statistical indicators are used to verify the performance of the landslide models (Table 1).

    Table  1.  Description of statistical indices.
    No. | Name | Formula | Description
    1 | Sensitivity (SST) | SST=\displaystyle\frac{TP}{TP+FN} | Proportion of landslide grid cells correctly classified as “landslide”
    2 | Specificity (SPF) | SPF=\displaystyle\frac{TN}{FP+TN} | Proportion of non-landslide grid cells correctly classified as “non-landslide”
    3 | Accuracy (ACC) | ACC=\displaystyle\frac{TP+TN}{TP+TN+FP+FN} | Proportion of all grid cells correctly classified as “landslide” or “non-landslide”
    4 | Root mean squared error (RMSE) | RMSE={\left[\left(1/m\right)\displaystyle\sum\nolimits_{i=1}^{m}{\left({e}_{i}-\overline{{e}_{i}}\right)}^{2}\right]}^{0.5} | Error metric in the same units as the original data; a smaller RMSE indicates better model performance

    where TP (true positive) is the number of landslide points correctly classified as landslide; TN (true negative) is the number of non-landslide points correctly classified as non-landslide; FN (false negative) is the number of landslide points classified as non-landslide; FP (false positive) is the number of non-landslide points classified as landslide; {e}_{i} is the estimated value of the i-th observation; \overline{{e}_{i}} is the measured value of the i-th observation; and m is the number of observations.
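    The indices in Table 1 follow directly from the confusion counts. As a check, the sketch below recomputes SST, SPF, and ACC for the XGB model on the training set from the counts in Table 4. For RMSE, the text does not say what the estimated and measured values are. The small RMSE example assumes a common choice, predicted landslide probabilities compared against 0/1 labels, and its values are hypothetical.

```python
import math


def classification_indices(tp, tn, fp, fn):
    """SST, SPF and ACC as defined in Table 1."""
    sst = tp / (tp + fn)
    spf = tn / (fp + tn)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sst, spf, acc


def rmse(estimated, measured):
    """RMSE as defined in Table 1 (m = number of observations)."""
    m = len(estimated)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, measured)) / m)


# Confusion counts of the XGB model on the training set (Table 4)
sst, spf, acc = classification_indices(tp=251, tn=695, fp=5, fn=4)
print(f"SST={sst:.2f} SPF={spf:.2f} ACC={acc:.2f}")  # SST=0.98 SPF=0.99 ACC=0.99

# Hypothetical probabilities versus 0/1 labels (assumed interpretation of e_i)
print(round(rmse([0.9, 0.2], [1, 0]), 3))  # 0.158
```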

    The receiver operating characteristic (ROC) curve is widely used for model comparison and evaluation. For each model, it plots sensitivity (the proportion of landslide samples predicted as landslide) against 1−specificity (the proportion of non-landslide samples predicted as landslide) over a range of classification thresholds. The area under the curve (AUC) ranges from 0 to 1, and a larger value means higher prediction accuracy. An AUC of 0.5 corresponds to random guessing, and an AUC of 1 indicates a perfect classifier.
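    The AUC can be computed without drawing the curve. It equals the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample (the Mann–Whitney interpretation). The sketch below computes the AUC that way and also lists the ROC points. The labels and scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a random landslide sample (label 1) scores
    higher than a random non-landslide sample (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(1 - specificity, sensitivity) at every distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


labels = [1, 1, 0, 0]          # hypothetical test samples
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical susceptibility scores
print(auc(labels, scores))         # 0.75
print(roc_points(labels, scores))  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

    The pairwise loop is O(n·m), which is adequate for a test set of a few hundred samples.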

    In landslide susceptibility evaluation, the choice of evaluation factors directly affects the reliability and accuracy of the results. Factors are generally selected according to their objective existence, significance, and inheritance (Reichenbach P et al., 2018; Huang FM et al., 2020). Given the formation conditions of landslides in the study area, landslides mainly result from the combined action of internal and external factors of the geological environment. In this study, seven indicators were selected as the preliminary basic environmental factors: elevation, gradient, slope aspect, curvature, annual rainfall, NDVI, and stratum lithology. Many studies have used these factors as the main controls on landslide formation, and high-precision data for them can readily be obtained through investigation and collection, which ensures the accuracy of the susceptibility evaluation.
    
    The natural breaks method was used to divide the continuous factors (elevation, gradient, curvature, annual rainfall, and NDVI) into five attribute intervals. Slope aspect and stratum lithology were classified by direction and rock type (Fig. 3). The FR of each attribute interval was calculated with Equation 1 (Table 2) and used as the quantitative value of each factor for machine learning. The FR value shows how favorable an interval is to landslide development. FR>1 means landslides are more concentrated in the interval than in the study area as a whole, and FR<1 means they are less concentrated.

    Figure  3.  Basic environmental factors of landslide susceptibility assessment.
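    Table  2.  Attribute interval and FR of each evaluation factor.
    Factor | Attribute interval | Number of landslide grids in interval | Number of grids in interval | FR
    Elevation/m | 8–139 | 120 | 914164 | 0.795
    Elevation/m | 139–294 | 114 | 529250 | 1.304
    Elevation/m | 294–453 | 79 | 434705 | 1.100
    Elevation/m | 453–681 | 32 | 230855 | 0.839
    Elevation/m | 681–1167 | 20 | 100323 | 1.207
    Gradient/° | 0–4.99 | 114 | 765679 | 0.901
    Gradient/° | 4.99–11.17 | 128 | 585808 | 1.323
    Gradient/° | 11.17–18.06 | 79 | 454046 | 1.053
    Gradient/° | 18.06–26.86 | 34 | 279536 | 0.736
    Gradient/° | 26.86–60.12 | 10 | 124228 | 0.487
    Slope aspect | Plane | 1 | 5544 | 1.092
    Slope aspect | North | 60 | 297253 | 1.222
    Slope aspect | Northeast | 30 | 215802 | 0.841
    Slope aspect | East | 48 | 252204 | 1.152
    Slope aspect | Southeast | 49 | 277307 | 1.070
    Slope aspect | South | 59 | 333308 | 1.071
    Slope aspect | Southwest | 52 | 251346 | 1.252
    Slope aspect | West | 31 | 281488 | 0.667
    Slope aspect | Northwest | 35 | 295045 | 0.718
    Curvature | −10.7–−1.32 | 5 | 72633 | 0.417
    Curvature | −1.32–−0.36 | 57 | 304845 | 1.132
    Curvature | −0.36–0.35 | 222 | 1230177 | 1.092
    Curvature | 0.35–1.31 | 66 | 484044 | 0.825
    Curvature | 1.31–9.63 | 15 | 117598 | 0.772
    Annual rainfall/mm | 1949–1978 | 87 | 792643 | 0.664
    Annual rainfall/mm | 1978–2000 | 101 | 548694 | 1.114
    Annual rainfall/mm | 2000–2024 | 88 | 503742 | 1.057
    Annual rainfall/mm | 2024–2058 | 54 | 232038 | 1.409
    Annual rainfall/mm | 2058–2112 | 35 | 132180 | 1.603
    NDVI | 775–3655 | 30 | 107838 | 1.684
    NDVI | 3655–5352 | 66 | 127002 | 3.146
    NDVI | 5352–6640 | 67 | 255380 | 1.588
    NDVI | 6640–7454 | 111 | 699936 | 0.960
    NDVI | 7454–8764 | 91 | 1019141 | 0.540
    Stratum lithology | Granite | 252 | 1490524 | 1.023
    Stratum lithology | Gneiss complex | 3 | 58843 | 0.309
    Stratum lithology | Sandstone and mudstone | 58 | 247037 | 1.421
    Stratum lithology | Red sandy conglomerates | 0 | 3338 | 0.000
    Stratum lithology | Limestone and dolomite | 1 | 5379 | 1.125
    Stratum lithology | Pleistocene clay | 0 | 5506 | 0.000
    Stratum lithology | Holocene clay | 51 | 398670 | 0.774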

    All of the selected basic environmental factors are related to landslide occurrence to some extent. However, correlations among them can harm the prediction results and increase model complexity and running time. Before the evaluation, therefore, the quantified factors were screened by correlation analysis to remove highly correlated factors and improve model efficiency and prediction accuracy. The correlation analysis (Fig. 4) shows that the coefficients are small, ranging from −0.22 to 0.23, apart from the diagonal entries (each factor's correlation with itself, equal to 1). Each basic environmental factor can therefore be used independently as a characteristic variable for landslide susceptibility assessment.

    Figure  4.  Correlation analysis of evaluation factors of heat map.
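    The pairwise coefficients summarized in Fig. 4 are assumed here to be Pearson correlations between the FR-coded factors, since the text does not name the coefficient. Below is a minimal standard-library sketch with hypothetical values for three factors at six sample cells.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson matrix; columns maps factor name -> list of FR values."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]


# Hypothetical FR-coded values for three factors at six sample cells
cols = {
    "gradient": [0.901, 1.323, 1.053, 0.736, 0.487, 1.323],
    "ndvi": [1.684, 3.146, 1.588, 0.960, 0.540, 0.960],
    "rainfall": [0.664, 1.114, 1.057, 1.409, 1.603, 0.664],
}
names, corr = correlation_matrix(cols)
for name, row in zip(names, corr):
    print(name, [round(r, 2) for r in row])
```

    A constant factor has zero variance, so its correlation is undefined. Such a column would need to be dropped before this step.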

    Collinearity among the sample data can easily reduce model training accuracy, so the data set used to build the model must satisfy certain collinearity requirements. The variance inflation factor (VIF) characterizes the degree of collinearity between factors. A VIF greater than 10 indicates serious collinearity, and the factor then needs to be eliminated. The results are shown in Table 3. No basic environmental factor has a VIF above 10, so the sample data are not significantly collinear. All basic environmental factors can therefore be kept as evaluation factors.
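    VIF_j = 1/(1 − R_j²), where R_j² is the coefficient of determination from regressing factor j on the other factors. Equivalently, VIF_j is the j-th diagonal element of the inverse of the factors' correlation matrix, and the sketch below uses that shortcut. The correlation matrix is hypothetical and was chosen only to stay within the −0.22 to 0.23 range reported for Fig. 4.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: perfect collinearity")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif_from_correlation(corr):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


# Hypothetical correlation matrix for three factors
corr = [
    [1.00, 0.23, -0.22],
    [0.23, 1.00, 0.10],
    [-0.22, 0.10, 1.00],
]
vifs = vif_from_correlation(corr)
print([round(v, 3) for v in vifs])
print(all(v < 10 for v in vifs))  # True: no factor exceeds the threshold of 10
```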

    Table  3.  Collinearity analysis of basic environmental factors.
    No. | Factor | VIF
    1 | Gradient | 1.066933
    2 | Elevation | 1.336441
    3 | Curvature | 1.072272
    4 | Annual rainfall | 1.144736
    5 | NDVI | 1.255478
    6 | Stratum lithology | 1.113084
    7 | Slope aspect | 1.031587

    The natural breaks method and the FR method were used to quantify the seven selected evaluation factors, which produced the geospatial database for landslide susceptibility evaluation in the study area. To build the landslide prediction models, samples were extracted from the database and split into a training set and a test set. The training set was used to train the landslide models, and the test set was used to verify their performance (Pham BT et al., 2016).

    All 365 landslide points in the study area were taken as landslide samples, and 1000 non-landslide samples were randomly extracted outside the landslide areas. Together they formed the sample data set. Of this set, 70% (955 points) was randomly selected as the training set and the remaining 30% (410 points) as the test set. With the 30 m×30 m grid cell as the basic evaluation unit, the study area is divided into 2209297 units. The LR, RF, SVM, and XGB landslide models were constructed from the training set, and their performance on the training and test sets was analyzed (Tables 4, 5). All landslide models show high prediction ability. On the training set, the XGB model has the highest ACC (0.99), followed by the RF (0.94), SVM (0.82), and LR (0.80) models. On the test set, the XGB model again has the highest ACC (0.99), followed by the RF (0.97), SVM (0.83), and LR (0.83) models. The XGB model therefore has the highest prediction ability, followed by the RF, SVM, and LR models.
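    The sample construction and 70/30 split described above can be sketched as follows. The landslide cell indices are hypothetical stand-ins. “Outside the landslide area” is simplified to “not a landslide cell”, although the study may have used buffers or other exclusion rules that are not described here.

```python
import random


def sample_non_landslide_cells(n_cells, landslide_cells, k, rng):
    """Randomly pick k distinct grid-cell indices that are not landslide cells."""
    chosen = set()
    while len(chosen) < k:
        cell = rng.randrange(n_cells)
        if cell not in landslide_cells:
            chosen.add(cell)
    return sorted(chosen)


def train_test_split(samples, train_percent, rng):
    """Shuffle a copy of the samples and cut it at train_percent (integer arithmetic)."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


rng = random.Random(2024)
n_cells = 2209297                          # 30 m cells in the study area
landslide_cells = set(range(0, 3650, 10))  # hypothetical stand-in for the 365 landslide cells
non_landslide = sample_non_landslide_cells(n_cells, landslide_cells, 1000, rng)

samples = [(c, 1) for c in sorted(landslide_cells)] + [(c, 0) for c in non_landslide]
train, test = train_test_split(samples, 70, rng)
print(len(samples), len(train), len(test))  # 1365 955 410
```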

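    Table  4.  Model performance using training dataset.
    No. | Parameter | LR | SVM | RF | XGB
    1 | TN | 693 | 700 | 700 | 695
    2 | FP | 7 | 0 | 0 | 5
    3 | FN | 181 | 169 | 54 | 4
    4 | TP | 74 | 86 | 201 | 251
    5 | SST | 0.29 | 0.34 | 0.79 | 0.98
    6 | SPF | 0.99 | 1.00 | 1.00 | 0.99
    7 | ACC | 0.80 | 0.82 | 0.94 | 0.99
    8 | RMSE | 0.44 | 0.42 | 0.24 | 0.10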
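    Table  5.  Model predictive capability using testing dataset.
    No. | Parameter | LR | SVM | RF | XGB
    1 | TN | 294 | 300 | 300 | 296
    2 | FP | 6 | 0 | 0 | 4
    3 | FN | 67 | 69 | 14 | 1
    4 | TP | 63 | 41 | 96 | 109
    5 | SST | 0.48 | 0.37 | 0.87 | 0.99
    6 | SPF | 0.98 | 1.00 | 1.00 | 0.99
    7 | ACC | 0.83 | 0.83 | 0.97 | 0.99
    8 | RMSE | 0.42 | 0.41 | 0.18 | 0.11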

    Fig. 5 shows the ROC curves for the training set. The AUC values of the SVM, RF, and XGB models are close, at 0.999, 0.997, and 0.991, respectively, while the LR model has the lowest AUC (0.722). Fig. 6 shows the ROC curves for the test set. The AUC values of the RF and XGB models are close, at 0.998 and 0.996, respectively. The SVM model is slightly lower (AUC=0.965), and the LR model is the lowest (AUC=0.752). The ROC analysis shows that the LR model has the weakest prediction ability of the four models. The SVM, RF, and XGB models perform similarly, with the XGB model slightly stronger.

    Figure  5.  Analysis of the ROC curve of different landslide models using training dataset.
    Figure  6.  Analysis of the ROC curve of different landslide models using testing dataset.

    The four trained and tested landslide models were used to calculate the landslide susceptibility index for all grid cells in the study area. The natural breaks method was used to classify the index into five grades: very low, low, moderate, high, and very high. A landslide susceptibility map was then compiled for each model (Fig. 7). The predictions of all models have similar regional spatial distributions, showing that landslide susceptibility is distributed in line with the basic geographical environment. In Conghua District, the very low and low susceptibility areas are concentrated in the alluvial–pluvial plain in the southeast. The very high and high susceptibility areas are mainly concentrated in the valleys in the northeast and the low mountain areas in the east, and they extend radially to the northwest, north, northeast, east, and other directions. Landslides there are mainly controlled by four influences: the steep terrain in the northeast of Conghua District, the thick and loose residual overburden in the valleys, rainfall scouring, and slope cutting and other human engineering activities. Most historical landslide points lie in the moderate- and high-susceptibility areas (Fig. 8), so the predictions broadly agree with the distribution of historical landslides.

    Figure  7.  Landslide susceptibility maps using different landslide models.
    Figure  8.  Typical landslides verification.
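    The natural breaks method, used both to classify the factors and to grade susceptibility, can be illustrated with the exact Fisher–Jenks dynamic program. This program chooses class boundaries that minimize the total within-class sum of squared deviations. The sketch runs in O(k·n²) time, so it suits a sample of index values rather than all 2209297 cells; full rasters are normally handled in GIS software. The input values are hypothetical.

```python
def natural_breaks(values, k):
    """Fisher-Jenks optimal 1-D classification into k classes (requires len(values) >= k).
    Minimizes the total within-class sum of squared deviations and returns the
    upper bound of each class, from lowest to highest."""
    data = sorted(values)
    n = len(data)
    s1 = [0.0] * (n + 1)  # prefix sums of values
    s2 = [0.0] * (n + 1)  # prefix sums of squared values
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]  # cost[c][j]: best c classes for data[:j]
    back = [[0] * (n + 1) for _ in range(k + 1)]    # start index of the last class
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                if cost[c - 1][i] == inf:
                    continue
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    back[c][j] = i

    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(data[j - 1])  # last (largest) value of class c
        j = back[c][j]
    return bounds[::-1]


# Hypothetical susceptibility indices for a handful of cells
index = [0.02, 0.05, 0.08, 0.31, 0.35, 0.52, 0.55, 0.58, 0.81, 0.9, 0.97]
print(natural_breaks(index, 5))
```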

    Fig. 9 shows the percentage of grid cells (or area) in the very low, low, moderate, high, and very high susceptibility classes for each model. The statistics show a similar distribution pattern across models: susceptibility in the study area is mainly moderate, followed by the high, low, and very low classes, while the very high class covers the smallest area. The very high susceptibility areas predicted by the LR, RF, SVM, and XGB models account for 12.71%, 8.48%, 12.89%, and 8.43% of the study area, respectively.
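The class shares plotted in Fig. 9 can be reproduced from the grid counts listed in Table 6. The short sketch below assumes that every grid cell has the same area, so the area share of a class equals its share of cells; the dictionary `GRID_COUNTS` and the function `class_percentages` are illustrative names, not part of the original study.

```python
from typing import Dict, List

CLASSES = ["Very low", "Low", "Moderate", "High", "Very high"]

# Grid-cell counts per susceptibility class, copied from Table 6
GRID_COUNTS: Dict[str, List[int]] = {
    "LR": [293269, 537750, 615047, 482443, 280788],
    "RF": [174927, 528359, 781376, 537195, 187440],
    "SVM": [384467, 527571, 532830, 479657, 284772],
    "XGB": [458796, 511172, 588132, 464890, 186307],
}


def class_percentages(counts: List[int]) -> List[float]:
    """Share of the study area (%) in each class, assuming equal-area grid cells."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


for model, counts in GRID_COUNTS.items():
    shares = class_percentages(counts)
    print(model, " ".join(f"{name}: {p:.2f}%" for name, p in zip(CLASSES, shares)))
```

After rounding, the very high shares it prints match the 12.71%, 8.48%, 12.89%, and 8.43% quoted above.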

    Figure  9.  Different classes distribution of grid ratio on slope map.

    The actual prediction accuracy of each model was compared using the FR method, with the FR of each landslide model calculated according to Equation 1 (Table 6). The FR accuracy of a landslide susceptibility result is obtained by dividing the sum of the FRs of the high and very high susceptibility classes by the sum of the FRs of all classes (Huang FM et al., 2022). From very low to very high susceptibility, the FRs of the LR model are 0.227, 0.405, 0.650, 1.405, and 3.018, giving an FR accuracy of 0.775; those of the RF model are 0.173, 0.344, 0.550, 1.431, and 4.263 (FR accuracy 0.842); those of the SVM model are 0.315, 0.505, 0.602, 1.161, and 3.316 (FR accuracy 0.759); and those of the XGB model are 0.185, 0.355, 0.720, 1.562, and 4.256 (FR accuracy 0.822). In terms of FR accuracy, the RF and XGB models therefore outperform the LR and SVM models. Because RF and XGB are typical ensemble (integrated) algorithms, this suggests that ensemble algorithms have an advantage in this classification task and that the susceptibility they predict better reflects the spatial distribution and environmental aggregation characteristics of landslides.
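The FR and FR-accuracy calculations described above can be checked directly against Table 6. The sketch below assumes that Equation 1 is the usual frequency ratio, that is, the share of landslides in a class divided by the share of grid cells in that class; this assumption is consistent with the Landslide ratio, Grid ratio, and FR columns of Table 6. The helper names are hypothetical, and `top=2` selects the high and very high classes.

```python
from typing import List, Sequence


def frequency_ratios(landslides: Sequence[int], grids: Sequence[int]) -> List[float]:
    """FR of each class = (landslides in class / all landslides) / (cells in class / all cells)."""
    n_slides, n_grids = sum(landslides), sum(grids)
    return [(s / n_slides) / (g / n_grids) for s, g in zip(landslides, grids)]


def fr_accuracy(frs: Sequence[float], top: int = 2) -> float:
    """Sum of FR over the `top` most susceptible classes divided by the sum of all FR.

    Classes must be ordered from very low to very high, so top=2 picks high and very high.
    """
    return sum(frs[-top:]) / sum(frs)


# LR model counts from Table 6, ordered very low -> very high
lr_slides = [11, 36, 66, 112, 140]
lr_grids = [293269, 537750, 615047, 482443, 280788]

lr_fr = frequency_ratios(lr_slides, lr_grids)
print([round(f, 3) for f in lr_fr])  # [0.227, 0.405, 0.65, 1.405, 3.018]
print(round(fr_accuracy(lr_fr), 3))  # 0.775
```

Passing the counts of the other three models reproduces their FR accuracies (0.842, 0.759, and 0.822) to three decimal places.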

    Table  6.  FR precision analysis of susceptibility maps of different models.
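| Model | Class | Number of landslides | Landslide ratio/% | Number of grids | Grid ratio/% | FR |
|---|---|---|---|---|---|---|
| LR | Very low | 11 | 3.01 | 293269 | 13.27 | 0.227 |
| LR | Low | 36 | 9.86 | 537750 | 24.34 | 0.405 |
| LR | Moderate | 66 | 18.08 | 615047 | 27.84 | 0.650 |
| LR | High | 112 | 30.68 | 482443 | 21.84 | 1.405 |
| LR | Very high | 140 | 38.36 | 280788 | 12.71 | 3.018 |
| RF | Very low | 5 | 1.37 | 174927 | 7.92 | 0.173 |
| RF | Low | 30 | 8.22 | 528359 | 23.92 | 0.344 |
| RF | Moderate | 71 | 19.45 | 781376 | 35.37 | 0.550 |
| RF | High | 127 | 34.79 | 537195 | 24.32 | 1.431 |
| RF | Very high | 132 | 36.16 | 187440 | 8.48 | 4.263 |
| SVM | Very low | 20 | 5.48 | 384467 | 17.40 | 0.315 |
| SVM | Low | 44 | 12.05 | 527571 | 23.88 | 0.505 |
| SVM | Moderate | 53 | 14.52 | 532830 | 24.12 | 0.602 |
| SVM | High | 92 | 25.21 | 479657 | 21.71 | 1.161 |
| SVM | Very high | 156 | 42.74 | 284772 | 12.89 | 3.316 |
| XGB | Very low | 14 | 3.84 | 458796 | 20.77 | 0.185 |
| XGB | Low | 30 | 8.22 | 511172 | 23.14 | 0.355 |
| XGB | Moderate | 70 | 19.18 | 588132 | 26.62 | 0.720 |
| XGB | High | 120 | 32.88 | 464890 | 21.04 | 1.562 |
| XGB | Very high | 131 | 35.89 | 186307 | 8.43 | 4.256 |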

    Taking Conghua District of Guangzhou as the study area, correlation analysis and the variance inflation factor (VIF) method were used to select the basic environmental factors, including elevation, slope gradient, slope aspect, curvature, annual rainfall, NDVI, and stratum lithology, and the landslide susceptibility of the study area was predicted and evaluated with the LR, RF, SVM, and XGB models.
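Correlation analysis and VIF screening are standard multicollinearity checks. As a sketch of how they can be computed (not necessarily the authors' exact procedure), the code below calculates Pearson correlations between factor columns and obtains each factor's VIF as the corresponding diagonal element of the inverse correlation matrix, which equals 1 / (1 - R_j^2), where R_j^2 comes from regressing factor j on the other factors. The helper names and the synthetic data are hypothetical. A common rule of thumb treats VIF values above 10 as a sign of serious collinearity, but the thresholds adopted by the authors may differ.

```python
import math
import random
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long factor columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF_j = j-th diagonal element of the inverse correlation matrix = 1 / (1 - R_j^2)."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


# Synthetic factors (hypothetical): x3 is almost a linear combination of x1 and x2
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(1000)]
x2 = [rng.gauss(0, 1) for _ in range(1000)]
x3 = [a + b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

print(round(pearson(x1, x3), 2))  # moderately strong pairwise correlation
print([round(v, 2) for v in vif([x1, x2])])  # both close to 1
print([round(v, 1) for v in vif([x1, x2, x3])])  # all far above 10
```

This illustrates why VIF is often used alongside pairwise correlation: x3 has a pairwise correlation of only about 0.7 with x1, yet its VIF is large because it is almost a linear combination of x1 and x2.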

    (i) Statistical index analysis shows that all four models predict landslide susceptibility well. On the test set, the accuracy (ACC) values of the LR, RF, SVM, and XGB models are 0.83, 0.97, 0.83, and 0.99, respectively, and the AUC values of their ROC curves are 0.752, 0.965, 0.996, and 0.998, respectively. The XGB model has the highest predictive ability, followed by the RF, SVM, and LR models.

    (ii) Landslide susceptibility maps compiled from the four models show similar spatial distribution characteristics. Susceptibility in Conghua District is mainly moderate, followed by the high, low, and very low grades, with the very high grade the least extensive. The proportions of very high susceptibility area predicted by the LR, RF, SVM, and XGB models are 12.71%, 8.48%, 12.89%, and 8.43%, respectively. The very low and low susceptibility areas are concentrated in the alluvial-pluvial plain in the southeast, while the high and very high susceptibility areas are distributed in the valleys of the northeast and the low mountain areas of the east; the predicted results are consistent with the distribution of historical landslide disaster points.

    (iii) The prediction accuracy of the four landslide models over the whole study area was compared using the FR method. The FR accuracies of the LR, RF, SVM, and XGB models are 0.775, 0.842, 0.759, and 0.822, respectively. The RF and XGB models outperform the LR and SVM models, indicating that ensemble (integrated) algorithms have better predictive ability for regional landslide classification. By combining multiple weak models, an ensemble algorithm outperforms a single classification algorithm on this prediction problem and better reflects the spatial distribution and environmental aggregation characteristics of landslides.
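Unlike AUC, the ACC values reported in item (i) depend on the probability threshold used to turn model outputs into landslide/non-landslide predictions. The minimal sketch below computes ACC from a confusion matrix; the default threshold of 0.5 and the helper names `confusion_counts` and `accuracy` are assumptions for illustration rather than details taken from the study.

```python
from typing import Dict, Sequence


def confusion_counts(labels: Sequence[int], probs: Sequence[float],
                     threshold: float = 0.5) -> Dict[str, int]:
    """Count TP, TN, FP, FN after turning probabilities into 0/1 at `threshold`."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            counts["TP"] += 1
        elif pred == 0 and y == 0:
            counts["TN"] += 1
        elif pred == 1:
            counts["FP"] += 1  # predicted landslide, none observed
        else:
            counts["FN"] += 1  # missed landslide
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """ACC = (TP + TN) / all samples."""
    return (c["TP"] + c["TN"]) / sum(c.values())
```

Raising or lowering `threshold` trades false positives against missed landslides and changes ACC, while the AUC of the same scores stays fixed.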

    Ao Zhang, Xing-yuezi Zhao, Xin-wen Zhao and Xiao-zhan Zheng conceived of the presented idea. Ao Zhang, Xin-wen Zhao and Xing-yuezi Zhao carried out the experiment. All authors discussed the results and contributed to the final manuscript.

    The authors declare no conflicts of interest.

    This research was supported by projects of the China Geological Survey (DD20221729, DD20190291) and the Zhuhai Urban Geological Survey (including informatization) (MZCD–2201–008). The authors are indebted to the Guangzhou Municipal Bureau of Planning and Resources, the Guangzhou Institute of Geological Survey, and the Guangzhou Urban Planning Survey and Design Institute for their assistance. The authors also thank the reviewers and editors for their valuable comments and suggestions.

    Bennett ND, Croke BF, Guariso G, Guillaume JH, Hamilton SH, Jakeman AJ, Marsili-Libelli S, Newham LT, Norton JP, Perrin C, Pierce SA, Robson B, Seppelt R, Voinov AA, Fath BD, Andreassian V. 2013. Characterising performance of environmental models. Environmental Modelling & Software, 40, 1–20. doi: 10.1016/j.envsoft.2012.09.011
    Cabrera AF. 1994. Logistic regression analysis in higher education: An applied perspective. In: Higher Education: Handbook of Theory and Research, 10, 225–256.
    Chen W, Chai HC, Zhao Z, Wang Q, Hong H. 2016. Landslide susceptibility mapping based on GIS and support vector machine models for the Qianyang County, China. Environmental Earth Sciences, 75(6), 1–13. doi: 10.1007/s12665-015-5093-0
    Chen W, Chen X, Peng JB, Panahi M, Lee S. 2021. Landslide susceptibility modeling based on ANFIS with teaching-learning-based optimization and satin bowerbird optimizer. Geoscience Frontiers, 12(1), 93–107. doi: 10.1016/j.gsf.2020.07.012
    Dou J, Yamagishi H, Pourghasemi HR, Yunus AP, Song X, Xu Y, Zhu Z. 2015. An integrated artificial neural network model for the landslide susceptibility assessment of Osado Island, Japan. Natural Hazards, 78(3), 1749–1776. doi: 10.1007/s11069-015-1799-2
    Dou J, Yunus AP, Bui DT, Merghadi A, Sahana M, Zhu ZF, Chen CW, Khosravi K, Yang Y, Pham BT. 2019. Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Science of the Total Environment, 662, 332–346. doi: 10.1016/j.scitotenv.2019.01.221
    Feng HJ, Zhou AG, Yu JY, Tang XM, Zheng JL, Chen XX, You SY. 2016. A comparative study on Plum-Rain-Triggered landslide susceptibility assessment models in West Zhejiang Province. Earth Science, 41(3), 403–415 (in Chinese with English abstract). doi: 10.3799/dgkx.2016.032
    Guzzetti F, Mondini AC, Cardinali M, Fiorucci F, Santangelo M, Chang KT. 2012. Landslide inventory maps: New tools for an old problem. Earth-Science Reviews, 112(1–2), 42–66. doi: 10.1016/j.earscirev.2012.02.001
    Hu T, Fan X, Wang S, Guo ZZ, Liu AC, Huang FM. 2020. Landslide susceptibility evaluation of Sinan County using logistics regression model and 3S technology. Bulletin of Geological Science and Technology, 39(2), 113–121 (in Chinese with English abstract). doi: 10.19509/j.cnki.dzkq.2020.0212
    Huang FM, Chen JW, Du Z, Yao C, Huang JS, Jiang QH, Chang ZL, Li S. 2020. Landslide susceptibility prediction considering regional soil erosion based on machine-learning models. ISPRS International Journal of Geo-Information, 9(6), 377. doi: 10.3390/ijgi9060377
    Huang FM, Hu SY, Yan XY, Li M, Wang JY, Li WB, Guo ZZ, Fan WY. 2022. Landslide susceptibility prediction and identification of its main environmental factors based on machine learning models. Bulletin of Geological Science and Technology, 41(2), 79–90 (in Chinese with English abstract). doi: 10.19509/j.cnki.dzkq.2021.0087
    Huang Y, Zhao L. 2018. Review on landslide susceptibility mapping using support vector machines. Catena, 165, 520–529. doi: 10.1016/j.catena.2018.03.003
    Jia YF, Wei WH, Chen W, Yang QZ, Sheng YF, Xu GL. 2023. Landslide susceptibility assessment based on the SOM-I-SVM model. Hydrogeology & Engineering Geology, 50(3), 125–137. doi: 10.16030/j.cnki.issn.1000-3665.202206041
    Kanungo DP, Arora MK, Sarkar S, Gupta RP. 2006. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Engineering Geology, 85(3–4), 347–366. doi: 10.1016/j.enggeo.2006.03.004
    Lee S, Min K. 2001. Statistical analysis of landslide susceptibility at Yongin, Korea. Environmental Geology, 40(9), 1095–1113. doi: 10.1007/s002540100310
    Li JL, Ma DH, Wang W. 2016. Assessment of potential seismic landslide hazard based on evidence theory and entropy weight grey incidence. Journal of Central South University (Science and Technology), 47(5), 1730–1736 (in Chinese with English abstract). doi: 10.11817/j.issn.1672-7207.2016.05.036
    Liu HH. 2012. The assessment of geohazard danger in Wenchuan County based on RS and GIS. Geology in China, 39(1), 243–251 (in Chinese with English abstract).
    Miao WD. 2003. Time prediction study on occurring of landslides in Bailuyuan, Xi'an. Northwestern Geology, 36(4), 90–95 (in Chinese with English abstract).
    Paraskevas T, Ioanna I. 2016. Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Prefecture, Greece. Landslides, 13(2), 305–320. doi: 10.1007/s10346-015-0565-6
    Peng CYJ, Lee K, Ingersoll GM. 2002. An introduction to logistic regression analysis and reporting. The Journal of Educational Research, 96(1), 3–14. doi: 10.1080/00220670209598786
    Pham BT, Tien Bui D, Dholakia MB, Prakash I, Pham HV. 2016. A comparative study of least square support vector machines and multiclass alternating decision trees for spatial prediction of rainfall-induced landslides in a tropical cyclones area. Geotechnical and Geological Engineering, 34, 1807–1824. doi: 10.1007/s10706-016-9990-0
    Pham BT, Tien Bui D, Indra P, Dholakia M. 2015. Landslide susceptibility assessment at a part of Uttarakhand Himalaya, India using GIS-based statistical approach of frequency ratio method. International Journal of Engineering Research and Technology, 4(11), 338–344. doi: 10.17577/IJERTV4IS110285
    Reichenbach P, Rossi M, Malamud BD, Mihir M, Guzzetti F. 2018. A review of statistically-based landslide susceptibility models. Earth-Science Reviews, 180, 60–91. doi: 10.1016/j.earscirev.2018.03.001
    Tien Bui D, Pradhan B, Lofman O, Revhaug I. 2012. Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and Naive Bayes models. Mathematical Problems in Engineering, 1–26. doi: 10.1155/2012/974638
    Tien Bui D, Tuan TA, Klempe H, Pradhan B, Revhaug I. 2016. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides, 13, 361–378. doi: 10.1007/s10346-015-0557-6
    Tsangaratos P, Ilia I, Hong H, Chen W, Xu C. 2017. Applying information theory and GIS-based quantitative methods to produce landslide susceptibility maps in Nancheng County, China. Landslides, 14(3), 1091–1111. doi: 10.1007/s10346-016-0769-4
    Wang T, Liu JM, Li ZT, Xin P, Shi JS, Wu SR. 2021. Seismic landslide hazard assessment of China and its impact on national territory spatial planning. Geology in China, 48(1), 22–39 (in Chinese with English abstract). doi: 10.12029/gc20210102
    Xia H, Yin KL, Liang X, Ma F. 2018. Landslide susceptibility assessment based on SVM-ANN Models: A case study for Wushan County in the Three Gorges Reservoir. The Chinese Journal of Geological Hazard and Control, 29(5), 13–19 (in Chinese with English abstract). doi: 10.16031/j.cnki.issn.1003-8035.2018.05.03
    Xiong XH, Wang CL, Bai YJ, Tie YB, Gao YC, Li GH. 2022. Comparison of landslide susceptibility assessment based on multiple hybrid models at county level: A case study for Puge County, Sichuan Province. The Chinese Journal of Geological Hazard and Control, 33(4), 114–124. doi: 10.16031/j.cnki.issn.1003-8035.202202052
    Yang DH, Fan W. 2015. Zoning of probable occurrence level of geological disasters based on ArcGIS: A case of Xunyang. The Chinese Journal of Geological Hazard and Control, 26(4), 82–86, 93 (in Chinese with English abstract). doi: 10.16031/j.cnki.issn.1003-8035.2015.04.14
    Zêzere JL, Pereira S, Melo R, Oliveira SC, Garcia RAC. 2017. Mapping landslide susceptibility using data-driven methods. Science of the Total Environment, 589, 250–267. doi: 10.1016/j.scitotenv.2017.02.188
    Zhang J, Yin KL, Wang JJ, Liu L, Huang FM. 2016. Study on landslide susceptibility evaluation for Wanzhou district of Three Gorges Reservoir. Chinese Journal of Rock Mechanics and Engineering, 35(2), 284–296 (in Chinese with English abstract). doi: 10.13722/j.cnki.jrme.2015.0318
