Abstract:
Shallow landslides triggered by extreme rainstorms often occur in areas with dense vegetation cover, mainly because of synergistic interactions among geological, vegetation, and meteorological factors. In this study, a high-accuracy landslide prediction model was constructed to reveal the influence of vegetation factors on shallow landslides. Taking the mountain forest in Huaying city as the study site, vegetation factors were selected, including stock volume, stand density, average tree age, stand type, and the green-red vegetation index (GRVI), and combined with topographic and geological factors (engineering geological rock group, distance to faults, distance to rivers, elevation, coefficient of elevation variation, slope, slope variation, relief degree of land surface, surface curvature, section curvature, aspect, and soil thickness). After factor screening by Boruta importance and multicollinearity analysis, five kinds of shallow landslide prediction models were built using machine learning techniques: logistic regression, generalized additive model (GAM), random forest (RF), support vector machine (SVM), and artificial neural network (ANN). The prediction accuracy of the five models was evaluated by sensitivity, specificity, accuracy, and AUC values. The prediction models were then validated against previous records of landslide points to determine the vegetation characteristics of high-risk areas in the Huaying mountain forests. The results showed that: 1) The susceptibility of shallow landslides was primarily influenced by engineering geological rock group, distance to rivers, distance to faults, stand type, average tree age, and stock volume. The other environmental and vegetation factors had relatively little influence on the susceptibility of shallow landslides. 2) The combination of factors, especially of the vegetation factors (stand density, average tree age, and stock volume), had a great impact on model accuracy and significantly improved the prediction accuracies of the five models; each factor was used only in specific models, indicating that no factor was commonly suitable for all five models. 3) The RF62 model achieved the highest prediction accuracy, with an AUC value, sensitivity, specificity, and accuracy of 0.96, 0.83, 0.93, and 0.86, respectively. The second most accurate model was ANN53, with an AUC value, sensitivity, specificity, and accuracy of 0.926, 0.80, 0.79, and 0.79, respectively. The third was the support vector machine model, with values of 0.90, 0.82, 0.73, and 0.77, respectively. The fourth was LOGIT325, with values of 0.876, 0.83, 0.72, and 0.77, respectively. The least accurate was the GAM597 model, with values of 0.87, 0.82, 0.73, and 0.77, respectively. 4) The RF model predicted landslides most accurately, with 95.05% accuracy and a highly susceptible area covering 25.31 km²; the artificial neural network, support vector machine, generalized additive, and logistic regression models followed, with 78.57%, 69.78%, 68.13%, and 67.58% accuracy and highly susceptible areas covering 35.43, 22.02, 26.26, and 26.27 km², respectively. 5) The vegetation associated with shallow landslides was characterized mainly by low stand density (1000-1500 plants/hm²), high stock volume (>80 m³/hm²), and advanced stand age (>30 a). The findings can support scientific decision-making and provide technical assistance for early warning, prevention, and control of rainstorm-induced landslides in densely vegetated areas of China.
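
The four evaluation measures named above (sensitivity, specificity, accuracy, and AUC) can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the 0.5 classification threshold, and the toy labels and scores are all hypothetical and chosen only for illustration. AUC is computed here as the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample, and both classes are assumed to be present.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (1 = landslide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # landslides correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stable sites correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # stable sites wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # landslides missed
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = landslide point, 0 = non-landslide point.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # predicted susceptibility
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC depends only on how the scores rank the samples. This is one reason models with similar accuracy can differ in AUC.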