Abstract
Accurate assessment of debris flow susceptibility is of great significance for the prevention and control of debris flow disasters in mountainous areas. In this study, the Synthetic Minority Oversampling Technique (SMOTE) and the multi-grained Cascade Forest (gcForest) were combined to assess debris flow susceptibility with high accuracy. The study area was Dongchuan District, Kunming City, Yunnan Province, China, where debris flows are prone to occur. Taking the watershed unit as the assessment unit, 15 debris flow hazard factors were preliminarily selected from multi-source data, such as geology, topography, and precipitation, according to the interpretation of debris flow points. Contribution rate and multicollinearity tests were then performed to screen the initially selected factors. As a result, 13 factors were retained to build a system of disaster-predisposing factors: watershed lithology, average fault density, main channel bending coefficient, average river network density, land use type, average road network density, channel gradient, 24 h maximum precipitation, elevation difference, melt ratio, average elevation, average slope, and average NDVI. SMOTE was then used to deal with the imbalance between debris flow and non-debris flow samples, and the training data set was constructed. Finally, a gcForest model was constructed to quantify the susceptibility of debris flow in the study area. The natural breaks method was used to classify each watershed unit into one of five levels: very low, low, medium, high, and very high susceptibility. The prediction performance of the improved model was compared with that of the Back Propagation neural network (BPNN) and Random Forest (RF) models. The results show that balancing the data set with the SMOTE oversampling technique improved the model accuracy from 0.7867 to 0.9176.
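The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is an interpolation between a minority sample and one of its nearest minority neighbours. This is not the implementation used in the study. The function name `smote`, the toy factor values, the choice of `k`, and the assumption that debris flow watersheds form the minority class are all hypothetical and serve only as an illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data (assumption): 3 debris flow watersheds and 6
# non-debris flow watersheds, each with two made-up normalised factors.
debris = [[0.80, 0.70], [0.90, 0.60], [0.85, 0.75]]
non_debris = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
              [0.2, 0.4], [0.1, 0.1], [0.4, 0.2]]

# Oversample the minority class until both classes have the same size.
new_points = smote(debris, n_new=len(non_debris) - len(debris), k=2)
balanced_debris = debris + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by that class instead of duplicating existing records.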
The very low and low susceptibility areas were mainly concentrated in the eastern and western parts of the study area, whereas the very high and high susceptibility areas were mainly distributed on both banks of the Xiaojiang River Valley and the south bank of the Jinsha River, with the most concentrated distribution in the middle and north of Tuobuka, Wulong, Tongdu Street, the north of Awang, Yinmin, and the north of Shekuai, where the geological environment is fragile and the risk is high. The medium susceptibility areas were mainly distributed around the very high and high susceptibility areas, particularly in the upper reaches of the Xiaoqing River in Hongtudi. All three watershed-unit-based assessment models of debris flow susceptibility showed excellent accuracy and stability. For gcForest, the area under the Receiver Operating Characteristic (ROC) curve (AUC) and the accuracy (ACC) reached 0.9176 and 0.8125, respectively. Both values were higher than those of the BPNN and RF models, indicating better performance. The improved model can therefore be used to evaluate debris flow susceptibility more accurately. The findings can provide a scientific basis for disaster prevention and mitigation in mountainous areas.
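The two evaluation measures reported above can be sketched as follows. The labels and scores are hypothetical toy values, not data from the study, and the helper names `accuracy` and `roc_auc` as well as the 0.5 decision threshold are assumptions made for illustration. AUC is computed here through its rank interpretation: the probability that a randomly chosen debris flow unit receives a higher score than a randomly chosen non-debris flow unit.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Share of units whose thresholded score matches the true label."""
    predicted = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(predicted, y_true))
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = debris flow unit, 0 = non-debris flow unit.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

auc_value = roc_auc(y_true, y_score)   # 15 of 16 pairs ranked correctly
acc_value = accuracy(y_true, y_score)  # 6 of 8 units classified correctly
print(auc_value)  # 0.9375
print(acc_value)  # 0.75
```

AUC depends only on how the scores rank the units, whereas ACC depends on the chosen threshold, which is why the two values can differ for the same model.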