Abstract
Channel sluices can quickly discharge floodwater that enters the canals of an irrigation area. To provide a simple and efficient method for flood-control scheduling decisions at drainage sluices in irrigation districts, this study took the Pishihang Irrigation District as an example and established a prediction model with the dispatched flow as the target variable and 10 characteristic variables as independent variables. The 10 variables describe rainfall and water level at the Guan Kouji drainage sluice: rainfall in the past 1, 2, 3, 6, and 9 hours; rainfall in the next 1, 3, and 6 hours; the upstream water level at the Guan Kouji drainage sluice; and the change in that water level over the past half hour. The prediction accuracy of 8 machine learning algorithms was compared to select the best algorithm. The Shapley Additive exPlanations (SHAP) method was used to analyze the importance of the 10 variables and to obtain the weight of each variable's influence on the prediction results. By comparing the prediction error indices of the optimal algorithm under different variable combinations, the optimal variable combination was selected, which further improved the accuracy of the algorithm and yielded the final scheduling-flow decision model. The results showed that: 1) Ensemble learning algorithms outperformed traditional regression algorithms on the evaluation indices. The prediction accuracy ranked as follows: random forest regression (RFR) > extreme gradient boosting regression (XGR) > adaptive boosting regression (ABR) > support vector regression (SVR), and Bagging achieved the highest accuracy among the three categories of ensemble learning algorithms.
RFR had the highest prediction accuracy of the 8 machine learning algorithms: its root mean square error (RMSE), mean absolute error (MAE), mean square error (MSE), and coefficient of determination (R²) were 0.146 m³/s, 0.094 m³/s, 0.021 (m³/s)², and 0.976 for the training set, and 0.306 m³/s, 0.197 m³/s, 0.093 (m³/s)², and 0.931 for the test set. 2) The SHAP importance values of the characteristic variables, in descending order, were: upstream water level at the Guan Kouji drainage sluice, rainfall in the past 9 hours, rainfall in the next 6 hours, rainfall in the past 6 hours, rainfall in the past 3 hours, rainfall in the past 2 hours, rainfall in the next 1 hour, rainfall in the past 1 hour, change in water level over the past half hour, and rainfall in the next 3 hours. The upstream water level at the Guan Kouji drainage sluice had the greatest influence on the predicted drainage flow, accounting for 34.6% of the total feature importance. The total importance of the past-rainfall features was 0.473, compared with 0.287 for the future-rainfall features, so past rainfall influenced the prediction more than future rainfall. 3) The RFR model with past 6-hour rainfall, past 9-hour rainfall, next 6-hour rainfall, and the upstream water level at the Guan Kouji drainage sluice as inputs predicted the sluice dispatching flow best. Its RMSE, MAE, MSE, and R² were 0.126 m³/s, 0.080 m³/s, 0.016 (m³/s)², and 0.982 for the training set, and 0.263 m³/s, 0.164 m³/s, 0.069 (m³/s)², and 0.950 for the test set.
Compared with the model using all characteristic variables, the coefficient of determination increased by 0.6% for the training set and 2.0% for the test set, while the root mean square error, mean absolute error, and mean square error decreased by 13.7%, 14.9%, and 23.8% for the training set and by 14.1%, 16.3%, and 25.8% for the test set. Variable selection therefore has a significant impact on prediction accuracy. This approach avoids the multi-source data collection and complex operation required by coupled mechanistic models, and provides technical support for irrigation district management agencies to dispatch the sluices of each channel scientifically, supporting the modernization of irrigation districts.
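The ten inputs listed above are rainfall window sums plus two water-level terms. The following is a minimal sketch of how they could be assembled for one time step. It assumes a 30-minute observation step (so the half-hour water-level change is a one-step difference), that the "next N hours" windows are read from the same series (in operation they would come from a rainfall forecast), and hypothetical feature names such as `rain_past_9h`. The abstract specifies none of these details.

```python
from typing import Dict, Sequence

STEPS_PER_HOUR = 2          # assumption: 30-minute observation step
PAST_HOURS = (1, 2, 3, 6, 9)
FUTURE_HOURS = (1, 3, 6)


def build_features(rain: Sequence[float], level: Sequence[float], t: int) -> Dict[str, float]:
    """Assemble the 10 candidate predictors at time index t (hypothetical names).

    rain[i]:  rainfall recorded during step i
    level[i]: upstream water level at the sluice at the end of step i
    Past windows end at (and include) step t; future windows start at t + 1.
    """
    feats: Dict[str, float] = {}
    for h in PAST_HOURS:
        n = h * STEPS_PER_HOUR
        if t + 1 < n:
            raise ValueError(f"not enough history for the past {h} h window")
        feats[f"rain_past_{h}h"] = sum(rain[t + 1 - n : t + 1])
    for h in FUTURE_HOURS:
        n = h * STEPS_PER_HOUR
        if t + 1 + n > len(rain):
            raise ValueError(f"not enough data for the next {h} h window")
        # Assumption: observed rainfall stands in for a forecast here.
        feats[f"rain_future_{h}h"] = sum(rain[t + 1 : t + 1 + n])
    feats["level"] = level[t]
    # One 30-minute step back gives the half-hour change in water level.
    feats["level_change_30min"] = level[t] - level[t - 1]
    return feats
```

Raising an error when a window runs off either end of the record, rather than silently using a shorter window, keeps every training row on the same footing.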
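The error indices reported above follow standard definitions. RMSE is the square root of MSE, so MSE carries squared flow units, and R² is dimensionless. The sketch below computes them from observed and predicted flows and shows how the quoted relative changes can be recovered from the rounded figures. It assumes the usual R² = 1 − SS_res/SS_tot, which the abstract does not state explicitly, and the function names are illustrative.

```python
import math
from typing import Dict, Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MSE and coefficient of determination for flow predictions."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n        # units: (m3/s)^2
    mae = sum(abs(e) for e in errors) / n       # units: m3/s
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - (mse * n) / ss_tot               # 1 - SS_res / SS_tot
    return {"RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse, "R2": r2}


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0
```

With the rounded figures from the abstract, `round(relative_change(0.146, 0.126), 1)` gives -13.7, which is the 13.7% reduction in training-set RMSE. `round(relative_change(0.931, 0.950), 1)` gives 2.0, which is the test-set gain in R².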
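SHAP attributes each prediction to the input variables using Shapley values. When there are only a few features, these values can be computed exactly by enumerating feature subsets. The sketch below does this for an arbitrary model callable, under the following assumptions:

- The value function is interventional: features outside a subset take values from a background sample.
- Global importance is the mean absolute Shapley value per feature, which is a common convention.

The abstract does not say which SHAP explainer or aggregation was used. For a random forest, the SHAP library's tree-specific algorithm would normally replace this brute-force enumeration. The `group_total` helper is hypothetical and shows how a figure like the summed importance of past-rainfall features could be formed.

```python
import itertools
import math
from typing import Callable, FrozenSet, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   background: Sequence[Sequence[float]]) -> List[float]:
    """Exact Shapley values of model at x, enumerating every feature subset."""
    m = len(x)

    def value(subset: FrozenSet[int]) -> float:
        # Features in `subset` come from x, the rest from each background row;
        # the model output is averaged over the background rows.
        total = 0.0
        for row in background:
            z = [x[j] if j in subset else row[j] for j in range(m)]
            total += model(z)
        return total / len(background)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(m):  # size of the coalition that excludes feature i
            weight = math.factorial(k) * math.factorial(m - k - 1) / math.factorial(m)
            for coalition in itertools.combinations(others, k):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model: Model, X: Sequence[Sequence[float]],
                        background: Sequence[Sequence[float]]) -> List[float]:
    """Global importance: mean |Shapley value| of each feature over the rows of X."""
    totals = [0.0] * len(X[0])
    for x in X:
        for j, v in enumerate(shapley_values(model, x, background)):
            totals[j] += abs(v)
    return [t / len(X) for t in totals]


def group_total(importance: Sequence[float], names: Sequence[str], prefix: str) -> float:
    """Sum the importance of all features whose (hypothetical) name starts with prefix."""
    return sum(v for v, n in zip(importance, names) if n.startswith(prefix))
```

Exact enumeration requires 2^(M−1) coalitions per feature, which is 512 for the 10 variables here. It is therefore tractable but slow compared with tree-specific algorithms. The Shapley values for each row sum to that row's prediction minus the background-average prediction.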
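The four inputs of the final model are the four highest-ranked variables in the SHAP ordering above: upstream water level, past 9-hour rainfall, next 6-hour rainfall, and past 6-hour rainfall. One search strategy that exploits the ranking evaluates nested subsets of the top-k ranked variables and keeps the k with the lowest test error. The abstract says only that "different variable combinations" were compared, so this nested strategy is an assumption. The sketch hides model training behind a hypothetical `evaluate` callable, which could, for example, fit an RFR on the subset and return its test-set RMSE. The feature names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# SHAP ranking reported in the abstract, highest first (hypothetical names).
RANKED = ["level", "rain_past_9h", "rain_future_6h", "rain_past_6h",
          "rain_past_3h", "rain_past_2h", "rain_future_1h", "rain_past_1h",
          "level_change_30min", "rain_future_3h"]


def select_nested_subset(ranked: Sequence[str],
                         evaluate: Callable[[List[str]], float]
                         ) -> Tuple[List[str], float, Dict[int, float]]:
    """Score the top-k features for k = 1..len(ranked) and keep the lowest error.

    `evaluate` is a hypothetical callable that trains the model on the given
    features and returns its test-set error. Ties keep the smaller subset.
    """
    scores: Dict[int, float] = {}
    best_k, best_score = 0, math.inf
    for k in range(1, len(ranked) + 1):
        scores[k] = evaluate(list(ranked[:k]))
        if scores[k] < best_score:   # strict '<' prefers fewer variables on ties
            best_k, best_score = k, scores[k]
    return list(ranked[:best_k]), best_score, scores
```

Suppose `evaluate` is a synthetic toy such as `lambda s: 1.0 + abs(len(s) - 4)`. The function then returns the first four names in `RANKED`. A nested search needs only ten model fits, whereas an exhaustive search over all non-empty subsets of ten variables would need 1,023.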