Irrigation district channel dispatch flow prediction based on SHAP importance ranking and machine learning algorithms

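    • Summary: Channel drainage sluices can quickly discharge floodwater that enters irrigation canals and so prevent canal overtopping. Taking the Guan Kouji drainage sluice in the Pishihang Irrigation District as a case study, this work uses the gate dispatch flow as the target variable and uses past and future rainfall over different periods, the real-time water level upstream of the sluice gate, and the change in that level as feature variables. It compares the prediction accuracy of 8 machine learning algorithms and applies the Shapley additive explanations (SHAP) method to analyze feature importance. The results show that: 1) ensemble learning algorithms outperform traditional regression algorithms on the evaluation metrics, and random forest regression (RFR) has the highest prediction accuracy of the 8 algorithms (root mean square error, mean absolute error, mean square error and coefficient of determination of 0.146 m³/s, 0.094 m³/s, 0.021 (m³/s)² and 0.976 on the training set, and 0.306 m³/s, 0.197 m³/s, 0.093 (m³/s)² and 0.931 on the test set); 2) the SHAP importance ranking shows that the water level upstream of the Guan Kouji drainage sluice gate has the greatest influence on the predicted dispatch flow, accounting for 34.6% of the total feature importance; 3) the RFR algorithm with past 6 h rainfall, past 9 h rainfall, future 6 h rainfall and the upstream water level as input variables predicts the dispatch flow best, with root mean square error, mean absolute error, mean square error and coefficient of determination of 0.126 m³/s, 0.080 m³/s, 0.016 (m³/s)² and 0.982 on the training set, and 0.263 m³/s, 0.164 m³/s, 0.069 (m³/s)² and 0.950 on the test set. The results provide an important reference for flood-control dispatch decisions in irrigation districts.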
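The four error metrics reported above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `regression_metrics` and the sample flows are hypothetical, and only the standard textbook definitions are assumed. Note that the mean square error carries squared units, (m³/s)², while the root mean square error and mean absolute error are in m³/s and the coefficient of determination is dimensionless.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, MSE and R^2 for paired flow values given in m3/s."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n       # units: (m3/s)^2
    rmse = math.sqrt(mse)                      # units: m3/s
    mae = sum(abs(e) for e in errors) / n      # units: m3/s
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - (mse * n) / ss_tot              # dimensionless
    return {"rmse": rmse, "mae": mae, "mse": mse, "r2": r2}


# Hypothetical flows for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(observed, predicted)
```

On this toy data the mean square error is 0.025 and the coefficient of determination is 0.98. Because the root mean square error is the square root of the mean square error, the reported training values are mutually consistent: 0.146² ≈ 0.021.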

       

      Abstract: Channel drainage sluices can quickly discharge floodwater that enters the canals of an irrigation district. To provide a simple and efficient method for flood-control dispatch decisions at such sluices, this study took the Guan Kouji drainage sluice in the Pishihang Irrigation District as an example and established a prediction model with the dispatch flow as the target variable and 10 feature variables as inputs. The 10 variables were: rainfall in the past 1, 2, 3, 6 and 9 h; rainfall in the future 1, 3 and 6 h; the water level upstream of the Guan Kouji drainage sluice gate; and the change in that water level over the past half hour. The prediction accuracy of 8 machine learning algorithms was compared to select the best algorithm. The Shapley Additive exPlanations (SHAP) method was used to analyze the importance of the 10 variables and to obtain the weight of each variable's influence on the prediction results. The prediction error metrics of the best algorithm were then compared across different variable combinations to select the optimal combination, further improve accuracy, and determine the final dispatch flow decision model. The results showed that: 1) The ensemble learning algorithms outperformed the traditional regression algorithms on the evaluation metrics. Prediction accuracy ranked as follows: random forest regression (RFR) > extreme gradient boosting regression (XGR) > adaptive boosting regression (ABR) > support vector regression (SVR), and Bagging had the highest accuracy among the three categories of ensemble learning algorithms. 
RFR had the highest prediction accuracy among the 8 machine learning algorithms (the root mean square error, mean absolute error, mean square error and coefficient of determination were 0.146 m³/s, 0.094 m³/s, 0.021 (m³/s)² and 0.976 on the training set, and 0.306 m³/s, 0.197 m³/s, 0.093 (m³/s)² and 0.931 on the test set); 2) The SHAP importance values of the feature variables ranked in descending order as follows: water level upstream of the Guan Kouji drainage sluice gate, rainfall in the past 9 h, rainfall in the future 6 h, rainfall in the past 6 h, rainfall in the past 3 h, rainfall in the past 2 h, rainfall in the future 1 h, rainfall in the past 1 h, change in the upstream water level over the past half hour, and rainfall in the future 3 h. The upstream water level had the greatest influence on the predicted dispatch flow, accounting for 34.6% of the total feature importance. The total importance value of the past rainfall features was 0.473 and that of the future rainfall features was 0.287, so past rainfall had a greater influence than future rainfall. 3) The RFR algorithm with past 6 h rainfall, past 9 h rainfall, future 6 h rainfall and the upstream water level as input variables gave the best prediction of the sluice dispatch flow (the root mean square error, mean absolute error, mean square error and coefficient of determination were 0.126 m³/s, 0.080 m³/s, 0.016 (m³/s)² and 0.982 on the training set, and 0.263 m³/s, 0.164 m³/s, 0.069 (m³/s)² and 0.950 on the test set). 
Compared with the model that used all feature variables, the coefficients of determination of the training set and the test set increased by 0.6% and 2.0%, respectively, while the root mean square error, mean absolute error and mean square error decreased by 13.7%, 14.9% and 23.8% on the training set and by 14.1%, 16.3% and 25.8% on the test set. Variable selection therefore has a marked effect on prediction accuracy. This approach avoids the multi-source data collection and complex operation required by coupled mechanistic models, and provides technical support for irrigation district management agencies to dispatch the sluices of each channel scientifically. The results are of practical significance for the modernization of irrigation districts.
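The idea behind a SHAP-style importance share, such as the 34.6% reported for the water level, can be sketched as follows. This is a toy illustration under stated assumptions, not the study's procedure: it computes exact Shapley values by brute-force enumeration of feature subsets rather than with the SHAP library's tree algorithm, it assumes that global importance is the mean absolute Shapley value normalised to sum to 1, and the model `toy_flow`, the baseline and the samples are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest stay at baseline.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def importance_shares(model, samples, baseline):
    """Mean |Shapley value| per feature, normalised so the shares sum to 1."""
    n = len(baseline)
    mean_abs = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            mean_abs[i] += abs(p) / len(samples)
    total = sum(mean_abs)
    return [m / total for m in mean_abs]


# Hypothetical toy model: NOT the fitted RFR model from the study.
def toy_flow(features):
    level, rain_past, rain_future = features
    return 2.0 * level + 0.05 * rain_past * rain_future


# Hypothetical baseline and samples: [water level, past rainfall, future rainfall].
baseline = [1.0, 0.0, 0.0]
samples = [[1.5, 10.0, 4.0], [1.2, 0.0, 8.0], [2.0, 20.0, 0.0]]
shares = importance_shares(toy_flow, samples, baseline)
```

In this toy setup the water level receives the largest share, and for each sample the Shapley values sum to the difference between the prediction and the baseline prediction. Brute-force enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.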

       
