    Prediction and interpretability analysis of fresh corn water demand based on multi-feature selection

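    • Abstract: Fresh corn is a high value-added agricultural product, and accurate prediction of its water requirement is essential for scientific irrigation that safeguards yield and quality. Using a large weighing lysimeter and meteorological monitoring equipment, this study collected water requirement and meteorological data online for two consecutive crops of fresh corn. The Boruta algorithm (Boruta) and the least absolute shrinkage and selection operator (LASSO) were combined to screen the main meteorological features affecting changes in fresh corn water requirement, and three prediction models were built: the gradient-boosted decision tree method categorical boosting (CatBoost), long short-term memory (LSTM), and random forest (RF). Finally, the Shapley additive explanations (SHAP) method was used for global and local post hoc interpretability analysis of the prediction model. The results show that the combined Boruta and LASSO feature selection identified daily mean temperature, daily maximum temperature, daily minimum temperature, daily mean relative humidity, daily mean wind speed, daily mean atmospheric pressure, and accumulated daily sunshine hours as potential factors affecting fresh corn water requirement. The CatBoost model gave the best predictions, with the smallest mean absolute error (0.0189), mean squared error (0.0006), and root mean squared error (0.0552), and the shortest run time (23.93 s). The interpretability analysis further showed that daily minimum temperature, daily mean temperature, and accumulated daily sunshine hours are the key factors affecting fresh corn water requirement, which rises as air temperature and sunshine hours increase. The results can serve as a reference for precise water management of fresh corn in the Beijing area.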
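As a rough illustration of the local attributions that SHAP reports, the sketch below computes exact Shapley values for a single sample by enumerating all feature coalitions. The toy predictor, its coefficients, and the sample and baseline values are hypothetical and are not the study's CatBoost model; the SHAP library itself relies on faster, model-specific estimators and a background dataset rather than a single baseline point.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one sample, by enumerating feature coalitions."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the sample's values;
        # all other features stay at the baseline values.
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_predict(x):
    # Hypothetical water-requirement model with a Tave x SD interaction term.
    return (0.02 * x["Tmin"] + 0.03 * x["Tave"] + 0.01 * x["SD"]
            + 0.001 * x["Tave"] * x["SD"])


# Hypothetical sample and baseline (temperatures in deg C, sunshine in hours).
sample = {"Tmin": 20.0, "Tave": 26.0, "SD": 9.0}
baseline = {"Tmin": 15.0, "Tave": 22.0, "SD": 6.0}

phi = shapley_values(toy_predict, sample, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that SHAP force and waterfall plots rely on. The interaction term is split between Tave and SD, which is why feature interactions change per-sample attributions.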

       

      Abstract: Fresh corn is a high value-added agricultural product with high nutritional value that is popular with consumers, and accurate water management is crucial for its yield and quality. In this study, water requirement and meteorological data for two consecutive crops of fresh corn were acquired online using a large weighing lysimeter and meteorological monitoring equipment. First, the Boruta algorithm (Boruta) and the least absolute shrinkage and selection operator (LASSO) were combined to identify the meteorological factors affecting the water requirement of fresh corn, and the factors selected by both methods were used as model inputs. Three machine learning algorithms, categorical boosting (CatBoost), long short-term memory (LSTM), and random forest (RF), were then used to build prediction models for fresh corn water requirement. Finally, the Shapley additive explanations (SHAP) method was used to analyze the contribution of each meteorological feature to the prediction results, improving the transparency and reliability of the model. The results show that Boruta and LASSO jointly identified daily mean temperature (Tave), daily maximum temperature (Tmax), daily minimum temperature (Tmin), daily mean relative humidity (RH), daily mean wind speed (Wave), daily mean atmospheric pressure (P0), and accumulated daily sunshine hours (SD) as potential factors affecting the water requirement of fresh corn.
The coefficient of determination (R²), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) of the CatBoost prediction model were 0.9749, 0.0189, 0.0006, and 0.0251, respectively, indicating considerably higher prediction accuracy than LSTM and RF. In terms of efficiency, the CatBoost model had the highest number of training epochs (30,000) yet the shortest training time (23.93 s), making it the optimal prediction model. SHAP analysis of the CatBoost model showed that daily minimum temperature, daily mean temperature, and accumulated daily sunshine hours are the key factors affecting the water requirement of fresh corn, and that all three have a positive impact on the prediction results. Local interpretability analysis further revealed that the impact of the same meteorological feature on the predicted water requirement varies across samples, and that interactions between features also affect the predictions. An interaction analysis of the three features contributing most to the predictions (Tmin, Tave, SD) found that accumulated sunshine hours increase with rising air temperature, and the water requirement of fresh corn increases accordingly. The study can provide a reference for precise water management of fresh corn.
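The four accuracy metrics reported for the models follow standard definitions, and a minimal sketch of how they could be computed from observed and predicted values is given below. The function name and the example values are hypothetical and are not the study's data. Note that RMSE is the square root of MSE, so an MSE of 0.0006 and an RMSE of 0.0251 agree up to rounding.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination

    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


# Hypothetical observed and predicted water requirements (arbitrary units).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]

for name, value in regression_metrics(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

Because MSE squares each error before averaging, it penalizes large errors more heavily than MAE, which is one reason both are commonly reported together.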

       
