Abstract:
Fresh corn is a high value-added agricultural product with high nutritional value and is popular with consumers. Accurate water management is crucial for its yield and quality. In this study, water requirement and meteorological data for two consecutive crops of fresh corn were acquired online using a large weighing lysimeter and meteorological monitoring equipment. Three machine learning algorithms, categorical boosting (CatBoost), long short-term memory (LSTM), and random forest (RF), were used to construct prediction models for fresh corn water requirement, and the Shapley additive explanations (SHAP) method was used to analyze the contribution of each feature to the prediction results. First, the Boruta algorithm and the least absolute shrinkage and selection operator (LASSO) were combined to identify the key factors affecting the water requirement of fresh corn. The meteorological factors selected by both feature selection methods were then used as input variables for the CatBoost, LSTM, and RF models. SHAP analysis was also employed to interpret the prediction results, clarifying the contribution of meteorological features to the prediction outcomes and enhancing the transparency and reliability of the model. The results show that Boruta and LASSO both identified daily mean temperature (Tave), daily maximum temperature (Tmax), daily minimum temperature (Tmin), daily average relative humidity (RH), daily average wind speed (Wave), daily average atmospheric pressure (P0), and accumulated daily sunshine hours (SD) as key factors affecting the water requirement of fresh corn. The coefficient of determination (R²), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) of the CatBoost prediction model were 0.9749, 0.0189, 0.0006, and 0.0251, respectively, indicating significantly higher prediction accuracy than LSTM and RF. In terms of predictive efficiency, the CatBoost model had the highest number of training epochs (30,000) and the shortest training time (23.93 seconds), making it the optimal prediction model. SHAP analysis of the CatBoost model showed that daily minimum temperature, daily mean temperature, and accumulated daily sunshine hours are the key factors affecting the water requirement of fresh corn, and that all three have a positive impact on the prediction results. Further local interpretability analysis revealed that the impact of the same meteorological feature on the predicted water requirement varies across samples, and that interactions between features also affect the prediction results. An interaction analysis of the three features that contribute most to the prediction results (Tmin, Tave, SD) found that accumulated sunshine hours increase with rising air temperature, and that the water requirement of fresh corn increases accordingly. This study can provide a reference for precise water management of fresh corn.
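As an illustration of the four evaluation metrics named above, the following is a minimal sketch of how R², MAE, MSE, and RMSE can be computed from observed and predicted values. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data or code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE for paired observed/predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error

    # R2 = 1 - SS_res / SS_tot, where SS_tot is the variation around the mean
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1 - ss_res / ss_tot

    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


# Made-up illustrative values (assumption), not measurements from the lysimeter.
observed = [0.30, 0.45, 0.52, 0.61, 0.70]
predicted = [0.32, 0.43, 0.55, 0.60, 0.68]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE is the square root of MSE, the two reported CatBoost values can be cross-checked: 0.0251 squared is about 0.00063, which agrees with the reported MSE of 0.0006 after rounding.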