Abstract:
Early prediction of winter wheat yield is important for making precise management decisions throughout the growth period of winter wheat. The yield of winter wheat is affected by both the level of production technology and climatic conditions. This study analyzed the feasibility of early prediction of winter wheat yield from long time series of meteorological data with the random forest method in Henan Province. Winter wheat was planted in a total of 106 counties (cities) in Henan Province. Based on ground-observed meteorological data and statistical winter wheat yield data from 1990 to 2015, we extracted 47 climatic factors, such as temperature, humidity and precipitation in different growth stages from jointing to heading, and 3 spatial factors: latitude, longitude and elevation. Together, these 50 parameters formed the set of feature variables. The actual yield, meteorological yield and relative meteorological yield were each used in turn as the target variable, and a multivariate random forest yield prediction model was constructed for each. Data from 1990 to 2009 were used as training samples to construct the models, and the resulting forests were validated with data from 2010 to 2015. The factors affecting yield were analyzed using the out-of-bag variable importance results. The results showed that: 1) The models using meteorological yield and relative meteorological yield as the target variables predicted better than the actual-yield model.
For the meteorological yield and relative meteorological yield models, the coefficient of determination (R²) was above 0.8 in both cases; the mean absolute error (MAE) and root mean square error (RMSE) were 415 and 558 kg/hm² for meteorological yield, and 0.07 and 0.09 for relative meteorological yield. 2) The spatial characteristics played an important role in improving the random forest yield model. However, when the model included only spatial parameters, the predicted values were distributed horizontally about the 1:1 line, and different yields in the same region were predicted as the same value by the random forest algorithm. The values far from the 1:1 line might be affected by meteorological factors. Adding meteorological features on this basis therefore improved the prediction accuracy, giving smaller deviations, a higher R² (0.88), and smaller MAE and RMSE (0.06 and 0.08). 3) The model prediction was also affected by the crop growth stage. The accuracy based on the meteorological features of the floret differentiation stage and the heading and flowering stage was higher than that for the other spike differentiation periods, indicating that environmental changes during these periods have a greater impact on the final yield. The predictions at the late anther separation stage had a larger deviation from the actual yield, because the meteorological factors were strongly correlated, which weakened the influence of the spatial characteristics. 4) According to the out-of-bag importance results, the spatial parameters and, among the meteorological features, the average and minimum temperatures during the floret differentiation period were important. In addition, the negative accumulated temperature from the jointing to the heading stage and the maximum temperature at the heading and flowering stage had a great influence on yield.
During model establishment, we did not differentiate disaster years from non-disaster years because the sample sizes were small. However, the validation data came from both normal and disaster years, which helps ensure the reliability of the prediction model. Thus, winter wheat yield prediction based on random forest should consider both spatial and meteorological parameters. The results of this study provide new ideas and methods for the early prediction of winter wheat yield.
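The evaluation protocol described above (training on 1990 to 2009, validating on 2010 to 2015, and reporting MAE, RMSE and R²) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout (a dictionary with a `year` key), the function names, and the use of the 1 − SS_res/SS_tot definition of R² are all assumptions, since the abstract does not state which definition was used.

```python
import math


def split_by_year(records, last_train_year=2009):
    """Split records into a training set (year <= last_train_year)
    and a validation set (later years). The 'year' key is a hypothetical layout."""
    train = [r for r in records if r["year"] <= last_train_year]
    valid = [r for r in records if r["year"] > last_train_year]
    return train, valid


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In such a setup, the model would be fitted on the training records only, and the three metrics would be computed on the validation records, so that the reported errors reflect years the model has not seen.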