Early prediction of winter wheat yield with long time series meteorological data and random forest method

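    • Summary: Predicting winter wheat yield early in the growing season is an important reference for precision management decisions across the whole growth period. Using the random forest algorithm with ground-observed meteorological data from the average jointing date to the average heading date and statistical yield data for Henan Province from 1990 to 2015, this study extracted 47 meteorological factors (temperature, humidity, precipitation and others) for the different spike differentiation stages and 3 spatial factors (longitude, latitude and elevation of the wheat-growing areas), giving 50 parameters as the feature set. Actual yield per unit area, meteorological yield and relative meteorological yield were each used as the target variable, regression models were built with multiple feature combinations, and the factors affecting yield were analyzed using out-of-bag (OOB) variable importance. The results showed that: 1) models with meteorological yield or relative meteorological yield as the target predicted better than the actual-yield model, with coefficients of determination (R²) above 0.8; the mean absolute error (MAE) and root mean square error (RMSE) were 415 and 558 kg/hm² for meteorological yield, and 0.07 and 0.09 for relative meteorological yield; 2) compared with meteorological features, spatial features played the decisive role in yield prediction, and meteorological features from the floret differentiation stage and the heading and flowering stage gave higher prediction accuracy than those from the other spike differentiation stages; 3) among the meteorological features, OOB variable importance indicated how strongly average temperature, minimum temperature, negative accumulated temperature and maximum temperature affected yield at the different growth stages. The results provide new ideas and methods for predicting winter wheat yield early in the growing season.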
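The three target variables can be illustrated with a small sketch. It assumes the common agronomic convention that meteorological yield is actual yield minus a technology-driven trend yield, and that relative meteorological yield is that difference divided by the trend yield. The summary does not say how the trend was estimated, so the straight-line trend, the function names `linear_trend` and `decompose_yield`, and the sample numbers below are all illustrative assumptions, not the authors' procedure.

```python
from typing import List, Sequence, Tuple


def linear_trend(years: Sequence[float], yields: Sequence[float]) -> List[float]:
    """Fit an ordinary least-squares line to (year, yield) and return fitted values.

    ASSUMPTION: a straight line stands in for the trend yield; the source
    does not state which trend model was used.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in years]


def decompose_yield(
    years: Sequence[float], yields: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Split actual yield into trend, meteorological and relative meteorological yield."""
    trend = linear_trend(years, yields)
    # Meteorological yield: the part of actual yield not explained by the trend.
    met = [y - t for y, t in zip(yields, trend)]
    # Relative meteorological yield: dimensionless fraction of the trend yield.
    rel = [m / t for m, t in zip(met, trend)]
    return trend, met, rel


# Made-up yields (kg/hm2) for one hypothetical county, not data from the study.
years = [1990, 1991, 1992, 1993, 1994]
yields = [3000.0, 3300.0, 3100.0, 3500.0, 3600.0]
trend, met_yield, rel_met_yield = decompose_yield(years, yields)

for yr, t, m, r in zip(years, trend, met_yield, rel_met_yield):
    print(f"{yr}: trend={t:.0f}  meteorological={m:+.0f}  relative={r:+.3f}")
```

Because the relative form is dimensionless, its errors (such as the reported MAE of 0.07) read as fractions of the trend yield rather than as kg/hm².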

       

      Abstract: Early prediction of winter wheat yield is of great significance for making precise management decisions over the whole growth period. Winter wheat yield is affected by both the level of production technology and climatic conditions. This study analyzed the feasibility of predicting winter wheat yield early with long time series meteorological data and the random forest method in Henan Province, where winter wheat was planted in a total of 106 counties (cities). Based on ground-observed meteorological data and statistical winter wheat yield data from 1990 to 2015, we extracted 47 climatic factors, such as temperature, humidity and precipitation, in different growth stages from jointing to heading, and 3 spatial factors: latitude, longitude and elevation. The resulting 50 parameters were used as the set of feature variables. The actual yield, meteorological yield and relative meteorological yield were each used as the target variable, and random forest yield prediction models with multiple variable combinations were constructed. Data from 1990 to 2009 were used as training samples, and the resulting forests were validated with data from 2010 to 2015. The factors affecting yield were analyzed using the out-of-bag (OOB) variable importance results. The results showed that: 1) Models using meteorological yield and relative meteorological yield as the target variables predicted better than the actual yield model.
For the meteorological yield and relative meteorological yield models, the coefficients of determination (R²) were both above 0.8; the mean absolute error (MAE) and root mean square error (RMSE) were 415 and 558 kg/hm² for meteorological yield, and 0.07 and 0.09 for relative meteorological yield. 2) The spatial features played an important role in improving the random forest yield model. However, if the model included only spatial parameters, the predicted values were distributed in horizontal rows around the 1:1 line, because the random forest algorithm predicted the same value for different yields in the same region. Values far from the 1:1 line were probably affected by meteorological factors. Adding meteorological features on this basis therefore improved the prediction accuracy, with smaller deviations, a higher R² (0.88), and smaller MAE and RMSE (0.06 and 0.08). 3) The model prediction was also affected by the crop growth stage. The accuracy based on the meteorological features of the floret differentiation stage and the heading and flowering stage was higher than that for the other spike differentiation stages, indicating that environmental changes during these stages have a greater impact on the final yield. The predictions for the late anther separation stage deviated most from the actual yield, because the meteorological factors were strongly correlated, which weakened the influence of the spatial features. 4) According to the OOB variable importance, the spatial feature parameters were important, and among the meteorological features the average temperature and minimum temperature during the floret differentiation stage were important. In addition, the negative accumulated temperature from jointing to heading and the maximum temperature at the heading and flowering stage had a great influence on yield.
During model building, we did not distinguish disaster years from non-disaster years because the sample sizes were small. However, the validation data came from both normal and disaster years, which supports the reliability of the prediction model. Thus, winter wheat yield prediction based on random forest should consider both spatial and meteorological feature parameters. The results of this study provide new ideas and methods for early prediction of winter wheat yield.
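The workflow described in the abstract (train on 1990 to 2009, validate on 2010 to 2015, report R², MAE and RMSE, then rank features) might be sketched as follows with scikit-learn. Everything here is an assumption for illustration: the data are synthetic, only five hypothetical features stand in for the 50 used in the study, and the hyperparameters are arbitrary. scikit-learn does not report OOB permutation importance directly, so this sketch swaps in `permutation_importance` on the validation years as a stand-in, alongside the OOB R² that `oob_score=True` provides.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)

# SYNTHETIC stand-in data: 20 hypothetical counties observed yearly, 1990-2015.
n_counties = 20
years = np.arange(1990, 2016)
feature_names = ["latitude", "longitude", "elevation", "mean_temp", "min_temp"]

county_xyz = rng.normal(size=(n_counties, 3))      # spatial features, fixed per county
year_col = np.repeat(years, n_counties)            # year label for each county-year row
spatial = np.tile(county_xyz, (years.size, 1))     # same county block repeated each year
weather = rng.normal(size=(year_col.size, 2))      # varies by county and year
X = np.hstack([spatial, weather])

# Made-up target playing the role of relative meteorological yield.
y = (0.05 * spatial[:, 0] + 0.03 * weather[:, 0]
     + rng.normal(scale=0.01, size=year_col.size))

# Temporal split as in the abstract: earlier years train, later years validate.
train = year_col <= 2009
val = year_col >= 2010

# Hyperparameters are illustrative assumptions, not values from the study.
model = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
model.fit(X[train], y[train])

pred = model.predict(X[val])
r2 = r2_score(y[val], pred)
mae = mean_absolute_error(y[val], pred)
rmse = float(np.sqrt(mean_squared_error(y[val], pred)))
print(f"OOB R2={model.oob_score_:.3f}  val R2={r2:.3f}  MAE={mae:.4f}  RMSE={rmse:.4f}")

# Stand-in for OOB variable importance: permutation importance on validation years.
perm = permutation_importance(model, X[val], y[val], n_repeats=10, random_state=0)
ranking = [feature_names[i] for i in np.argsort(perm.importances_mean)[::-1]]
print("Features by permutation importance:", ranking)
```

Splitting by year rather than at random keeps later seasons out of training, which matches the early-prediction setting in which the model is applied to years it has not seen.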

       
