Predicting winter wheat yield per unit area in Henan Province of China using ensemble learning and multi-source data

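    • Summary: This study explores whether winter wheat yield per unit area can be predicted from multi-source data with ensemble learning, and seeks the optimal time window for the prediction. The winter wheat growing season in Henan Province was divided into 28 time windows. Eight machine learning algorithms and a Stacking-based ensemble learning algorithm were trained on multiple remote sensing indices and meteorological data from 2003 to 2018, and were then used to predict yields for 2019 to 2021. The results show three things. First, introducing solar-induced chlorophyll fluorescence (SIF) features improves winter wheat yield prediction in Henan Province. Second, December to the following May is the optimal time window for machine-learning prediction of winter wheat yield. Third, the Stacking ensemble is better suited to county-level yield prediction in Henan Province than any single machine learning algorithm, with a coefficient of determination (R²) of 0.816, a root mean square error (RMSE) of 580.36 kg/hm², and a mean absolute error (MAE) of 476.01 kg/hm². Actual winter wheat yield in Henan Province tends to be low in the west and high in the east, and the predicted yield shows a comparable spatial distribution. The results offer a new method for winter wheat yield prediction and new ideas for building crop yield prediction models.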
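The three accuracy measures reported here can be computed from paired observed and predicted county yields. The following is a minimal Python sketch that assumes the common definition R² = 1 − SS_res/SS_tot. The abstract does not say whether the paper used this form or a squared correlation coefficient. `regression_metrics` is a hypothetical helper name.

```python
import math
from typing import Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> dict[str, float]:
    """R^2, RMSE and MAE for paired observed/predicted yields (same units as inputs, e.g. kg/hm^2)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,                      # assumed 1 - SS_res/SS_tot form
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

For any set of residuals, RMSE is never smaller than MAE. The reported values (580.36 versus 476.01 kg/hm²) are consistent with this.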

       

      Abstract: Wheat, one of the three major food crops in the world, plays a crucial role in the global food supply system. Accurate and timely prediction of winter wheat yield is therefore essential for national food security. Spectral information from remote sensing satellites is expected to reflect the growth status of crops and so to support yield prediction over large cropping areas. Machine learning has also been widely applied to crop classification and yield prediction in recent years, owing to its strong capabilities in data mining and analysis. However, the input factors of machine learning models are mostly traditional variables, such as soil moisture, meteorology, and land use, and new input variables are needed to improve the accuracy of crop yield estimation. Solar-induced chlorophyll fluorescence (SIF) is a byproduct of vegetation photosynthesis and is sensitive to water and heat stress during crop growth. It has gradually become a new variable in crop yield prediction. Comparing data from different time windows makes it possible to identify the optimal window for accurate winter wheat yield prediction, and to make reliable predictions from limited data before harvest. In this study, Henan Province, China, was taken as the study area, and the winter wheat yield per unit area was predicted using multi-source remote sensing data and ensemble learning. The growing season of winter wheat was divided into 28 time windows. SIF features were introduced alongside the normalized difference vegetation index (NDVI), a water index, the leaf area index, monthly mean temperature, and total monthly precipitation. Remote sensing indicators and meteorological data from 2003 to 2018 were used for training, and the winter wheat yield for 2019 to 2021 was predicted.
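The abstract does not state how the 28 time windows were constructed. One scheme that yields exactly 28 is to take every contiguous run of months within a seven-month span. The sketch below assumes a hypothetical November-to-May span, which contains the reported optimal December-to-May window, and averages a monthly feature over each window. It also splits samples by year so that 2003 to 2018 are used for training and 2019 to 2021 for testing, as in the abstract. `SEASON`, `contiguous_windows`, `window_mean`, and `split_by_year` are illustrative names, not the paper's.

```python
from typing import Mapping, Sequence

# Hypothetical seven-month span (assumption); the abstract only reports Dec-May as optimal.
SEASON = ["Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May"]


def contiguous_windows(months: Sequence[str]) -> list[tuple[str, str]]:
    """Every contiguous (start, end) run of months, single months included."""
    return [(months[i], months[j]) for i in range(len(months)) for j in range(i, len(months))]


def window_mean(monthly: Mapping[str, float], months: Sequence[str], start: str, end: str) -> float:
    """Average one monthly feature (e.g. SIF or NDVI) over start..end inclusive (assumed aggregation)."""
    i, j = months.index(start), months.index(end)
    values = [monthly[m] for m in months[i : j + 1]]
    return sum(values) / len(values)


def split_by_year(years: Sequence[int], last_train_year: int = 2018) -> tuple[list[int], list[int]]:
    """Row indices for training (years up to last_train_year) and testing (later years)."""
    train = [k for k, yr in enumerate(years) if yr <= last_train_year]
    test = [k for k, yr in enumerate(years) if yr > last_train_year]
    return train, test
```

Under these assumptions, each window yields one feature vector per county and year. The paper may instead aggregate with sums or keep per-month values, so the mean here is only a placeholder.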
Stacking-based ensemble learning was used to combine the predictions of eight machine learning models, reducing prediction error and improving the accuracy and generalization of the model. The results show that: (1) SIF features improved the prediction of winter wheat yield; (2) the optimal time window for machine learning was December to the following May; (3) compared with the individual machine learning models, the Stacking-based ensemble performed best, with a coefficient of determination (R²) of 0.816, a root mean square error (RMSE) of 580.36 kg/hm², and a mean absolute error (MAE) of 476.01 kg/hm²; (4) the spatial distribution of the predicted yield was similar to the actual one, with yields low in the west and high in the east. These findings can also provide new ideas for crop yield prediction.
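The abstract names neither the eight base learners nor the meta-learner. The following is a minimal, dependency-free sketch of two-level Stacking. It assumes K-fold out-of-fold base predictions as the level-1 features and a ridge-penalised linear meta-learner, which is a common choice but not necessarily the paper's. `UnivariateLinear` and `KNN` are hypothetical stand-ins for the paper's eight models, and all function names are illustrative.

```python
import heapq
import math
import random


class UnivariateLinear:
    """Hypothetical base learner: least-squares line on one input column."""

    def __init__(self, col):
        self.col = col

    def fit(self, X, y):
        xs = [row[self.col] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[self.col] for row in X]


class KNN:
    """Hypothetical base learner: mean target of the k nearest rows (Euclidean)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for row in X:
            nearest = heapq.nsmallest(
                self.k, zip(self.X, self.y), key=lambda pair: math.dist(row, pair[0])
            )
            out.append(sum(t for _, t in nearest) / len(nearest))
        return out


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge(Z, y, alpha=1e-3):
    """Meta-learner weights: w[0] is an unpenalised intercept, w[1:] carry an L2 penalty."""
    A = [[1.0, *z] for z in Z]
    p = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        AtA[i][i] += alpha
    Aty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(p)]
    return solve(AtA, Aty)


def stack_fit_predict(base_factories, X_train, y_train, X_test, n_folds=5, seed=0):
    """Two-level Stacking: K-fold out-of-fold base predictions feed a ridge meta-learner."""
    idx = list(range(len(X_train)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Level-1 training features are out-of-fold predictions, so the meta-learner
    # never sees a base prediction made on rows that base model was trained on.
    oof = [[0.0] * len(base_factories) for _ in X_train]
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        for m, make in enumerate(base_factories):
            model = make().fit([X_train[i] for i in train], [y_train[i] for i in train])
            for i, pred in zip(held, model.predict([X_train[i] for i in held])):
                oof[i][m] = pred
    w = fit_ridge(oof, y_train)
    # Level-1 test features come from base models refit on the full training set.
    full = [make().fit(X_train, y_train) for make in base_factories]
    Z_test = list(zip(*(model.predict(X_test) for model in full)))
    return [w[0] + sum(wi * z for wi, z in zip(w[1:], row)) for row in Z_test]
```

Out-of-fold level-1 features keep the meta-learner from rewarding base models that merely memorise the training years. scikit-learn's `sklearn.ensemble.StackingRegressor` follows the same pattern. Its `cv` parameter controls the out-of-fold predictions, and its default `final_estimator` is `RidgeCV`.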

       
