Predicting winter wheat yield per unit area in Henan Province of China using ensemble learning and multi-source data
-
Abstract
Wheat is one of the world's three major food crops and plays a crucial role in the global food supply system. Accurate and timely prediction of winter wheat yield is therefore essential for national food security. Spectral information from remote sensing satellites reflects the growth status of crops and can be used to predict crop yield over large areas. Machine learning has also been widely applied to crop classification and yield prediction in recent years because of its strong capabilities in data mining and analysis. However, the input factors of machine learning models are mostly traditional variables, such as soil moisture, meteorology, and land use. New input variables are needed to improve the accuracy of crop yield estimation. Solar-induced chlorophyll fluorescence (SIF), a byproduct of vegetation photosynthesis that is sensitive to water and heat stress during crop growth, has gradually become a new variable in crop yield prediction. Comparing data from different time windows can identify the window that gives the most accurate prediction of winter wheat yield, and can show how reliably yield can be predicted from limited data before harvest. In this study, Henan Province, China, was taken as the study area for predicting winter wheat yield per unit area using multi-source remote sensing data and ensemble learning. The growing season of winter wheat was divided into 28 time windows. SIF feature variables were introduced alongside the normalized vegetation index, water index, leaf area index, monthly average temperature, and total monthly precipitation. Remote sensing indicators and meteorological data from 2003 to 2018 were used to train the models, which then predicted winter wheat yield for 2019 to 2021.
Stacking-based ensemble learning was used to fit eight machine learning models and combine their predictions, which reduced prediction errors and improved the accuracy and generalization of the model. The results show that: (1) SIF features improved the prediction of winter wheat yield; (2) the optimal time window for prediction ran from December to the following May; (3) the Stacking-based ensemble outperformed the other models, with a coefficient of determination (R²) of 0.816, a root mean squared error (RMSE) of 580.36 kg/hm², and a mean absolute error (MAE) of 476.01 kg/hm²; (4) the spatial distribution of the predicted yield was similar to that of the actual yield, being low in the west and high in the east. These findings can provide new ideas for crop yield prediction.
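The stacking procedure and the three reported metrics can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the remote sensing and meteorological features. The three base learners, the ridge meta-learner, the feature count, and the train/test split are illustrative assumptions, not the eight models or the data used in the study.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for county-year samples (assumption): six generic
# predictors play the role of SIF, vegetation indices, and weather variables.
n_samples, n_features = 400, 6
X = rng.normal(size=(n_samples, n_features))
y = (
    5000.0
    + 600.0 * X[:, 0]
    + 300.0 * X[:, 1] * X[:, 2]
    - 200.0 * X[:, 3] ** 2
    + rng.normal(scale=150.0, size=n_samples)
)

# Train on the first block of samples and test on the rest, loosely
# mimicking "train on earlier years, predict later years".
split = 300
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Base learners (illustrative choice; the study combines eight models).
base_learners = [
    ("ridge", Ridge(alpha=1.0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
]

# The meta-learner is fitted on out-of-fold predictions of the base
# learners (cv=5), which is what distinguishes stacking from averaging.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The three metrics reported in the abstract.
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
mae = float(mean_absolute_error(y_test, y_pred))
print(f"R2={r2:.3f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```

In practice, the same fit-and-score loop could be repeated once per time window with features restricted to that window, and the window with the lowest held-out error selected.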
-
-