Abstract:
To address the limited generalization of traditional crop yield estimation models and their insufficient use of temporal and spatial features, this study took machine-harvested cotton as the research object and combined an Unmanned Aerial Vehicle (UAV) remote sensing platform with deep learning to carry out multi-period remote sensing observation and yield estimation of cotton. Using images from the seedling, bud, and flowering stages as a time-series data set, a CNN-BiLSTM model combining a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network was constructed to predict cotton yield, improving feature extraction in both the temporal and spatial dimensions. The contributions of the CNN and the BiLSTM network were verified separately, as was the effect of network depth on yield estimation. In the verification experiments on the Long Short-Term Memory (LSTM) and BiLSTM network structures, the single-layer BiLSTM1 model performed best, with a coefficient of determination of 0.851, a root mean square error of 161.911 g (the inputs, outputs, and validation of all models are based on 2.3 m × 2.3 m quadrat data), and a mean absolute percentage error of 7.304%; all three metrics were better than those of the other comparable models. The LSTM2 model with two hidden layers ranked second and the single-layer LSTM1 model ranked third, with coefficients of determination of 0.844 and 0.834, respectively, and the accuracy of LSTM4, BiLSTM2, and BiLSTM4 decreased in that order. These results show that increasing network depth alone does not improve prediction accuracy.
In the verification experiments on CNN-BiLSTM models built on BiLSTM1, the CNN1-BiLSTM model with one convolution layer performed worst, with all three evaluation metrics worse than those of the other models. This indicates that when the CNN has few convolution layers, the shallow features it extracts do not improve accuracy and may even interfere with the model. As the number of convolution layers increased, prediction accuracy gradually improved. With 10 convolution layers, the CNN10-BiLSTM model reached a coefficient of determination of 0.857, and its mean absolute percentage error decreased to 7.256%. With 14 convolution layers, performance peaked and was clearly better than that of the BiLSTM and LSTM models: the coefficient of determination was 0.885, the root mean square error was 147.167 g, and the mean absolute percentage error was 6.711%. Beyond 14 layers, adding convolution layers no longer helped and performance declined; for example, the coefficient of determination of the CNN20-BiLSTM model was only 0.870. In conclusion, based on the multi-temporal cotton images collected by the UAV remote sensing platform, the CNN-BiLSTM model adopted in this study effectively extracted spatial and temporal features and achieved accurate prediction of cotton yield at the time-series scale and the field scale, which can provide a reference for research on crop yield estimation based on time-series data.
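For readers who want to reproduce the three evaluation metrics reported above, the following is a minimal sketch of how they are commonly computed from per-quadrat observed and predicted yields. It assumes the coefficient of determination is defined as 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation), and the function names and the sample yields are hypothetical illustrations, not data or code from this study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the yields (g per quadrat)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; observed yields must be nonzero."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(y_true)


# Hypothetical yields (g) for four 2.3 m x 2.3 m quadrats, for illustration only.
observed = [2000.0, 2200.0, 1800.0, 2400.0]
predicted = [2100.0, 2100.0, 1900.0, 2300.0]

print("R2  :", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted), "g")
print("MAPE:", mape(observed, predicted), "%")
```

A higher coefficient of determination and a lower root mean square error and mean absolute percentage error indicate a better model. This is why a model can be better on all three metrics even though two of its values are numerically smaller.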