Abstract:
Wheat is one of the most important food crops, with annual global consumption reaching 750 million tons. Timely and accurate estimation of wheat production is therefore in high demand for food security, since grain supply must keep pace with an ever-increasing population under climate change. In this study, a wheat ear counting network (WECnet) was constructed to accurately estimate wheat density from Unmanned Aerial Vehicle (UAV) images. Wheat images were collected from many countries for training; the training set was then filtered and augmented to ensure the diversity of wheat ears. Four methods were selected to benchmark the performance of WECnet. Among them, object detection marks each target with a rectangular box, giving the most intuitive output. CSRnet, an end-to-end method designed for crowd counting, generates high-quality density maps; it is easy to train and enlarges the receptive field using dilated (atrous) convolution. The overall counting performance of the density-map approach was better than that of earlier networks, which often missed dense and severely occluded targets. In object detection, post-processing must reduce multiple predicted boxes to a single target when selecting positive samples. In density-map counting, the multi-column MCNN model trains each column separately for targets of different sizes, but its large number of parameters makes it difficult to train. Therefore, CSRnet was improved to address these issues according to the characteristics of wheat. In the front end of the network, the first 12 layers of the VGG19 model were used for feature extraction, and contextual semantic features were fused to fully extract the feature information of wheat ears.
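The density-map counting described above follows the standard formulation from the crowd-counting literature: each annotated ear point is smoothed with a normalized Gaussian kernel, so the integral of the resulting map equals the ear count. A minimal NumPy sketch of that idea (illustrative only; the kernel size, sigma, and coordinates below are assumptions, not values from the study):

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """2D Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def density_map(shape, points, size=15, sigma=4.0):
    """Place one normalized Gaussian per annotated ear point.

    Each Gaussian fully inside the image contributes exactly 1 to the
    map's integral, so density.sum() approximates the ear count.
    """
    dmap = np.zeros(shape, dtype=np.float64)
    k = gaussian_kernel(size, sigma)
    r = size // 2
    for y, x in points:
        # Clip the kernel window at the image borders.
        y0, y1 = max(0, y - r), min(shape[0], y + r + 1)
        x0, x1 = max(0, x - r), min(shape[1], x + r + 1)
        ky0, kx0 = y0 - (y - r), x0 - (x - r)
        dmap[y0:y1, x0:x1] += k[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]
    return dmap

ears = [(30, 40), (32, 44), (80, 100)]  # hypothetical ear annotations
dm = density_map((128, 128), ears)
print(round(dm.sum(), 2))  # ~3.0: the integral recovers the ear count
```

A counting network such as CSRnet is then trained to regress this map from the image, and the predicted count is simply the sum of the predicted map.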
The back-end network used convolutions with different dilation rates to enlarge the receptive field and output a high-quality density map. The model was trained on the global wheat dataset to verify its transferability and universality, and was then used to count wheat ears in UAV field images taken at two sites. The experiments showed that the coefficient of determination, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) of the model on the global wheat dataset reached 0.95, 6.1, and 4.78, respectively, improvements of 4.4%, 13.2%, and 9.8% over the original crowd counting network. On the UAV images, the coefficient of determination of the optimal model was 0.886, and the estimated total over the 46 images was 3 871 ears against an actual count of 3 880, an error rate of only 0.23%, indicating better performance than before. The average counting time for a single wheat image was 32 ms, demonstrating excellent counting speed as well as accuracy. Consequently, this universal prediction model of field wheat density can provide a data reference for accurate counting and density prediction from UAV wheat images.
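As a back-of-the-envelope check of how dilation enlarges the receptive field in a CSRnet-style back end: a k×k convolution with dilation d has an effective extent of d·(k−1)+1, and with stride 1 each stacked layer adds (effective extent − 1) to the receptive field. A small sketch (the six-layer, dilation-2 configuration matches CSRnet's published back end; assuming the same layout for the improved network here is an illustration, not a claim about the authors' exact architecture):

```python
def effective_kernel(k, d):
    """Effective spatial extent of a k x k convolution with dilation d."""
    return d * (k - 1) + 1

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs; each layer
    grows the receptive field by (effective extent - 1).
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

plain = [(3, 1)] * 6    # six plain 3x3 convolutions
dilated = [(3, 2)] * 6  # six 3x3 convolutions with dilation rate 2

print(receptive_field(plain))    # 13
print(receptive_field(dilated))  # 25
```

The dilated stack nearly doubles the receptive field with the same number of parameters and no pooling, which is why it preserves the spatial resolution needed for a high-quality density map.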