Abstract: The number of spikes per unit area is one of the main determinants of wheat yield. Rapid and accurate acquisition of this number is of great importance for breeding and cultivation in agricultural production. With the rapid development of deep learning, high-resolution images of wheat spikes can now be analyzed by pre-trained models to extract the number of spikes per unit area. Owing to its strong learning ability, deep learning can also extract features autonomously and deliver consistent results. In this study, a combined smartphone and server system was proposed to measure the number of wheat spikes. A Convolutional Block Attention Module (CBAM) was combined with YOLOv5 to form the core CBAM-YOLOv5 model. The YOLOv5 network structure provides an excellent balance between detection speed and accuracy for small and dense targets, making it suitable for counting wheat spikes. Because the CBAM contains both channel and spatial attention modules, features are processed along both the channel and spatial dimensions. The resulting feature representation of the targets is clearer, which helps to identify overlapping or obscured wheat spikes. The specific procedure was as follows: 1) The self-photographed Wheat Spike Detection (WSD) dataset and the publicly available Global Wheat Head Detection (GWHD) dataset were manually annotated, with 176 images as the training set, 22 images as the validation set, and 22 images as the test set. The GWHD dataset was introduced to improve the generalization ability of the model. 2) In the improved CBAM-YOLOv5 model, the CBAM was added at the neck end of the YOLOv5 network structure. The input image size of the model was set to 640, 960, and 1 280 pixels in turn, and the results were compared to obtain the optimal training parameters.
3) The CBAM-YOLOv5, YOLOv5, YOLOv4, and Faster RCNN models were trained with the optimal parameters to compare the performance of the different network structures. 4) The spike counting system was developed using the client-server model. Specifically, images of wheat spikes were taken with smartphones and uploaded to the server, where the CBAM-YOLOv5 model recognized the spikes in the images. The counting results were then returned to the smartphone and displayed to the user. The results showed that the CBAM-YOLOv5 achieved its best evaluation metrics when the input image size was 1 280 pixels: the F1-score reached 0.904, and the average precision reached 0.902 at an intersection over union threshold of 0.50. The CBAM-YOLOv5 outperformed the YOLOv5, YOLOv4, and Faster RCNN in terms of the evaluation metrics, with an average relative error of only 2.56% in counting. These results indicate that the improved model was more stable and faster. Taken together, the CBAM-YOLOv5 offered a clear overall improvement. The spike counting system was simple to use and easy to operate. The relative counting error in the field test was only 2.80%, indicating relatively stable performance. Therefore, the new system can be expected to support the rapid and automatic collection of wheat spike counts in the field without manual intervention. The low-cost and reliable system can also provide an accurate data reference for wheat yield prediction.
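The evaluation quantities reported above (F1-score at an intersection over union threshold of 0.50, and the relative error of the count) can be illustrated with a minimal sketch. The function names, the (x1, y1, x2, y2) box format, and the simple greedy matching of predictions to annotations are assumptions made for illustration; they are not the evaluation code used in the study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(predicted, truth, threshold=0.5):
    """Greedily match predicted boxes to annotated boxes (assumed strategy).

    Returns (precision, recall, f1). A prediction counts as a true positive
    when its best still-unmatched annotation has IoU >= threshold.
    """
    unmatched = list(truth)
    tp = 0
    for p in predicted:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)  # each annotation can be matched only once
            tp += 1
    fp = len(predicted) - tp   # detections with no matching spike
    fn = len(unmatched)        # annotated spikes that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


def relative_count_error(predicted_count, manual_count):
    """Relative counting error in percent, against a manual count."""
    return abs(predicted_count - manual_count) / manual_count * 100.0
```

For a set of images, the average relative error would be the mean of `relative_count_error` over all images, with the manual count taken as the reference.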