Abstract:
Optical sensors mounted on unmanned aerial vehicles (UAVs) have been widely used in recent years to capture images of many kinds of crops. This economical and efficient approach can contribute greatly to yield prediction and field management in modern agriculture. However, wheat ear counting remains challenging because the wheat ears are densely distributed and heavily overlapped, and the image background is complex. In this study, a wheat ear detection model based on the transformer prediction head "you only look once" network (TPH-YOLO) was designed to improve the accuracy of wheat ear counting in UAV images, with UAV wheat ear images taken as the research object. Firstly, the Retinex algorithm was used to enhance the wheat ear images collected by the UAV, in order to reduce the influence of uneven illumination on image quality. Secondly, the coordinate attention (CA) mechanism was added to the backbone network of YOLOv5 to refine the extracted features. As a result, the network focused mainly on wheat ear information and suppressed interference from background factors such as wheat stalks and wheat leaves. Thirdly, the original prediction head of YOLOv5 was replaced with a transformer prediction head (TPH), whose multi-head attention mechanism was exploited to locate wheat ears accurately in high-density scenes. Finally, a transfer learning training strategy was adopted to improve the generalization ability and detection accuracy of the TPH-YOLO network: the model was first pre-trained on a dataset of wheat ear images collected in the field, and the model parameters were then updated and optimized on the wheat ear image dataset collected by the UAV.
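As an illustration of the illumination-correction step, the following is a minimal sketch of single-scale Retinex, which estimates reflectance as the difference between the log image and the log of a Gaussian-blurred copy. The abstract does not state which Retinex variant or parameters were used, so the single-scale form, the `sigma` value, and the function names `gaussian_kernel`, `gaussian_blur`, and `single_scale_retinex` are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(channel, sigma):
    """Blur a 2-D array with a separable Gaussian and reflect padding."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(channel, radius, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def single_scale_retinex(image, sigma=15.0):
    """Assumed variant: log(image) - log(blurred image), per channel.

    `image` is an H x W x C uint8 array; the result is rescaled to 0-255.
    """
    img = image.astype(np.float64) + 1.0  # offset avoids log(0)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        illumination = gaussian_blur(img[:, :, c], sigma)
        out[:, :, c] = np.log(img[:, :, c]) - np.log(illumination)
    # Stretch the reflectance estimate back to the display range.
    lo, hi = out.min(), out.max()
    scaled = (out - lo) / (hi - lo + 1e-12) * 255.0
    return scaled.astype(np.uint8)
```

A larger `sigma` removes slower illumination gradients at the cost of more computation; multi-scale variants average the result over several values of `sigma`.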
A series of experiments were conducted on the wheat ear images collected by the UAV. The performance of the detection model was evaluated with three indicators: precision, recall, and average precision (AP). The experimental results show that the precision, recall, and AP of the improved model were 87.2%, 84.1%, and 88.8%, respectively. The AP of the improved model was 4.1 percentage points higher than that of the original YOLOv5, and the improved model also outperformed the SSD, Faster-RCNN, CenterNet, and YOLOv5 target detection models. In addition, comparative experiments with different target detection models were carried out on the Global Wheat Head Detection (GWHD) dataset, which was selected for its diverse and typical wheat samples. The AP of the improved model was 11.1, 5.4, 6.9, and 3.3 percentage points higher than that of SSD, Faster-RCNN, CenterNet, and YOLOv5, respectively. This comparative analysis further verified the reliability and effectiveness of the improved model. The findings can provide strong support for wheat yield prediction.
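The three indicators named above can be computed from a ranked list of detections once each detection has been marked as a true or false positive. The sketch below is one common formulation and not necessarily the exact protocol of this study: the matching rule that produces the true-positive flags (for example, an IoU threshold) and the all-point interpolation used for AP are assumptions, and `precision_recall_ap` is a hypothetical helper name.

```python
import numpy as np


def precision_recall_ap(scores, is_true_positive, num_ground_truth):
    """Return (precision, recall, AP) for one class.

    scores            confidence of each detection
    is_true_positive  flag per detection (matching rule is assumed to
                      have been applied beforehand)
    num_ground_truth  number of labeled wheat ears
    """
    if len(scores) == 0 or num_ground_truth == 0:
        return 0.0, 0.0, 0.0

    # Rank detections from most to least confident.
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    tp = np.asarray(is_true_positive, dtype=bool)[order]

    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall_curve = cum_tp / num_ground_truth
    precision_curve = cum_tp / (cum_tp + cum_fp)

    # Precision and recall over all detections kept.
    precision = float(precision_curve[-1])
    recall = float(recall_curve[-1])

    # AP as the area under the monotone precision envelope
    # (all-point interpolation, an assumption for this sketch).
    r = np.concatenate(([0.0], recall_curve, [1.0]))
    p = np.concatenate(([0.0], precision_curve, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    changed = np.where(r[1:] != r[:-1])[0]
    ap = float(np.sum((r[changed + 1] - r[changed]) * p[changed + 1]))
    return precision, recall, ap


# Four detections, three of them correct, four labeled ears.
p, r, ap = precision_recall_ap([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], 4)
print(p, r, ap)  # 0.75 0.75 0.625
```

Because AP is itself a percentage, differences between models are most clearly reported in percentage points, as in the GWHD comparison.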