Abstract:
Accurate detection of wheat ears in field environments is essential for monitoring their growth status and counting ears. Traditional object detection models with horizontal bounding boxes cannot accurately detect densely distributed wheat ears, particularly under heavy occlusion between ears and stalks. Missed detections are frequent under varying illumination, dense distribution, and small target scales, largely because the predicted bounding boxes overlap. Oriented bounding boxes that follow the direction of each ear and enclose less background noise are therefore needed for high detection performance. In this study, an improved Oriented Region-based Convolutional Neural Network (Oriented R-CNN) model was proposed to detect and count rotated wheat ears. First, a spatial pyramid pooling cross-stage partial network (SPPCSPC) module was added to the backbone network to generate the last layer of the output feature map, which enlarged the receptive field and enhanced the perceptual ability of the network. Second, a feature aggregation network and the efficient two convolutional block attention module (E2CBAM), a hybrid attention module, were introduced into the neck network to enrich the feature information in the feature map. Third, soft non-maximum suppression (Soft-NMS) was used to optimize the screening of the predicted bounding boxes. The E2CBAM module was derived from the convolutional block attention module (CBAM) by replacing its channel attention module (CAM) with the E2CA module. The E2CA module consisted of two parallel efficient channel attention (ECA) branches, one using max pooling and the other average pooling. The outputs of the two adaptive convolution kernels were summed, and the resulting weights were assigned to the channels to emphasize important channel information. In this way, key features were captured and the detection performance of the model was improved.
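The E2CA channel attention described above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that the abstract does not state: the adaptive kernel-size rule of the original ECA design (with gamma = 2 and b = 1), uniform kernel weights standing in for learned ones, and a sigmoid applied to the sum of the two branch outputs. All function names are hypothetical, and this is not the authors' implementation.

```python
import math

def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd 1-D kernel size from the channel count (ECA-style rule, assumed)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def conv1d_same(values, kernel):
    """1-D cross-correlation along the channel axis with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(values))]

def e2ca_weights(feature_map, kernel_max=None, kernel_avg=None):
    """Return one attention weight in (0, 1) per channel.

    feature_map: list of C channels, each an H x W list of lists.
    """
    k = adaptive_kernel_size(len(feature_map))
    # Uniform kernels are placeholders for weights learned during training.
    kernel_max = kernel_max or [1.0 / k] * k
    kernel_avg = kernel_avg or [1.0 / k] * k
    flat = [[v for row in channel for v in row] for channel in feature_map]
    max_desc = [max(ch) for ch in flat]            # global max pooling
    avg_desc = [sum(ch) / len(ch) for ch in flat]  # global average pooling
    branch_max = conv1d_same(max_desc, kernel_max)
    branch_avg = conv1d_same(avg_desc, kernel_avg)
    # Assumption: the two branch outputs are summed, then passed to a sigmoid.
    return [1.0 / (1.0 + math.exp(-(m + a)))
            for m, a in zip(branch_max, branch_avg)]

def apply_e2ca(feature_map):
    """Scale every channel of the feature map by its attention weight."""
    weights = e2ca_weights(feature_map)
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(feature_map, weights)]
```

Because each branch is a single 1-D convolution over the pooled channel descriptors, the parameter cost is only the kernel length per branch, which is in line with the small parameter overhead reported for E2CBAM.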
To verify the effectiveness of the E2CBAM hybrid attention module, the path aggregation network (PANet) was first introduced into the neck network to enrich the semantic and target location information in the feature map, which improved the detection accuracy of the model by 0.19 percentage points. Adding the CBAM and E2CBAM hybrid attention modules then improved the detection accuracy by a further 0.16 and 0.31 percentage points, respectively, while the number of parameters increased by 0.24 M and 0.20 M, respectively, and the floating-point computation remained unchanged. Compared with CBAM, the E2CBAM hybrid attention module therefore improved the detection accuracy of the model by 0.15 percentage points and reduced the number of parameters by 0.04 M at the same computational cost. The experimental results show that the improved Oriented R-CNN model achieved better detection performance, with a mean average precision (mAP) 2.02 percentage points higher than that of the original model. Moreover, the mAP was 4.99, 2.49, 3.94, 2.25, and 4.12 percentage points higher than that of the mainstream rotated object detection models Gliding Vertex, R3Det, Rotated Faster R-CNN, S2ANet, and Rotated RetinaNet, respectively. The improved model accurately represented the head direction of wheat ears and reduced the background area inside the predicted bounding boxes, so that the detection results were also visually cleaner. The findings provide an effective approach for observing the growth status of wheat ears and counting ears in practice.
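The Soft-NMS screening step mentioned in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the configuration used in the study: it uses axis-aligned boxes (the oriented boxes of the paper would require a rotated-polygon IoU instead), the Gaussian score-decay variant of Soft-NMS, and illustrative values for `sigma` and `score_threshold`. The function names are hypothetical.

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS.

    Returns (box_index, decayed_score) pairs in the order they were selected.
    Overlapping boxes are down-weighted instead of being discarded outright.
    """
    candidates = [[i, float(s)] for i, s in enumerate(scores)]
    # Drop boxes that already start below the threshold.
    candidates = [c for c in candidates if c[1] >= score_threshold]
    kept = []
    while candidates:
        # Select the remaining box with the highest current score.
        best = max(range(len(candidates)), key=lambda j: candidates[j][1])
        idx, score = candidates.pop(best)
        kept.append((idx, score))
        # Decay the scores of the other boxes according to their overlap.
        for cand in candidates:
            overlap = iou(boxes[idx], boxes[cand[0]])
            cand[1] *= math.exp(-(overlap ** 2) / sigma)
        # Remove boxes whose score has fallen below the threshold.
        candidates = [c for c in candidates if c[1] >= score_threshold]
    return kept
```

In dense scenes, a hard NMS threshold would delete a neighbouring ear whose box overlaps a higher-scoring one; the decay above keeps such a box with a reduced score, which is the property that makes Soft-NMS attractive for densely distributed targets.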