Improved Oriented R-CNN-based model for oriented wheat ear detection and counting

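    • Abstract: To accurately locate and count wheat ears in complex field environments with interference and occlusion, this study proposes an improved Oriented R-CNN method for detecting and counting wheat ears with oriented bounding boxes. First, a spatial pyramid pooling cross stage partial networks (SPPCSPC) module is introduced into the backbone network to enlarge the receptive field and strengthen the network's perceptual ability. Second, the path aggregation network (PANet) is combined with a hybrid attention mechanism (E2CBAM, efficient two convolutional block attention module) in the neck network to enrich the feature information contained in the feature maps. Finally, the soft non-maximum suppression algorithm (Soft-NMS) is used to optimize the screening of predicted boxes. The experimental results show that the improved model detects wheat ears well in complex environments. Compared with the original model, the mean average precision (mAP) increased by 2.02 percentage points; compared with the mainstream rotated object detection models Gliding vertex, R3det, Rotated Faster R-CNN, S2anet, and Rotated Retinanet, the mAP increased by 4.99, 2.49, 3.94, 2.25, and 4.12 percentage points, respectively. By using oriented boxes to locate wheat ears accurately, the method substantially reduces the background area inside each box and provides an effective way to observe wheat ear growth and count ears in practice.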
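The Soft-NMS step named in the abstract can be illustrated with a minimal sketch. It assumes the Gaussian score-decay variant of Soft-NMS and, for brevity, axis-aligned boxes `(x1, y1, x2, y2)` in place of the rotated-box IoU an oriented detector would need. The function names, `sigma`, and `score_thresh` are illustrative choices, not values from the paper.

```python
import math


def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2); a stand-in for rotated IoU."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them.

    Returns a list of (box_index, final_score) in the order boxes were selected.
    sigma and score_thresh are assumed defaults, not settings from the paper.
    """
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Select the highest-scoring box that has not been processed yet.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # every remaining box scores even lower
        keep.append((best, scores[best]))
        # Down-weight the other boxes according to their overlap with the selected one.
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
result = soft_nms(boxes, [0.9, 0.8, 0.7])
```

In this toy input the second box overlaps the first heavily, so its score is decayed rather than discarded outright. For densely packed, mutually occluding ears this may retain true detections that hard NMS would suppress.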

       

      Abstract: Accurate detection of wheat ears in field environments is of great value. Traditional object detection models that use horizontal bounding boxes cannot accurately detect densely distributed wheat ears, particularly when ears and stalks heavily occlude one another. Missed detections are frequent under varying illumination, dense distribution, and small target scales, because the predicted bounding boxes overlap. Oriented bounding boxes that enclose less background noise are therefore needed for high detection performance. In this study, an improved Oriented Region-based Convolutional Neural Network (Oriented R-CNN) model was proposed to detect and count wheat ears with rotated bounding boxes. Firstly, the spatial pyramid pooling cross-stage partial networks (SPPCSPC) module was added to the backbone network to generate the last layer of the output feature map, enlarging the receptive field and enhancing the perceptual ability of the network. Secondly, a feature aggregation network and the efficient two convolutional block attention module (E2CBAM), a hybrid attention mechanism, were introduced into the neck network to enrich the feature information in the feature map. Finally, the soft non-maximum suppression (Soft-NMS) algorithm was used to optimize the screening of the predicted bounding boxes. The E2CBAM module was derived from the convolutional block attention module (CBAM) by replacing its channel attention module (CAM) with the E2CA module. The E2CA module consists of two parallel ECA branches, one using maximum pooling and the other average pooling. The two adaptive convolution kernels produce outputs that are summed, and the result is used to weight the channels so that important channel information receives greater weight. Key features are thereby captured, improving the detection performance of the model.
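The E2CA channel attention described above can be sketched in plain Python. This is only an illustration under stated assumptions: the kernel-size rule is the one commonly used for ECA (with `gamma=2`, `b=1`), the two branch outputs are assumed to be summed before a sigmoid, and the kernel weights passed in are placeholders for what would be learned parameters. None of the names below come from the paper.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """ECA-style adaptive 1-D kernel size: nearest odd value derived from log2(C)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def conv1d_same(values, kernel):
    """Zero-padded 1-D convolution across the channel axis (output length == input length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def e2ca_weights(feature_map, kernel_avg, kernel_max):
    """Channel weights from two parallel branches (average- and max-pooled descriptors).

    feature_map: list of C channels, each a list of H rows of W floats.
    kernel_avg / kernel_max: hypothetical stand-ins for the learned 1-D kernels.
    """
    # Global average pooling and global max pooling give one descriptor per channel.
    avg_desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    max_desc = [max(max(row) for row in ch) for ch in feature_map]
    # Each branch applies its own 1-D convolution across neighbouring channels.
    avg_out = conv1d_same(avg_desc, kernel_avg)
    max_out = conv1d_same(max_desc, kernel_max)
    # Assumed fusion: sum the branch outputs, then squash to (0, 1) with a sigmoid.
    return [1.0 / (1.0 + math.exp(-(a + m))) for a, m in zip(avg_out, max_out)]


# Two channels of size 1x2; identity kernels keep the arithmetic easy to follow.
fm = [[[0.0, 0.0]], [[1.0, 3.0]]]
weights = e2ca_weights(fm, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
```

Each channel of the feature map would then be multiplied by its weight. Because the 1-D kernels span only a few neighbouring channels, this form of channel attention adds very few parameters.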
To verify the E2CBAM hybrid attention module, the path aggregation network (PANet) was first introduced into the neck network to enrich the semantic and target location information in the feature map, which improved the detection accuracy of the model by 0.19 percentage points. Adding the CBAM and E2CBAM hybrid attention modules then improved the detection accuracy by a further 0.16 and 0.31 percentage points, respectively, while increasing the number of parameters by 0.24 M and 0.20 M, respectively; the floating-point computation remained unchanged. Compared with CBAM, the E2CBAM module therefore improved the detection accuracy by 0.15 percentage points and reduced the number of parameters by 0.04 M, with no change in computation. The experimental results show that the improved Oriented R-CNN model accurately represented the head direction of wheat ears and achieved better detection performance. The mean average precision (mAP) was 2.02 percentage points higher than that of the original model. Moreover, the mAP was 4.99, 2.49, 3.94, 2.25, and 4.12 percentage points higher than that of the mainstream rotated object detection models Gliding vertex, R3det, Rotated Faster R-CNN, S2anet, and Rotated Retinanet, respectively. Because the oriented bounding boxes follow the direction of the wheat ears, the background area inside the predicted boxes was reduced and the detection results were visually clearer. The findings provide an effective way to observe the growth status of wheat ears and to count them in practice.

       
