Litchi detection based on multiple feature enhancement and feature fusion SSD

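    • Abstract: Litchi fruits in images captured by unmanned aerial vehicles (UAVs) are small targets that carry little feature information. To detect more litchi fruits, and to detect them more reliably, this study proposes an SSD model based on multiple feature enhancement and feature fusion (Single Shot Multibox Detector based on Multiple Feature Enhancement and Feature Fusion, MFEFF-SSD). To cut unnecessary computation, the last two convolution layers of the original Vgg16 backbone are removed, and the Receptive Field Block (RFB) is applied at the Conv8 and Conv9 layers to strengthen the feature extraction of the backbone. The Efficient Spatial Pyramid Block (ESP) is then used to enhance the shallow features. An Improved Path Aggregation Network (IPANet) is proposed to fuse multi-scale features and improve the detection of small litchi targets. Finally, the SE (Squeeze and Excitation) channel attention module is introduced in the shallow layers to further raise detection accuracy. In addition, the size and number of prior boxes are adjusted to fit the size of small litchi targets. The experimental results show that the RFB module proposed in this study improves detection; the average precision of IPANet is slightly higher than that of FPN (Feature Pyramid Network); the average precision of the SE module is 1.15 and 2.12 percentage points higher than that of the CBAM (Convolutional Block Attention Module) and ECA (Efficient Channel Attention) modules, respectively; the average precision of the ESP module is 2.51 percentage points higher than that of ASPP (atrous spatial pyramid pooling); and, compared with the SSD, Yolov4-tiny, Faster-RCNN, and CenterNet models, the average precision of the MFEFF-SSD model is higher by 30.62, 14.58, 44.46, and 15.93 percentage points, respectively. The model detects litchi in UAV images more accurately and effectively, and may open up ideas for the detection of small-target crops.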

       

      Abstract: Litchi is one of the characteristic fruits of Guangdong Province in South China. Because litchi is susceptible to weather, diseases, and insect pests, orchards have traditionally been inspected manually. The number of fruits on a litchi tree is counted to decide on subsequent agricultural operations, such as applying nutrient solution or removing insect pests. Low-altitude remote sensing with an unmanned aerial vehicle (UAV) is now a promising way to observe litchi, being safe, efficient, inexpensive, and easy to operate. However, litchi fruits in UAV images are small targets with few features. In this study, an improved Single Shot MultiBox Detector (SSD) model using multiple feature enhancement and feature fusion (MFEFF) was proposed to detect small litchi fruits. Firstly, 160 litchi images with a resolution of 4 000×3 000 pixels were collected in the orchard using the UAV. Because the targets measured between 20×20 and 30×30 pixels, a sliding window of 512 pixels was applied to the original images, yielding 3 590 images of 1 024×1 024 pixels and boosting the feature information of the litchi, such as its size. Secondly, data augmentation, including image flipping and color space transformation, was applied to improve the robustness of the model. Thirdly, the last two convolution layers of Vgg16 were deleted to reduce unnecessary computation. The Receptive Field Block (RFB) was used on the Conv8 and Conv9 layers, and the feature map of the Conv3_3 layer was added for feature extraction, expanding the receptive field and capturing more detailed information about the litchi. The Efficient Spatial Pyramid (ESP) block was also applied to enhance the shallow feature maps.
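The tiling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the "sliding window of 512 pixels" is the stride between 1 024×1 024 tiles, and that a final tile is shifted back so that each image border is covered. The function name `tile_origins` is hypothetical. The abstract does not say how borders or tiles without fruit were handled, so the tile count from this sketch need not match the 3 590 images reported.

```python
def tile_origins(width, height, tile=1024, stride=512):
    """Return the top-left (x, y) corner of every tile cut from one image.

    Assumption: tiles overlap by (tile - stride) pixels, and one extra tile
    is aligned with the right/bottom border when the stride does not land
    on it exactly.
    """
    def axis(length):
        if length <= tile:
            return [0]
        starts = list(range(0, length - tile + 1, stride))
        if starts[-1] != length - tile:
            starts.append(length - tile)  # border-aligned tile
        return starts

    return [(x, y) for y in axis(height) for x in axis(width)]


# One 4 000 x 3 000 UAV image under the assumptions above.
origins = tile_origins(4000, 3000)
print(len(origins))  # 35 tiles (7 columns x 5 rows)
```

With a stride of half the tile size, any target of 20×20 to 30×30 pixels that is cut by one tile boundary lies fully inside a neighbouring tile, which is one plausible reason for choosing overlapping windows.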
Finally, the improved Path Aggregation Network (IPANet) was used for multi-scale feature fusion at the Conv3_3, Conv4_3, and fc7 layers. A Squeeze and Excitation (SE) module was also introduced in the first two feature layers to further improve detection accuracy. This channel attention network uses the global information of the litchi images to selectively raise the weights of litchi-related channels while suppressing useless feature information, such as green leaves in the background. At the same time, the size and number of anchors were adjusted to match the small litchi targets. The model was trained and tested on a total of 3 590 labeled images, each containing 3 to 114 litchi targets. Among them, 2 907 images were assigned to the training set, 324 to the validation set, and 359 to the test set. The evaluation indexes were the mean Average Precision (mAP), recall, precision, and F1-score (F1). The results showed that the RFB module improved detection compared with the original model: the mAP, recall (R), precision (P), and F1 increased by 2.61, 0.48, 1.49, and 1 percentage points, respectively. IPANet outperformed the feature pyramid network (FPN), with the mAP higher by 0.44 percentage points. The SE module outperformed the Convolutional Block Attention Module (CBAM) and Efficient Channel Attention (ECA), scoring best among the three modules. ESP was superior to atrous spatial pyramid pooling (ASPP), with the mAP, R, and P higher by 2.51, 0.38, and 1.13 percentage points, respectively. Overall, the MFEFF-SSD improved the mAP by 30.62, 14.58, 44.46, and 15.93 percentage points compared with the SSD, Yolov4-tiny, Faster-RCNN, and CenterNet models, respectively.
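The channel attention idea behind the SE module can be illustrated with a small sketch. It follows the general Squeeze and Excitation structure (global average pooling, a two-layer bottleneck, a sigmoid gate, and channel-wise scaling) rather than the authors' implementation. The function name `se_reweight` and the weight matrices are hypothetical, and bias terms are omitted for brevity.

```python
import math


def se_reweight(feature_map, w_reduce, w_expand):
    """Apply a Squeeze and Excitation style gate to a feature map.

    feature_map: list of C channels, each a list of rows (H x W floats).
    w_reduce:    (C/r) x C weight matrix of the bottleneck layer.
    w_expand:    C x (C/r) weight matrix that restores C channels.
    Both matrices are placeholders; in a real model they are learned.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: every value in a channel is multiplied by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates


# Toy input: two 2 x 2 channels and hand-picked weights.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[4.0, 4.0], [4.0, 4.0]]]
scaled, gates = se_reweight(fmap, w_reduce=[[1.0, 0.0]], w_expand=[[0.0], [2.0]])
print(gates)
```

Each gate lies between 0 and 1, so channels with small gates are suppressed relative to channels with large ones, which is the mechanism the abstract relies on to damp background responses such as leaves.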
The MFEFF-SSD model can therefore detect litchi in UAV images more accurately and effectively. This finding can also serve as a reference for the detection of small-target fruits.
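For readers less familiar with the precision, recall, and F1 indexes named in the abstract, the sketch below applies their standard definitions. The function name `detection_scores` and the example counts are hypothetical and are not taken from the paper. The abstract does not state the matching criterion used to decide whether a predicted box counts as a true positive, so the counts are simply taken as given.

```python
def detection_scores(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from detection counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall.
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = detection_scores(true_pos=90, false_pos=10, false_neg=30)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

The differences reported in the abstract are in percentage points, meaning absolute differences between two such scores expressed in percent, not relative changes.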

       
