Grape detection in orchard environments based on improved YOLOv5s

孙俊; 吴兆祺; 贾忆琳; 宫东见; 武小红; 沈继锋

doi:10.11975/j.issn.1002-6819.202306135

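Abstract: To identify grape targets rapidly and accurately in complex orchard environments, this study proposes an improved grape detection model (MRW-YOLOv5s) based on YOLOv5s. First, to reduce the number of model parameters, the lightweight network MobileNetv3 is adopted as the feature extraction network, and a coordinate attention (CA) module is embedded in the bneck structure of MobileNetv3 to strengthen the network's feature extraction capability. Second, RepVGG Block is introduced into the neck network, fusing multi-branch features to improve detection accuracy, and the structural reparameterization of RepVGG Block is used to further speed up inference. Finally, a loss based on a dynamic non-monotonic focusing mechanism (wise intersection over union loss, WIoU Loss) is adopted as the bounding box regression loss function to accelerate network convergence and improve detection accuracy. The results show that the improved MRW-YOLOv5s model has only 7.56 M parameters and reaches a mean average precision (mAP) of 97.74% on the test set, 2.32 percentage points higher than the original YOLOv5s model; the average detection time per image is 10.03 ms, 6.13 ms less than the original YOLOv5s model. Compared with the mainstream object detection models SSD, RetinaNet, YOLOv4, YOLOv7, and YOLOX, the mAP of MRW-YOLOv5s is higher by 9.89, 7.53, 2.12, 0.91, and 2.42 percentage points, respectively, and the model has a clear advantage in parameter count and detection speed. This study can provide technical support for intelligent orchards and mechanized picking.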
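
The structural reparameterization mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly described RepVGG training-time block with a 3x3 branch, a 1x1 branch, and an identity branch, and it omits batch normalization (which would normally be folded into each branch first) and the activation. The helper names `conv2d` and `fuse_branches` are hypothetical; this is not the authors' implementation.

```python
import random

def conv2d(x, w, pad):
    """Naive stride-1 convolution. x: [C_in][H][W], w: [C_out][C_in][k][k]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for w_out in w:
        plane = [[0.0] * wd for _ in range(h)]
        for ci in range(c_in):
            for i in range(h):
                for j in range(wd):
                    for di in range(k):
                        for dj in range(k):
                            yi, xj = i + di - pad, j + dj - pad
                            if 0 <= yi < h and 0 <= xj < wd:  # zero padding
                                plane[i][j] += w_out[ci][di][dj] * x[ci][yi][xj]
        out.append(plane)
    return out

def fuse_branches(w3, w1):
    """Merge 3x3, 1x1 and identity branches into one 3x3 kernel (C_out == C_in)."""
    c = len(w3)
    fused = [[[row[:] for row in w3[o][i]] for i in range(c)] for o in range(c)]
    for o in range(c):
        for i in range(c):
            fused[o][i][1][1] += w1[o][i][0][0]  # a 1x1 kernel sits at the centre tap
            if o == i:
                fused[o][i][1][1] += 1.0         # identity is a centred Dirac kernel
    return fused

# Illustrative sizes and random weights (assumptions, not values from the paper).
random.seed(0)
C, H, W = 2, 4, 5
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
       for _ in range(C)] for _ in range(C)]
w1 = [[[[random.uniform(-1, 1)]] for _ in range(C)] for _ in range(C)]

# Training-time form: three parallel branches summed.
y3, y1 = conv2d(x, w3, pad=1), conv2d(x, w1, pad=0)
multi_branch = [[[y3[c][i][j] + y1[c][i][j] + x[c][i][j] for j in range(W)]
                 for i in range(H)] for c in range(C)]

# Inference-time form: one 3x3 convolution with the fused kernel.
single_branch = conv2d(x, fuse_branches(w3, w1), pad=1)

max_diff = max(abs(a - b)
               for pa, pb in zip(multi_branch, single_branch)
               for ra, rb in zip(pa, pb)
               for a, b in zip(ra, rb))
print(max_diff)
```

Because convolution is linear, the three branches collapse into a single 3x3 kernel, so the deployed block runs one convolution per layer. This is the likely reason reparameterization reduces inference time without changing the block's output.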

Abstract: Grapes are among the most popular fruits, with high nutritional value and economic benefits. With planting areas expanding in recent years, manual picking of mature grapes can no longer fully meet the needs of large-scale production. A picking robot can be expected to monitor grape growth in orchards in real time, and automatic grape picking can promote intelligent agricultural production. In this study, an improved YOLOv5s model (MRW-YOLOv5s) was proposed to rapidly and accurately identify grapes in orchards. Firstly, the lightweight network MobileNetv3 was used as the feature extraction network to reduce the number of model parameters. A coordinate attention (CA) module was also embedded into the bneck structure of MobileNetv3 to strengthen the feature extraction capability of the network. Secondly, RepVGG Block was introduced into the neck network, where multi-branch features were fused to improve the detection accuracy of the model. The structural reparameterization of the RepVGG Block was then applied to further accelerate inference. Finally, Wise Intersection over Union Loss (WIoU Loss), which uses a dynamic non-monotonic focusing mechanism, was taken as the bounding box regression loss function to accelerate network convergence and improve detection accuracy. Gradient-weighted class activation mapping (Grad-CAM) was used to visualize how well the backbone network attended to grape targets when embedded with the CA module; it performed better than the backbones embedded with Efficient Channel Attention (ECA) or the Convolutional Block Attention Module (CBAM). In addition, the loss convergence curves showed that the bounding box regression loss converged most slowly, and to the highest final value, when EIoU was used as the bounding box loss function.
When CIoU and Wise-IoU v1 were used as the bounding box loss functions, the convergence speeds and final loss values were similar to each other, and the loss values were slightly lower than with EIoU. The fastest convergence and the lowest loss value were obtained when Wise-IoU v3 was used as the bounding box loss function. Wise-IoU v3 can therefore be expected to accelerate convergence and improve the accuracy of the model. The results showed that the improved MRW-YOLOv5s model had only 7.56 M parameters. The mean Average Precision (mAP) on the test set reached 97.74%, 2.32 percentage points higher than the original YOLOv5s model, and the average detection time per image was 10.03 ms, 6.13 ms shorter than the original YOLOv5s model. Compared with the mainstream object detection models SSD, RetinaNet, YOLOv4, YOLOv7, and YOLOX, the mAP of the MRW-YOLOv5s model was 9.89, 7.53, 2.12, 0.91, and 2.42 percentage points higher, respectively. The improved model had only 7.56 M parameters, which was 68.2%, 79.2%, 88.2%, 79.7%, and 15.4% fewer than the above five models, respectively. The average detection time of the improved model was only 10.03 ms, which was 2.64, 13.19, 10.59, 4.14, and 5.46 ms shorter than the above five models, respectively. Furthermore, the weight size of the improved model was only 26.97 MB, which makes the model easier to deploy. The MRW-YOLOv5s model therefore offers clear advantages in detection accuracy, parameter count, and detection speed. These findings can provide technical support for intelligent orchards and mechanized picking.
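
The difference between Wise-IoU v1 and v3 discussed in the abstract can be sketched as follows. This is a minimal illustration assuming the formulation usually given for Wise-IoU: v1 scales the IoU loss by a centre-distance term, and v3 multiplies v1 by a non-monotonic gain that depends on how far a box's IoU loss is from the mean IoU loss. The function names, the hyperparameter values `alpha=1.9` and `delta=3.0`, and the example boxes are assumptions and do not come from this paper. Gradient detachment and the running-mean update used during training are omitted.

```python
import math

def iou_terms(box, gt):
    """Boxes are (x1, y1, x2, y2). Returns IoU plus enclosing-box width and height."""
    inter_w = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    inter_h = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = inter_w * inter_h
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_box + area_gt - inter)
    wg = max(box[2], gt[2]) - min(box[0], gt[0])
    hg = max(box[3], gt[3]) - min(box[1], gt[1])
    return iou, wg, hg

def wiou_v1(box, gt):
    """IoU loss scaled by a centre-distance attention term."""
    iou, wg, hg = iou_terms(box, gt)
    dx = (box[0] + box[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (box[1] + box[3]) / 2 - (gt[1] + gt[3]) / 2
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return r_wiou * (1.0 - iou)

def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain: small for very easy and very hard boxes."""
    return beta / (delta * alpha ** (beta - delta))

def wiou_v3(box, gt, mean_l_iou, alpha=1.9, delta=3.0):
    """v1 loss reweighted by the gain; beta compares this box to the mean IoU loss."""
    iou, _, _ = iou_terms(box, gt)
    beta = (1.0 - iou) / mean_l_iou
    return focusing_gain(beta, alpha, delta) * wiou_v1(box, gt)

# Hypothetical boxes; mean_l_iou would be a running mean during training.
pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
loss_v1 = wiou_v1(pred, target)
loss_v3 = wiou_v3(pred, target, mean_l_iou=0.5)
print(loss_v1, loss_v3)
```

Under these assumptions the gain peaks for boxes of moderate difficulty and shrinks for both very good and very poor predictions, which is one plausible explanation for the faster convergence reported for Wise-IoU v3.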

Detecting grape in an orchard using improved YOLOv5s