Method for locating picking points of grape clusters using multi-object recognition

周馨曌; 吴烽云; 邹湘军; 蒙贺伟; 张芸齐; 罗锡文

doi:10.11975/j.issn.1002-6819.202309105

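Summary: Improperly located picking points can cause the end effector to collide with and damage the fruit-bearing branch and the fruit cluster, which leads to picking failures and a higher damage rate. To reduce these problems, this study proposes a picking point localization method based on deep learning and multi-object recognition of the key grape structures. First, an improved YOLACT++ model recognizes and segments the key grape structures: the fruit-bearing branch, the peduncle, and the cluster. Next, a method for judging which key structures belong to the same cluster, and for merging them, is built from the intersections and relative positions of the key regions. Finally, a method based on structural constraints and range re-selection chooses a low-collision region of interest (ROI) on the peduncle, and the centroid of the peduncle in that region is taken as the picking point. Experimental results show that, compared with the original YOLACT++, G-YOLACT++ improves the mean average precision of the bounding box and the mask by 0.83 and 0.88 percentage points, respectively. The accuracy of membership judgment and merging of key structures is 88% for single-cluster samples and 90% for multi-cluster samples, and clusters with incomplete key structures are rejected with an accuracy of 92.3%. Compared with localization methods that take the center of the bounding rectangle of the peduncle in the ROI, or the centroid of the peduncle recognized by the model, as the picking point, the proposed method raises the success rate by 10.95 and 81.75 percentage points, respectively. The study provides technical support for optimizing grape-picking robots and lays a foundation for low-damage harvesting by robots that pick clustered fruits in unstructured environments.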
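The membership judgment and the centroid-based picking point can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so everything here is an assumption made for illustration: regions are reduced to axis-aligned bounding boxes, a peduncle is taken to belong to a cluster when the two boxes intersect and the peduncle lies above the cluster in the image, and the ROI is passed in as a ready-made box. The names `boxes_intersect`, `peduncle_belongs_to_cluster`, and `mask_centroid_in_roi` are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in image coordinates; y grows downward.
Box = Tuple[int, int, int, int]


def boxes_intersect(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def center_y(box: Box) -> float:
    """Vertical center of a box."""
    return (box[1] + box[3]) / 2.0


def peduncle_belongs_to_cluster(peduncle: Box, cluster: Box) -> bool:
    """Assumed rule: the regions intersect and the peduncle sits above the cluster."""
    return boxes_intersect(peduncle, cluster) and center_y(peduncle) < center_y(cluster)


def mask_centroid_in_roi(mask: List[List[int]], roi: Box) -> Optional[Tuple[float, float]]:
    """Centroid (x, y) of the nonzero mask pixels inside roi, or None if there are none."""
    x0, y0, x1, y1 = roi
    sum_x = 0
    sum_y = 0
    count = 0
    for y in range(max(y0, 0), min(y1, len(mask) - 1) + 1):
        row = mask[y]
        for x in range(max(x0, 0), min(x1, len(row) - 1) + 1):
            if row[x]:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```

In this sketch the picking point is simply the mean pixel coordinate of the peduncle mask restricted to the ROI; how the ROI itself is chosen under structural constraints is not reproduced here.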

Abstract: With the rapid development of machine vision and artificial intelligence, grape-picking robots offer an effective way to resolve the conflict between the low efficiency of manual labor and the short harvesting period. During harvesting, the varying sizes and shapes of the key grape structures limit the robot's working space. Improperly located picking points can cause the robot's end effector to collide with the grapes, damaging them or knocking them off. Collisions between the end effector and the fruit-bearing branches must also be considered, because they can cause picking failure, branch damage, and a risk of fungal infection in the plant. In this study, a picking point localization algorithm was proposed that uses deep learning and multi-object recognition of the key grape structures, with the aim of reducing grape damage and the failure rate during harvesting. Firstly, the G-YOLACT++ model was built by incorporating the SimAM attention module and the Mish activation function into the YOLACT++ model. The model detected the key grape structures, namely the fruit-bearing branches, peduncles, and clusters, and segmented the structures of multiple adjacent clusters in the field of view into separate masks. The membership of key structures in the same cluster was then determined from their intersections and relative positions, and the structures belonging to the same cluster were merged. A low-collision region of interest (ROI) on the peduncle was selected, and a range re-selection step was designed to locate the picking point. The experimental results showed that incorporating the SimAM attention mechanism into the YOLACT++ model improved the mean average precision (mAP) of the mask. When the Mish activation function replaced ReLU in the backbone network, the mAP values of the mask and bounding box increased by 0.3 and 2.23 percentage points, respectively. Both modifications contributed to the performance gain. The mAP values of the bounding box and mask in G-YOLACT++ improved by 0.83 and 0.88 percentage points, respectively, compared with YOLACT++. The mAP values of the improved model for the bounding box and mask increased by 2.36 and 2.13 percentage points, respectively, compared with the original. Furthermore, the sizes of all the improved models remained unchanged, and the inference speed improved slightly. The improvements therefore had a positive effect on model performance. For membership determination and merging of key structures, the accuracy was 88% for single-cluster samples and 90% for multi-cluster samples. Clusters whose key structures were incompletely recognized were rejected with an accuracy of 92.3%. Compared with two baseline methods, which take as the picking point the center of the bounding rectangle of the peduncle in the ROI and the centroid of the peduncle identified by the model, the proposed method improved the localization success rate by 10.95 and 81.75 percentage points, respectively. The findings can support the optimization of grape-picking robots and lay a foundation for low-damage harvesting of clustered fruits in unstructured environments.
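For readers unfamiliar with the two components added to YOLACT++, the following Python sketch shows the standard published definitions of the Mish activation and the parameter-free SimAM attention weighting, applied to a single channel stored as a nested list. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the regularization value `lam = 1e-4` is the commonly used SimAM default rather than a setting reported in the abstract, and how the modules are placed inside the network is not shown.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def sigmoid(x: float) -> float:
    """Logistic function; only called with non-negative inputs here."""
    return 1.0 / (1.0 + math.exp(-x))


def simam_channel(channel: List[List[float]], lam: float = 1e-4) -> List[List[float]]:
    """Apply SimAM weighting to one H x W channel.

    Each value is scaled by sigmoid(d / (4 * (var + lam)) + 0.5), where d is
    its squared deviation from the channel mean and var is the sum of squared
    deviations divided by (H * W - 1).
    """
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    denom = max(len(values) - 1, 1)  # guard against a 1 x 1 channel
    var = sum((v - mean) ** 2 for v in values) / denom
    return [
        [v * sigmoid((v - mean) ** 2 / (4.0 * (var + lam)) + 0.5) for v in row]
        for row in channel
    ]
```

Values that deviate strongly from the channel mean receive a larger weight, which is how SimAM emphasizes distinctive positions without adding learnable parameters. This is consistent with the statement that the model size remained unchanged.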

Method for locating picking points of grape clusters using multi-object recognition