Strawberry recognition and peduncle picking key-point detection based on improved YOLOv8

杨震宇; 汪小旵; 祁子涵; 王得志

doi:10.11975/j.issn.1002-6819.202405044

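Abstract: To address the low positioning accuracy of peduncle picking points and the difficulty of recognizing occluded strawberries during strawberry-picking robot operation, this study proposes a strawberry recognition and localization method that combines an improved YOLOv8 algorithm with pose key-point detection. YOLOv8 is optimized by introducing BiFPN (bidirectional feature pyramid network) and GAM (generalized attention module) modules, which strengthen the model's bidirectional information flow, dynamically allocate feature weights, and focus on extracting small-target features and reinforcing the features of occluded regions, with the aim of improving picking-point localization accuracy in complex environments and prediction accuracy under occlusion. Experimental results show that, compared with the original model, the improved YOLOv8-pose model raises strawberry recognition precision (P), recall (R), mean average precision (mAP), and key-point mean average precision (mAP_kp) by 6.01, 1.98, 6.67, and 7.85 percentage points, respectively; the localization errors of the key-point-based peduncle picking point in the X, Y, and Z directions are 1.4, 1.4, and 2.2 mm, respectively. In addition, strawberry occlusion was graded according to the area of the overlapping occluded region, and model performance was validated at the different occlusion levels; under occlusion, the mAP_kp of the improved YOLOv8-pose is 9.78 percentage points higher than that of the original model. Based on the proposed vision model, the robot achieved a picking success rate of 95% in field trials, with a picking time of 10 s per strawberry, which can provide important technical support for precise robotic picking.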

Abstract: Robotic harvesting has been constrained by the low positioning accuracy of strawberry stem picking points and the significant challenge of identifying occluded strawberries. In this study, we proposed an improved YOLOv8 model combined with pose key-point detection for enhanced strawberry recognition and localization. The accuracy of picking point localization was also improved, especially for occluded strawberries in complex environments. To optimize the YOLOv8 model, we introduced the Bidirectional Feature Pyramid Network (BiFPN) and the Generalized Attention Module (GAM), which enhanced bidirectional information flow, dynamically allocated feature weights, and focused on extracting features of small targets and enhancing the features of occluded regions. As a result, the model's ability to accurately detect and localize strawberries in complex environments was significantly improved. Experimental results showed that the improved YOLOv8-pose model outperformed the original model in several metrics: Precision (P) increased by 6.01 percentage points, Recall (R) by 1.98 percentage points, mean Average Precision (mAP) by 6.67 percentage points, and mean Average Precision for key points (mAP_kp) by 7.85 percentage points. The positioning errors for strawberry stem picking points, based on key-point detection, were 1.4 mm in both the x and y directions and 2.2 mm in the z direction. Additionally, the occlusion level was classified according to the overlap area of occluded strawberries, and the model's performance under varying occlusion conditions was assessed. Under occlusion, the mAP_kp of the improved YOLOv8-pose model was 9.78 percentage points higher than that of the original model. Field trials further validated the model's effectiveness, with the strawberry-picking robot achieving a 95% success rate and a picking time of 10 s per strawberry.
The high success rate and short picking time demonstrated the practicality of the model in real-world agricultural settings. The improved YOLOv8 model with key-point detection recognized strawberries accurately and robustly, leveraging multi-scale features through the BiFPN architecture and focusing attention on relevant regions through the GAM, especially for occluded strawberries. These changes improved precision, recall, and mean average precision, particularly under occlusion. In conclusion, these techniques were integrated into a more capable strawberry-picking robot system. The accuracy and efficiency achieved in recognizing and localizing strawberries, even in challenging occlusion scenarios, highlighted the system's potential for practical agricultural applications. The findings contribute to automated strawberry harvesting in agricultural robotics and support more efficient and cost-effective farming in sustainable production.
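
The abstract states that BiFPN "dynamically allocated feature weights" but does not say which fusion rule the authors used. As an illustration only, the sketch below shows the fast normalized fusion rule from the original BiFPN design, in which each input feature map receives a learnable non-negative weight that is normalized by the sum of all weights. The function name `fast_normalized_fusion`, the flat-list feature representation, and the default `eps` are assumptions made for this example and are not taken from the paper.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with normalized non-negative weights.

    features: list of equal-length lists of floats (stand-ins for feature maps)
    weights:  one learnable scalar per feature vector (hypothetical values here)
    Returns (fused_vector, normalized_weights).
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature vector")
    # ReLU keeps every weight non-negative so the normalization stays stable.
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]
    length = len(features[0])
    # Weighted element-wise sum across the input feature vectors.
    fused = [
        sum(n * f[i] for n, f in zip(normalized, features))
        for i in range(length)
    ]
    return fused, normalized


if __name__ == "__main__":
    shallow = [1.0, 2.0]   # hypothetical high-resolution feature values
    deep = [3.0, 4.0]      # hypothetical low-resolution feature values
    fused, norm = fast_normalized_fusion([shallow, deep], [1.0, 3.0])
    print(norm, fused)
```

Because the weights are learned, the network can shift emphasis toward whichever resolution carries more useful signal for a given fusion node, which is one plausible reading of the "dynamic" weighting described above.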

Recognizing strawberry to detect the key points for peduncle picking using improved YOLOv8 model