Recognition of banana bunches and localization of the bottom fruit axis using improved YOLOv5

• Abstract: To improve the operating efficiency and quality of banana-harvesting robots and to achieve accurate positioning of the robot's end-effector catching mechanism, this study proposes a method based on the YOLOv5 algorithm for recognizing banana bunches and locating the fruit axis at the bottom of the bunch. A Coordinate Attention (CA) module was fused into the backbone network, and the C3 (Concentrated-Comprehensive Convolution Block) feature extraction module was combined with the CA module to form a C3CA module, thereby enhancing the extraction of banana-bunch feature information. The original CIoU (Complete Intersection over Union) loss function was replaced with the EIoU (Efficient Intersection over Union) loss to accelerate model convergence and reduce the loss value. The regression formula of the predicted bounding box was modified to obtain the locating point required in the experiments, and this point was transformed from the camera coordinate system to solve for its three-dimensional coordinates. A D435i depth camera was used in the localization experiments on the fruit axis at the bottom of the banana bunch. The recognition experiments show that, compared with the YOLOv5 and Faster R-CNN models, the mean Average Precision (mAP) of the improved YOLOv5 model increased by 0.17 and 21.26 percentage points, respectively. The localization experiments show that the mean localization error and mean error ratio of the improved YOLOv5 model for the bottom fruit axis were 0.063 m and 2.992%, respectively; compared with YOLOv5 and Faster R-CNN, the mean error and mean error ratio were reduced by 0.022 m and 1.173 percentage points, and by 0.105 m and 5.054 percentage points, respectively. Real-time visualization results show that the improved model can rapidly recognize and locate banana bunches in an orchard environment and guarantee operation quality, laying a foundation for subsequent research on fruit-harvesting robots.
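For reference, the EIoU loss mentioned above penalizes, in addition to the IoU term, the center distance and the width and height differences between the predicted and ground-truth boxes, normalized by the smallest enclosing box. The snippet below is a minimal PyTorch sketch of this standard formulation, assuming boxes in (x1, y1, x2, y2) format; it is not the paper's actual implementation inside YOLOv5, which may differ in details.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """Minimal EIoU loss sketch for axis-aligned boxes in (x1, y1, x2, y2) format.

    L_EIoU = 1 - IoU + rho^2(centers)/c^2 + (w - w_gt)^2/c_w^2 + (h - h_gt)^2/c_h^2,
    where c, c_w, c_h describe the smallest box enclosing both boxes.
    """
    # Intersection area
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h

    # Box widths, heights, union area, and IoU
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Smallest enclosing box and its squared diagonal
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between box centers
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2
            + (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4

    return 1 - iou + rho2 / c2 + (w1 - w2) ** 2 / (cw ** 2 + eps) + (h1 - h2) ** 2 / (ch ** 2 + eps)
```

Compared with CIoU, which couples width and height through an aspect-ratio term, EIoU splits them into separate penalties, which is the property the paper relies on to speed up convergence.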

       

Abstract: Banana is one of the major fruits produced and consumed in China, but banana harvesting remains a labor-intensive activity with low efficiency and considerable fruit damage. This study aims to improve the operating efficiency and quality of banana-harvesting robots. An accurate and rapid method based on the YOLOv5 algorithm was proposed to recognize banana bunches and locate the fruit axis at the bottom of the bunch. Specifically, a Coordinate Attention (CA) mechanism was fused into the backbone network, and the Concentrated-Comprehensive Convolution Block (C3) feature extraction module was combined with the CA module to form the C3CA module, in order to enhance the extraction of banana-bunch feature information. The original Complete Intersection over Union (CIoU) loss function was replaced with the Efficient Intersection over Union (EIoU) loss, which accelerated model convergence and reduced the loss value. The regression formula of the predicted bounding box was then modified to obtain the locating point required for the experiments, and this point was transformed from the camera coordinate system to obtain its three-dimensional coordinates. A D435i depth camera was used to locate the fruit axis at the bottom of the banana bunch. The original YOLOv5, Faster R-CNN, and improved YOLOv5 models were trained for comparison. Compared with the original YOLOv5, the precision of the improved model increased by 2.8 percentage points, the recall reached 100%, and the mean average precision increased by 0.17 percentage points. Compared with the Faster R-CNN model, the precision, recall, and mean average precision were 52.96, 17.91, and 21.26 percentage points higher, respectively. The size of the improved model was 1.06 MB smaller than that of the original. A field test was conducted on July 1, 2022 at the Dongguan Fruit and Vegetable Research Institute, Guangdong Province, China, in which the fruit axis at the bottom of randomly selected banana bunches was located in real time in the field environment. The original YOLOv5, Faster R-CNN, and improved YOLOv5 models were used to recognize and localize single and double plants at distances of 1.0-2.5 m, and each model was tested 10 times. The estimated and measured values were recorded to calculate the mean error and the mean error ratio. All three models were able to identify the banana bunches in the field of view within the localization range and to output estimated values. The mean errors of the original YOLOv5, Faster R-CNN, and improved YOLOv5 models were 0.085, 0.168, and 0.063 m, respectively, and the mean error ratios were 4.165%, 8.046%, and 2.992%, respectively. Compared with the Faster R-CNN model, the mean error and mean error ratio of the improved model were reduced by 0.105 m and 5.054 percentage points, respectively; compared with the original YOLOv5, they were reduced by 0.022 m and 1.173 percentage points, respectively. In addition, a measurement error greater than 0.2 m in a test was counted as a locating error. The improved YOLOv5 model produced a locating error only in test 6, giving the lowest error rate, whereas the original YOLOv5 showed locating errors in tests 3 and 4, and the Faster R-CNN model in tests 1, 4, and 8. With its ideal localization, lower error, and higher dimensional accuracy, the improved YOLOv5 model is well suited to transfer applications and rapid recognition of banana bunches in complex environments. The vision module of the banana-harvesting robot can therefore meet the requirements for locating the bottom fruit axis for the catching mechanism in the field environment.
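To illustrate the coordinate transformation step (locating-point pixel plus aligned depth converted to 3D camera coordinates), the sketch below uses the standard pinhole back-projection model. The intrinsic values shown are placeholders, not the calibrated D435i parameters used in the paper, and the paper's improved bounding-box regression formula for the locating point is not reproduced here.

```python
def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (metres) into the camera frame
    using the pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth.
    fx, fy, cx, cy are the depth camera's intrinsics (e.g. queried from the
    RealSense SDK for a D435i after aligning depth to the color stream)."""
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z

if __name__ == "__main__":
    # Placeholder intrinsics for illustration only (not the paper's calibration)
    print(pixel_to_camera_xyz(u=640, v=360, depth_m=1.5,
                              fx=615.0, fy=615.0, cx=640.0, cy=360.0))
```

In practice, the RealSense SDK provides equivalent deprojection utilities, so the manual formula above mainly serves to make the geometry of the locating step explicit.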

       
