Recognizing and locating apples using an improved BlendMask model
-
Abstract: Manual picking cannot fully meet the demand of large-scale apple harvesting at present, owing to its high labor intensity, high cost, and low efficiency. Fruit-picking robots have therefore drawn much attention in recent years for automatic picking and yield estimation. Among their components, the vision system largely determines the efficiency and stability of a picking robot: it must recognize fruits on the tree quickly and accurately under complex natural environments. This study aimed to identify and locate apples in natural environments under interference factors such as occluded fruits, varying ambient light, and viewing distance, where traditional vision methods struggle to segment fruit contours accurately. Taking apple as the test object, an instance segmentation and localization method based on an improved BlendMask model was proposed. The original backbone network was replaced with the high-resolution network HRNet (High-Resolution Net) to alleviate the loss of feature-map resolution in deep networks, and a convolutional block attention module (CBAM) was introduced into the fusion mask layer to improve the quality of the instance masks and thus the instance segmentation. Ablation experiments over a variety of popular backbone networks confirmed HRNet as the backbone, and the improved BlendMask model achieved a good balance between real-time performance and segmentation accuracy, making it suitable for fruit recognition and localization. An efficient algorithm was also designed to extract the surface point cloud of each instance: the instance mask was matched with the depth map to obtain the 3D surface point cloud of the apple instance; the tangential and outlier noises in the point cloud were removed by uniform downsampling and statistical filtering; and the center coordinates of the apple in 3D space were then estimated by the least-squares method (LSM) applied to the linearized form of the sphere equation, achieving center localization of the target. Other geometric indicators can also be incorporated into this localization framework to locate different kinds of fruit. The experimental results show that the average segmentation precision of the improved BlendMask model was 96.65%, with a detection speed of 34.51 frames/s. Compared with the original BlendMask model, the precision, recall, and average precision were improved by 5.48, 1.25, and 6.59 percentage points, respectively. Compared with the recent instance segmentation models SparseInst, FastInst, and PatchDCT, the average precision of the model lagged slightly behind by 0.29, 0.04, and 1.94 percentage points, respectively, whereas the detection speed was 6.11, 3.84, and 20.08 frames/s higher, respectively. The improved BlendMask model thus combines high segmentation accuracy with real-time speed. The findings can provide a technical reference for the vision system of apple-picking robots.
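The center-localization step above admits a compact formulation. As a minimal sketch of the linearized sphere fit described in the abstract (the matrix notation here is ours, not the paper's): for surface points $ ({x}_{i},{y}_{i},{z}_{i}) $, center $ (a,b,c) $, and radius $ r $, expanding the sphere equation

$ {({x}_{i}-a)}^{2}+{({y}_{i}-b)}^{2}+{({z}_{i}-c)}^{2}={r}^{2} $

gives

$ {x}_{i}^{2}+{y}_{i}^{2}+{z}_{i}^{2}=2a{x}_{i}+2b{y}_{i}+2c{z}_{i}+({r}^{2}-{a}^{2}-{b}^{2}-{c}^{2}) $,

which is linear in the unknowns $ \boldsymbol{w}={(2a,\;2b,\;2c,\;{r}^{2}-{a}^{2}-{b}^{2}-{c}^{2})}^{\mathrm{T}} $. Stacking one row $ ({x}_{i},{y}_{i},{z}_{i},1) $ per point into a matrix $ \boldsymbol{A} $ and the values $ {x}_{i}^{2}+{y}_{i}^{2}+{z}_{i}^{2} $ into a vector $ \boldsymbol{b} $, the ordinary least-squares solution $ \boldsymbol{w}={({\boldsymbol{A}}^{\mathrm{T}}\boldsymbol{A})}^{-1}{\boldsymbol{A}}^{\mathrm{T}}\boldsymbol{b} $ recovers the center as $ (a,b,c)=\frac{1}{2}({w}_{1},{w}_{2},{w}_{3}) $ and the radius as $ r=\sqrt{{w}_{4}+{a}^{2}+{b}^{2}+{c}^{2}} $.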
-
Keywords: image recognition / image segmentation / deep learning / apple / HRNet / CBAM / BlendMask
-
Figure 1. Schematic diagram of the inference pipeline of the improved BlendMask model
Note: Interpolate denotes the bilinear interpolation algorithm; ConvA and ConvB both denote convolutional layers. The dashed box encloses the improved part of the model, in which the CBAM module is connected to ConvA through a residual structure. Base is the result of the model's preliminary semantic segmentation, Atten is a tensor describing the approximate distribution of the instances, and Rb is the instance tensor; Rb and Atten are fused in the Blender module to generate the final instance mask.
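Figure 1 specifies only that CBAM is attached to ConvA through a residual connection. Below is a minimal PyTorch sketch of that wiring, assuming the standard CBAM design (channel attention followed by spatial attention); the channel sizes, reduction ratio, and kernel size are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention module: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel attention: a shared MLP over average- and max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: a conv over channel-wise average and max maps
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        sa = torch.sigmoid(self.spatial(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

class ConvAWithCBAM(nn.Module):
    """ConvA followed by a CBAM branch that is added back residually."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv_a = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.cbam = CBAM(out_channels)

    def forward(self, x):
        y = self.conv_a(x)
        return y + self.cbam(y)  # residual connection around the attention branch

out = ConvAWithCBAM(256, 128)(torch.randn(1, 256, 64, 64))  # -> (1, 128, 64, 64)
```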
Figure 5. Diagram of the index relation between a depth map of width 3 and the point cloud output array
Note: The numbers 1 to 8 in the depth map denote pixel indices, which correspond to the one-dimensional index d of the point cloud array. $ {{\boldsymbol{v}}}_{i} $ in the point cloud array denotes the three-dimensional spatial vector of the point with index $ i $.
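The index relation in Figure 5 is the usual row-major flattening, so a pixel (row, col) and the point index d are interchangeable via d = row × W + col. Below is a short sketch of the mask-to-point-cloud matching step, assuming a pinhole camera model; fx, fy, cx, cy stand in for the actual depth-camera intrinsics.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map of shape (H, W) into a flat (H*W, 3) point array.

    Row-major flattening keeps the correspondence d = row * W + col between
    the pixel index and the one-dimensional point-cloud index d.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]      # v: row indices, u: column indices
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx          # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Recover the pixel from a point index with: row, col = divmod(d, W).
# For a width-3 depth map, pixel (2, 1) corresponds to d = 2 * 3 + 1 = 7.
```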
Figure 10. Segmentation results of different models in different environments
Note: The figure shows how different models segment apple instances pixel-wise in an RGB image. Pixels of the same color belong to the surface of the same apple, while different colors indicate different apples.
Figure 11. Comparison of the running time of two filtering algorithms
Note: k denotes the number of nearest neighbors, r denotes the filter radius parameter of the radius filtering algorithm, and α denotes the standard deviation ratio parameter of the statistical filtering algorithm.
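For reference, a minimal Open3D sketch of the comparison behind Figure 11, using the library's built-in statistical and radius outlier removal; the synthetic point cloud and the parameter values (k, r, α) are illustrative only.

```python
import time
import numpy as np
import open3d as o3d

# Synthetic stand-in for an apple surface cloud (in practice it comes from the depth map)
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(np.random.rand(20000, 3))
pcd = pcd.uniform_down_sample(every_k_points=2)   # uniform downsampling first

t0 = time.perf_counter()
# Statistical filter: drop points whose mean neighbor distance exceeds
# mean + alpha * std over the whole cloud (k = nb_neighbors, alpha = std_ratio)
stat_pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=1.0)
t_stat = time.perf_counter() - t0

t0 = time.perf_counter()
# Radius filter: drop points with fewer than k neighbors inside radius r
rad_pcd, _ = pcd.remove_radius_outlier(nb_points=20, radius=0.05)
t_rad = time.perf_counter() - t0

print(f"statistical: {t_stat*1000:.1f} ms, radius: {t_rad*1000:.1f} ms")
```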
Table 1 Performance comparison of BlendMask models with different numbers of Base channels

Number of Base channels | Parameters/MB | Detection speed v/(frames·s⁻¹) | Mean average precision mAP/%
1 | 39.3 | 42.2 | 24.8
2 | 47.9 | 36.8 | 27.3
3 | 55.4 | 35.7 | 30.2
4 | 61.7 | 34.6 | 33.5
5 | 70.8 | 25.1 | 35.9
6 | 79.1 | 18.4 | 37.7
Table 2 Performance comparison of BlendMask models with different backbone networks

Backbone | Total parameters/MB | Precision P/% | Recall R/% | F1 score | Average precision AP/% | v/(frames·s⁻¹)
ResNet50 | 61.7 | 92.93 | 87.94 | 90.36 | 91.45 | 31.52
VGG16 | 69.4 | 93.07 | 88.41 | 90.68 | 92.30 | 29.02
EfficientNet | 41.8 | 87.32 | 83.83 | 85.54 | 86.59 | 43.91
MobileNetV2 | 40.9 | 89.68 | 84.09 | 86.79 | 87.78 | 45.72
Vision Transformer | 78.1 | 97.11 | 92.56 | 94.64 | 96.89 | 20.28
Swin Transformer | 77.5 | 97.36 | 93.73 | 95.46 | 96.91 | 21.69
HRNet | 63.6 | 96.75 | 91.10 | 94.62 | 95.83 | 33.98
Table 3 Performance comparison of BlendMask models with different structures

Model structure form | Total parameters/MB | P/% | R/% | F1 | AP/%
ResNet101 | 134.3 | 94.78 | 89.81 | 92.22 | 91.21
ResNet50 | 61.7 | 92.06 | 88.76 | 90.37 | 90.06
HRNet | 63.6 | 96.42 | 90.82 | 93.53 | 95.44
HRNet+CBAM1 | 67.9 | 97.54 | 91.06 | 94.18 | 96.65
HRNet+CBAM2 | 64.5 | 96.32 | 90.34 | 93.23 | 96.01
HRNet+CBAM1+CBAM2 | 68.6 | 97.69 | 92.38 | 94.96 | 96.98

Note: CBAM1 denotes the CBAM module embedded in the Bottom module network, and CBAM2 denotes the CBAM module embedded into the ConvA network as a residual connection.
Table 4 Performance comparison of different instance segmentation models

Models | P/% | R/% | F1 | AP/% | v/(frames·s⁻¹)
Mask R-CNN | 87.89 | 84.01 | 85.90 | 86.48 | 11.32
SOLOv2 | 92.57 | 86.44 | 89.40 | 89.21 | 26.56
YOLACT | 96.21 | 89.74 | 92.86 | 91.84 | 35.45
SparseInst | 97.82 | 92.45 | 95.06 | 96.94 | 28.40
FastInst | 97.76 | 90.31 | 93.69 | 96.69 | 30.67
PatchDCT | 99.67 | 92.42 | 95.43 | 98.59 | 14.43
Improved BlendMask | 97.54 | 91.06 | 94.18 | 96.65 | 34.51
Table 5 Localization performance of the improved BlendMask model at different distances

Distance/m | VMSE/cm³ | DMSE/cm | Vs/ms
0.5 | 5.32 | 1.63 | 31.23
1.0 | 13.53 | 4.64 | 31.78
1.5 | 17.98 | 5.76 | 32.12
2.0 | 19.12 | 7.39 | 33.32
2.5 | 25.41 | 8.45 | 32.45
3.0 | 30.61 | 7.89 | 32.29
Average | 18.66 | 5.96 | 32.19

Note: VMSE denotes the volume mean square error, DMSE denotes the ranging mean square error, and Vs denotes the computation speed of recognition and localization.
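To make the localization step evaluated in Table 5 concrete, here is a NumPy sketch of the linearized least-squares sphere fit (the function name and the synthetic test points are ours, not the paper's).

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit via the linearized sphere equation.

    Solves A w = b with rows A_i = (x_i, y_i, z_i, 1) and b_i = x_i^2 + y_i^2 + z_i^2,
    where w = (2a, 2b, 2c, r^2 - a^2 - b^2 - c^2); returns center (a, b, c) and radius r.
    """
    pts = np.asarray(points, dtype=np.float64)
    A = np.column_stack([pts, np.ones(len(pts))])
    b = (pts ** 2).sum(axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3] / 2.0
    radius = np.sqrt(w[3] + center @ center)
    return center, radius

# Quick check on a noiseless synthetic apple: center (1, 2, 3) m, radius 0.04 m
rng = np.random.default_rng(0)
d = rng.normal(size=(500, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
center, radius = fit_sphere([1.0, 2.0, 3.0] + 0.04 * d)
print(center, radius)   # ~ [1. 2. 3.] and ~0.04
```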