Abstract:
Apples are popular fruits, rich in vitamins and fiber. Traditional apple picking requires a large amount of manual labor, so automatic picking is in high demand; it depends, in turn, on the accurate localization of ripe apples. Manual labelling or rule-based computer vision has been used to identify fruit locations, but traditional counting and positioning cannot fully meet the requirements of complex and varying orchard environments across seasons, particularly under intertwined branches and leaves, fruit occlusion, and changing light conditions. Accurate and efficient algorithms are therefore needed to localize picking points during apple picking. In this study, an efficient positioning method was proposed for apple picking points using target-region segmentation. The specific procedures were as follows. Firstly, the target detection frame was acquired to crop the apple target region, and the LabelMe annotation tool was used to manually annotate the outer contour of each target point by point, yielding a semantic segmentation dataset of apple target regions. A total of 1503 images were obtained after these operations, of which 1352 were used for training and the remaining 151 for validation. Secondly, MobileViT-Seg, a semantic segmentation network for the apple target region, was proposed, combining a lightweight encoder with a hierarchical pooling decoder. The encoder adopted the pre-trained MobileViT structure, which down-sampled the input image step by step to extract high-level feature information. The decoder used a Pyramid Pooling Module (PPM) followed by Softmax processing to gradually recover the spatial resolution of the image for accurate segmentation. Effective extraction of global contextual information was thus maintained at a small model size and low computational cost.
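The decoder described above can be sketched as a generic PSPNet-style PPM head: the feature map is pooled at several scales, each pooled branch is reduced and upsampled back, the branches are concatenated with the input, and per-pixel Softmax is applied at full resolution. This is a minimal sketch under assumed layer sizes, not the paper's exact configuration; `PPMHead` and all channel counts are illustrative.

```python
# Minimal sketch of a PPM (Pyramid Pooling Module) decoder head.
# Layer sizes and the class name PPMHead are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PPMHead(nn.Module):
    def __init__(self, in_ch, num_classes, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_ch // len(pool_sizes)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),         # pool features to an s x s grid
                nn.Conv2d(in_ch, branch_ch, 1),  # reduce channels per branch
                nn.ReLU(inplace=True),
            ) for s in pool_sizes
        ])
        self.classifier = nn.Conv2d(in_ch + branch_ch * len(pool_sizes),
                                    num_classes, 1)

    def forward(self, x, out_size):
        h, w = x.shape[-2:]
        feats = [x]
        for branch in self.branches:
            y = branch(x)
            # upsample each pooled branch back to the feature-map size
            feats.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                       align_corners=False))
        logits = self.classifier(torch.cat(feats, dim=1))
        # gradually recover the input spatial resolution, then per-pixel Softmax
        logits = F.interpolate(logits, size=out_size, mode="bilinear",
                               align_corners=False)
        return logits.softmax(dim=1)
```

In practice the encoder (here, a pre-trained MobileViT backbone) would supply `x` as its deepest feature map, and `out_size` would be the original image size.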
Finally, the segmented apple region could be incomplete due to branch and leaf occlusion and fruit overlap. The target mask region after segmentation was therefore fitted with a circle using least-squares shape fitting, and the center of the circle was taken as the picking point. RGB-D information was then fused to localize the picking point in space. The experimental results show that the MobileViT-Seg model was highly robust in locating picking points across multiple scenes. Compared with several mainstream segmentation methods (UNet, PSPNet, MobileNetV3-DeepLabV3+, and DeepLabV3+), MobileViT-Seg performed the best at low computational cost. With 7.18 M parameters and 29.93 G FLOPs, the mean Intersection over Union (mIoU) reached 89.79%, the mean Pixel Accuracy (mPA) reached 94.46%, the Accuracy (Ac) reached 94.73%, and the detection speed reached 100.06 fps. The average accuracy of picking-point localization reached 90.80% on 200 raw apple images captured by the camera in real time, fully meeting the positioning-accuracy requirements. In summary, an efficient technical solution was provided for automated apple picking: an advanced segmentation network was combined with precise spatial localization, achieving accurate picking-point localization under complex orchard conditions. The improved model can lay a foundation for picking-point localization by apple-picking robots.
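The picking-point computation above can be sketched in two steps: a least-squares circle fit to the mask boundary (the standard Kåsa algebraic fit is used here as one common choice), and pinhole back-projection of the circle center using the aligned depth value. The function names and the intrinsics parameters (`fx`, `fy`, `cx0`, `cy0`) are hypothetical, and the camera model is an assumption rather than the paper's stated calibration.

```python
# Sketch: circle fitting on mask boundary pixels, then RGB-D back-projection.
# Kasa fit and pinhole model are assumed; names are illustrative.
import numpy as np

def fit_circle(xs, ys):
    # Kasa least-squares fit: x^2 + y^2 + a*x + b*y + c = 0,
    # solved as the linear system [x y 1] [a b c]^T = -(x^2 + y^2).
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    rhs = -(xs**2 + ys**2)
    a, b, c = np.linalg.lstsq(A, rhs, rcond=None)[0]
    cx, cy = -a / 2.0, -b / 2.0
    r = np.sqrt(cx**2 + cy**2 - c)
    return cx, cy, r

def pixel_to_camera(u, v, depth, fx, fy, cx0, cy0):
    # Pinhole back-projection of pixel (u, v) with aligned depth (same units
    # as the returned coordinates); fx, fy, cx0, cy0 are camera intrinsics.
    z = depth
    x = (u - cx0) * z / fx
    y = (v - cy0) * z / fy
    return np.array([x, y, z])
```

Given boundary pixels of the segmented (possibly occluded) apple mask, `fit_circle` recovers the full circular outline, and the fitted center `(cx, cy)` combined with its depth reading gives the 3-D picking point.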