Citation: Liu Deer, Zhu Lei, Ji Weizhen, Lian Yue. Real-time identification, localization, and grading method for navel oranges based on RGB-D camera[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(14): 154-165. DOI: 10.11975/j.issn.1002-6819.2022.14.018

    Real-time identification, localization, and grading method for navel oranges based on RGB-D camera

    • Abstract: To meet the need for navel orange picking robots to identify, localize, and grade navel orange fruits in real time, this study proposes OrangePointSeg, an algorithm for real-time identification, localization, and grading of navel orange fruits based on RGB-D camera data. First, RGB-D data of navel orange fruits were collected with Microsoft's latest consumer-level depth camera (Azure Kinect DK), and an instance segmentation dataset and an augmented dataset of navel orange fruits were established. An improved YOLACT algorithm then segments the fruits in real time and generates instance masks, which are used to crop the registered depth image and obtain per-fruit depth point clouds; least-squares fitting is then applied to the fruit shape to obtain the centroid coordinates and radius in the camera coordinate system. Experimental results show that, in the fruit recognition stage, the improved YOLACT achieves a detection speed of 44.63 frames/s on this dataset with an average precision of 31.15%. In the fruit localization stage, with 1 400-2 000 points per cloud, the fitting time is 1.99 ms, the localization error is 0.49 cm, the root mean square error of the fitted radius is 0.43 cm, and the root mean square error of the volume is 52.6 mL; with more than 800 points and a shooting distance within 1 m, the localization error stays within 0.46 cm. Finally, with parallel computation, the overall processing speed of OrangePointSeg reaches 29.4 frames/s, which strikes a good balance between accuracy and speed and facilitates practical application and engineering deployment. The results can be extended to the recognition of other fruits with similar morphological characteristics and provide effective technical support for intelligent orchard management.
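    The crop-and-back-project step described above is, in essence, a pinhole-camera back-projection of the depth pixels selected by each instance mask. The following is a minimal sketch of that step, assuming a depth image already registered to the color image and known intrinsics fx, fy, cx, cy; the function name and array conventions are illustrative, not the paper's released code:

        import numpy as np

        def mask_to_point_cloud(depth_mm, mask, fx, fy, cx, cy):
            """Back-project the depth pixels under one instance mask into a
            camera-frame point cloud (pinhole model).

            depth_mm: HxW depth image in millimetres, registered to the color image.
            mask:     HxW boolean instance mask from the segmentation network.
            Returns an N x 3 array of (x, y, z) points in camera coordinates (metres).
            """
            v, u = np.nonzero(mask & (depth_mm > 0))   # pixels inside the mask with valid depth
            z = depth_mm[v, u].astype(np.float64) / 1000.0  # convert mm to metres
            x = (u - cx) * z / fx                      # back-project columns
            y = (v - cy) * z / fy                      # back-project rows
            return np.column_stack((x, y, z))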

       

      Abstract: The navel orange planting area in the Gannan region of China is the largest in the world, with an annual output of one million tons. However, most navel oranges are picked manually, with high labor intensity and low efficiency, so there is an urgent need for intelligent picking robots. During picking, the robot needs to obtain the spatial position and size of each navel orange fruit in real time to achieve efficient, rapid, and graded intelligent picking. This study proposed OrangePointSeg, an algorithm framework based on an improved YOLACT model and least-squares sphere fitting. The real-time instance segmentation algorithm YOLACT generates an instance mask, which is used to crop the registered depth image and obtain the fruit depth point cloud; least-squares fitting is then applied to the fruit shape to obtain the centroid coordinates and radius in the camera coordinate system. First, the RGB-D data of navel orange fruits were collected with Microsoft's latest consumer-level depth camera (Azure Kinect DK), and a navel orange instance segmentation dataset and an augmented dataset were established and released as open source. The dataset contains 2 178 images with 8 712 samples, of which 4 682 are non-occluded fruits and 4 030 are slightly occluded fruits. The YOLACT algorithm was then improved to adapt it to navel orange detection and to output instance masks of the fruits: we replaced the original ResNet (deep residual network) backbone with HRNet (High-Resolution Net) to simplify the model, used the hrnet_w48 structure for feature extraction to improve detection accuracy, and optimized the non-maximum suppression procedure to improve detection speed. Several groups of comparative tests were also set up, covering the performance of YOLACT with different backbone networks and the performance of different algorithms in different scenarios. To improve the generalization ability of the trained model, we applied a series of augmentations to the color images, mainly scale changes, color changes, brightness changes, and added noise; for small, distant targets we used oversampling during training to improve detection accuracy. As a result, the recognition speed and mask accuracy for navel oranges were higher than those of the other algorithms: the detection speed was 44.63 frames/s and the average precision was 31.15%. The mask generated by the improved YOLACT is used to crop the depth image registered with the color image, yielding the depth information of each fruit and generating the per-fruit depth point cloud. Least-squares fitting is applied to the point cloud to obtain the centroid coordinates and radius of the fruit, which guide the robot in graded picking. In addition, we compared the method with the RANSAC (Random Sample Consensus) algorithm; the results show that least-squares fitting is faster than RANSAC while maintaining adequate accuracy. When the number of points was 1 400-2 000, the fitting time was 1.99 ms, the positioning error was 0.49 cm, the root mean square error of the fitted radius was 0.43 cm, and the root mean square error of the volume was 52.6 mL.
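      A least-squares sphere fit of this kind is commonly done with the algebraic linearization x² + y² + z² = 2ax + 2by + 2cz + (r² − a² − b² − c²), which reduces the fit to a single linear solve. The sketch below illustrates that generic formulation under this assumption; it is not reproduced from the authors' code:

        import numpy as np

        def fit_sphere_lsq(points):
            """Algebraic least-squares sphere fit.

            points: N x 3 array of camera-frame coordinates (metres).
            Solves A w = b with A = [2x, 2y, 2z, 1] and b = x^2 + y^2 + z^2,
            where w = (a, b, c, r^2 - a^2 - b^2 - c^2).
            Returns (center, radius).
            """
            A = np.column_stack((2.0 * points, np.ones(len(points))))
            b = (points ** 2).sum(axis=1)
            w, *_ = np.linalg.lstsq(A, b, rcond=None)  # one small linear solve
            center = w[:3]
            radius = np.sqrt(w[3] + center @ center)   # recover r from the constant term
            return center, radius

      A single small linear solve of this form is consistent with the roughly 2 ms fitting times reported for 1 400-2 000 points, and the fitted radius directly supports grading, e.g., via the sphere volume 4/3·π·r³.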
Experiments at different shooting distances showed that when the number of points exceeded 800 and the distance was within 1.0 m, the positioning error stayed within 0.46 cm. Finally, by introducing parallel computing to overlap the two processing stages above, the overall processing speed of OrangePointSeg reached 29.4 frames/s, a good balance between accuracy and speed that makes the proposed algorithm well suited to practical application and engineering deployment.
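The reported 29.4 frames/s comes from running segmentation and fitting concurrently. One plausible minimal sketch of such a producer-consumer arrangement is shown below; run_yolact, dispatch_to_robot, and the frame attributes are hypothetical placeholders rather than the paper's code, and mask_to_point_cloud and fit_sphere_lsq refer to the sketches above:

        import queue
        import threading

        frames = queue.Queue(maxsize=4)  # RGB-D frames from the camera
        masks = queue.Queue(maxsize=4)   # (frame, instance masks) pairs

        def segmentation_worker():
            # Stage 1: GPU inference (improved YOLACT) runs independently of fitting.
            while True:
                frame = frames.get()
                if frame is None:                 # sentinel: shut down the pipeline
                    masks.put(None)
                    break
                masks.put((frame, run_yolact(frame.color)))  # run_yolact: hypothetical inference call

        def fitting_worker():
            # Stage 2: per-fruit point-cloud cropping and sphere fitting on the CPU,
            # overlapped with segmentation of the next frame.
            while True:
                item = masks.get()
                if item is None:
                    break
                frame, instance_masks = item
                for m in instance_masks:
                    pts = mask_to_point_cloud(frame.depth, m, *frame.intrinsics)
                    if len(pts) > 800:            # enough points for a reliable fit (see above)
                        center, radius = fit_sphere_lsq(pts)
                        dispatch_to_robot(center, radius)  # hypothetical sink for grasp targets

        threading.Thread(target=segmentation_worker, daemon=True).start()
        threading.Thread(target=fitting_worker, daemon=True).start()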

       
