Design and experiment of an automatic cherry tomato harvesting system based on cascade vision detection

李兴旭; 陈雯柏; 王一群; 杨顺; 吴华瑞; 赵春江

doi:10.11975/j.issn.1002-6819.202210099

Design and experiment of an automatic cherry tomato harvesting system based on cascade vision detection

Abstract

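Abstract: Cherry tomato bunches grow in varied postures and their fruits ripen unevenly. When a picking robot harvests fruit one at a time ("single-fruit harvesting"), the fruit stalk often interferes with the end-effector and ripeness is often misjudged, which lowers picking efficiency and makes graded harvesting difficult. To address these problems, this study proposes a cascaded vision detection pipeline with three key stages: harvest target detection, discrimination of target fruit characteristics, and judgment of the positional relationship between fruit and stalk. First, tomato fruits are divided into four ripeness grades according to agronomic requirements, and the YOLOv5 object detection model is used to detect tomato bunches and fruits and to output the ripeness grade, enabling harvesting in stages. Then the relative position of fruit and stalk is judged: a MobileNetv3 network classifies the fruit-stalk relationship within the dilated bounding box, and the result is used to control the picking pose of the end-effector. Tests in a solar greenhouse show that the proposed cascaded detection system has an average inference time of 22 ms and, at an IOU (intersection over union) threshold of 0.5, reaches an average detection accuracy of 89.9% for cherry tomato bunches and fruits, meeting the accuracy and real-time requirements of a picking robot's vision system. Compared with having the end-effector approach the target at a fixed angle, the proposed method raises harvesting efficiency by 28.7 percentage points. The results can serve as a reference for research on picking robots for various fruits and vegetables.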

Abstract: Cherry tomatoes are a small tomato variety, with a fruit size of no more than 2.5 cm, and mostly grow in bunches. The bunches also grow in variable postures. These growth conditions pose a great challenge to harvesting robots that approach the fruit at a fixed angle. When a robot performs single-fruit harvesting, the stems often interfere with the end-effector, resulting in low picking efficiency. This may be one reason why picking robots have not yet moved towards commercialization. In addition, not all fruits in a tomato bunch grow and ripen simultaneously. Ripe fruits need to be picked on time to ensure a fresh taste and high economic returns. A robotic vision system is therefore required to identify fruit ripeness rapidly and accurately. In this study, a cascaded vision detection approach was proposed for harvesting single tomatoes from bunches. The procedure included three key stages: detection of the harvesting target, determination of target maturity, and judgment of the fruit-stalk position relationship. Firstly, the YOLOv5 object detection model was introduced to detect the tomato fruits and bunches. The tomato fruits were labelled into four categories according to agronomic growing and harvesting requirements: green, turning, ripe, and fully ripe. This differs from the simpler ripeness classifications used previously. Among them, ripe and fully ripe fruits were targeted for robotic harvesting. Because the visual features used for ripeness determination and target detection overlap, the original YOLOv5 was improved to detect ripeness using multi-task learning. Owing mainly to the structure of the greenhouse facility, the robot was confined to picking only the tomatoes on both sides of the culture rack.
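The harvesting rule described above, in which only ripe and fully ripe fruits are targeted, can be sketched as a filter over detector output. This is a minimal illustration under stated assumptions, not the authors' code: the `Detection` type, the label strings, and `select_harvest_targets` are hypothetical names introduced here, and the source does not say how the detector's classes are encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Label strings are illustrative assumptions; the source names the four
# ripeness levels and the bunch class but not how they are encoded.
RIPENESS_LEVELS = ("green", "turning", "ripe", "fully_ripe")
HARVESTABLE = frozenset({"ripe", "fully_ripe"})


@dataclass(frozen=True)
class Detection:
    """One detector output: class label, box (x1, y1, x2, y2) in pixels, score."""
    label: str
    box: Tuple[float, float, float, float]
    score: float


def select_harvest_targets(detections: List[Detection]) -> List[Detection]:
    """Keep only fruit detections whose ripeness class is targeted for picking.

    Bunch detections and unripe fruits are dropped; input order is preserved.
    """
    return [d for d in detections if d.label in HARVESTABLE]


if __name__ == "__main__":
    sample = [
        Detection("bunch", (50, 40, 300, 420), 0.95),
        Detection("green", (60, 60, 90, 90), 0.91),
        Detection("ripe", (120, 200, 150, 232), 0.88),
        Detection("fully_ripe", (180, 260, 212, 291), 0.83),
    ]
    for det in select_harvest_targets(sample):
        print(det.label, det.box)
```

Keeping the ripeness grades as detector classes means the selection step is a simple set lookup, with no separate colour analysis after detection.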
Targets beyond the working range of the robot were therefore filtered out during target detection. The distance between the culture racks was 1.55 m in this case. The region of interest (ROI) of a target fruit was approximated as an ellipsoid with an equatorial diameter and a polar diameter of approximately 2.5 cm. The pinhole camera model was used to calculate the picking range of the ROI. Specifically, tomatoes growing on culture racks outside the working range of the robot mostly occupied less than a 10 pixel×10 pixel region in the 640 pixel×640 pixel RGB image. At the same time, a large number of feature layers were cropped, and such targets were left unlabeled in the annotation stage. This reduced the labeling cost and improved performance, because out-of-range targets were filtered out without being detected. This end-to-end approach required no post-processing and was much more adaptable to real scenarios than the traditional approach of filtering targets by a set threshold. The field experiments showed that interference between the fruit stalk and the end-effector was a major cause of picking failure or low efficiency. Accordingly, the approach angle was one of the most important parameters of the harvesting action. After the targets to be picked were screened, the rectangular detection box of each target was enlarged by 10% in length and width so that it contained peripheral information such as pedicels and calyces. The expanded image block was then input into the MobileNetv3 network model to evaluate the relative position of the target fruit and the fruit stalk. The result was used as input for the end-effector to adjust its picking pose, choose a direction favorable for picking, and approach the fruit according to the pose of the bunch.
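Two geometric steps in the passage above lend themselves to a short sketch: the pinhole-model estimate of how large a 2.5 cm fruit appears in the image, and the 10% enlargement of a detection box before classification. The code is illustrative only. The focal length of 600 pixels is an assumed value, not taken from the source, and the enlargement is interpreted here as scaling width and height by 1.1 about the box centre with clipping to the image, which the source does not spell out. The function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Assumed focal length in pixels for a 640-pixel-wide image; not from the source.
ASSUMED_FOCAL_LENGTH_PX = 600.0


def projected_size_px(object_size_m: float, distance_m: float,
                      focal_length_px: float) -> float:
    """Pinhole camera model: image size = focal length * object size / distance."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_length_px * object_size_m / distance_m


def enlarge_box(box: Box, ratio: float, img_w: int, img_h: int) -> Box:
    """Grow a box by `ratio` of its width and height about its centre.

    Half of the extra width (height) is added on each side, and the result
    is clipped to the image bounds.
    """
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * ratio / 2.0
    pad_y = (y2 - y1) * ratio / 2.0
    return (
        max(0.0, x1 - pad_x),
        max(0.0, y1 - pad_y),
        min(float(img_w), x2 + pad_x),
        min(float(img_h), y2 + pad_y),
    )


if __name__ == "__main__":
    # Apparent size of a 2.5 cm fruit on a rack 1.55 m away.
    size = projected_size_px(0.025, 1.55, ASSUMED_FOCAL_LENGTH_PX)
    print(f"projected fruit size: {size:.1f} px")

    # Enlarge a detection box by 10% inside a 640 x 640 image.
    print(enlarge_box((100.0, 100.0, 200.0, 200.0), 0.10, 640, 640))
```

With the assumed 600-pixel focal length, a 2.5 cm fruit at 1.55 m projects to roughly 9.7 pixels, which is in line with the sub-10-pixel figure quoted above; a camera with different intrinsics would give a different value.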
A harvesting robot system was also built, consisting of a depth camera, a four-degree-of-freedom robot arm, a chassis, and a negative-pressure end-effector. The system was tested in greenhouses at different times of the year for object detection, prediction of the fruit-stalk position relationship, and fruit harvesting. The results showed that the average detection accuracy for cherry tomato bunches and fruits of different ripeness reached 89.9% at an intersection over union (IoU) threshold of 0.5. The average inference time of the cascade detection system was 22 ms. Furthermore, harvesting efficiency improved by 28.7 percentage points compared with approaching the target at a fixed angle. The average harvesting time was 10.4 s per fruit. These findings can provide a reference for fruit and vegetable harvesting robots.
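The IoU threshold of 0.5 mentioned above decides whether a predicted box counts as a correct detection of a ground-truth box. A minimal sketch of that check follows, assuming axis-aligned boxes given as (x1, y1, x2, y2); the function names are illustrative and this is not the evaluation code used in the study.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth box when IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    truth = (0.0, 0.0, 10.0, 10.0)
    print(matches((2.0, 0.0, 12.0, 10.0), truth))  # overlap 80 / union 120
    print(matches((5.0, 0.0, 15.0, 10.0), truth))  # overlap 50 / union 150
```

A full average-precision computation also needs score ranking and class matching, which are omitted here.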
