Abstract:
Cucumbers grow in unstructured environments, where ripe fruits must be picked selectively. Manual harvesting, however, is costly and labor-intensive, and cucumber-picking robots are expected to reduce the manpower required in modern agriculture. The vision system dominates the accurate and rapid recognition of the fruit, and thus the picking efficiency of the robot. This study aims to achieve efficient selective picking of cucumber fruits under complex environments, such as changing illumination. Taking the cucumber as the research object, an RT-Detr-EV model was proposed with RT-Detr as the baseline network. Firstly, a Re-parameterization VGG (RepVGG) module was added to the backbone network. A multi-branch structure was adopted during training to strengthen the feature extraction of the network for high recognition accuracy, while the branches were merged into a single path during inference, reducing the complexity and computation of the network for better inference performance. Secondly, a lightweight cascaded group self-attention module was added to the neck network to reduce the computational overhead, so that a high detection speed was retained while the depth of the network was increased. Finally, the Minimum Point Distance based Intersection over Union (MPDIoU) replaced the loss function of the original model. The minimum point distance between the predicted and ground-truth boxes was considered in the regression loss of the target frame, accelerating the convergence of the model and improving the detection accuracy. The results show that the mean average precision and detection speed of the improved RT-Detr-EV reached 95.8% and 61.3 frames/s, respectively, exceeding the original model by 3.2 percentage points and 17.4 frames/s, respectively.
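The training-time multi-branch structure and its inference-time merging can be illustrated with a minimal NumPy sketch of the RepVGG re-parameterization idea (an assumption-laden simplification: the real module also folds batch normalization into the kernels, which is omitted here):

```python
import numpy as np

def merge_repvgg_branches(k3, b3, k1, b1, channels):
    """Fold RepVGG-style training branches (3x3 conv + 1x1 conv + identity)
    into one equivalent 3x3 kernel and bias for inference.

    Simplified sketch: assumes stride 1, in_channels == out_channels,
    and no batch normalization (the published method folds BN as well).
    Kernels are laid out as (out_ch, in_ch, kh, kw).
    """
    # Pad the 1x1 kernel to 3x3 by placing its weight at the center tap.
    k1_padded = np.zeros_like(k3)
    k1_padded[:, :, 1, 1] = k1[:, :, 0, 0]
    # The identity branch is equivalent to a 3x3 kernel with a 1 at the
    # center of each channel's own feature map.
    k_id = np.zeros_like(k3)
    for c in range(channels):
        k_id[c, c, 1, 1] = 1.0
    # Convolution is linear, so the three branches sum into one kernel.
    return k3 + k1_padded + k_id, b3 + b1
```

Because convolution is linear, the merged single-branch kernel produces exactly the same output as the sum of the three branches, which is how the multi-branch accuracy benefit is kept while the inference cost drops to that of one 3x3 convolution.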
The accuracy of identifying cucumbers unsuitable for picking increased by 4.6 and 6.5 percentage points over YOLOv7-X and YOLOv8-l, respectively, while the detection speed improved by 40.6 and 25 frames/s, and the number of parameters was reduced by 55.5% and 27.3%, respectively. Meanwhile, multi-scene application tests show that RT-Detr-EV maintained high detection accuracy under different lighting angles. Under front lighting, the improved model outperformed the YOLOv8 and RT-Detr models by 1.2 and 2.8 percentage points, respectively, and under backlighting by 4.5 and 5.5 percentage points, respectively. When the exposure level of the picking scene varied within 40%-160%, the mAP50 of the improved model changed by no more than 0.2 and 0.5 percentage points under the front-lighting and backlighting conditions, respectively, a variation much smaller than that of YOLOv8 and RT-Detr across the various picking scenes with different lighting angles. Therefore, the improved model shows better robustness and generalization under complex picking scenarios with multiple changes in lighting conditions. In conclusion, the RT-Detr-EV network model was verified to deliver better performance indices in the target detection task during fruit picking under complex growth environments. The findings can also provide a valuable reference for target localization in selective picking robots.
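The MPDIoU regression term used to replace the original loss penalizes the squared distances between corresponding corners of the predicted and ground-truth boxes, normalized by the squared image diagonal. A minimal sketch (box coordinates as (x1, y1, x2, y2); the coordinate values below are illustrative, not from the paper) might be:

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """Minimum Point Distance IoU between two axis-aligned boxes.

    MPDIoU = IoU - d_tl^2 / (w^2 + h^2) - d_br^2 / (w^2 + h^2),
    where d_tl and d_br are the distances between the top-left and
    bottom-right corners, and (w, h) is the input image size.
    The box regression loss is then 1 - MPDIoU.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Standard IoU of the two boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared corner distances, normalized by the squared image diagonal.
    d_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou - d_tl / norm - d_br / norm
```

An exact match scores 1.0 and misaligned boxes are penalized even when they do not overlap, which is what gives the gradient signal credited here with faster convergence.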