Abstract:
Manual picking can no longer meet the demands of large-scale apple production in China. Robotic picking has become an inevitable trend, given the shortage of labor and the rapid development of mechanical automation. Accurate identification and localization of apples in complex environments is therefore essential. Fruit growth pattern and posture information can be acquired synchronously and then used to classify the apples. In the orchard, the target fruit is sometimes only slightly occluded by leaves, branches, or other fruits, so the differences among fruit growth patterns are small. After multiple convolution operations, the deep feature maps of a convolutional neural network tend to lose key information about the occluded parts of the fruit, which leads to misrecognition of the fruit growth pattern. At the same time, a detection network can easily identify two overlapping apples as one in the natural environment, so the occluded fruit is missed. In this study, an improved YOLOv7 model (DCS-YOLOv7) was proposed to recognize apple growth patterns. Firstly, the multi-scale feature fusion network was improved by adding a 160×160 feature scale layer to the backbone network, which enhanced the sensitivity of the model to tiny local features. Secondly, the CBAM attention mechanism was introduced to strengthen the focus on the target region of interest. Finally, Soft-NMS was used to prevent densely overlapping targets from being suppressed outright, thus reducing missed detections. The experimental results show that the recognition accuracy, recall, and average recognition precision of DCS-YOLOv7 were 86.9%, 80.5%, and 87.1%, respectively, which were 4.2%, 2.2%, and 3.7% higher than those of the original YOLOv7 model. Both average accuracy and speed improved substantially, meeting the requirements of a picking robot.
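The effect of Soft-NMS on overlapping fruits can be illustrated with a minimal sketch. It assumes the Gaussian decay variant of Soft-NMS and boxes given as (x1, y1, x2, y2); the function names, the `sigma` value, and the score threshold are illustrative assumptions and are not taken from the study.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (sigma and score_thresh are assumed values).

    Returns a list of (box_index, rescored_confidence) in selection order.
    Overlapping boxes have their scores decayed instead of being removed.
    """
    scores = list(scores)  # work on a copy so the caller's scores are untouched
    remaining = [i for i in range(len(boxes)) if scores[i] >= score_thresh]
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        # Decay the scores of the other boxes according to their overlap.
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
        # Drop only the boxes whose score has fallen below the threshold.
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return kept


# Two heavily overlapping apples plus one isolated apple (made-up boxes).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68. Hard NMS with a typical threshold of 0.5 would discard it, whereas here it is retained with a reduced confidence, which is the behavior that helps with occluded fruits.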
In addition, an apple posture recognition method was proposed based on semantic segmentation and minimum circumscribed features. Firstly, comparison tests showed that the UNet model performed best in apple image segmentation. Its mean pixel accuracy was 0.7 and 0.2 percentage points higher than that of DeepLabv3+ and PSPNet, respectively, and its mean intersection over union was 1.6 and 1.1 percentage points higher, respectively. Its average segmentation speed also exceeded that of the other two models. The UNet semantic segmentation network was therefore chosen as the apple segmentation model. After segmentation, the contour features of the apple and the calyx were obtained by contour extraction, and the pose of the unobstructed apple was then derived from the minimum circumscribed feature of the apple. The accuracy of apple pose detection was 94%, and the average processing time per image was 15.7 ms, indicating that the pose of the target apple can be acquired effectively. The high detection accuracy verified the validity and correctness of the recognition model, which integrates the recognition of fruit growth pattern and posture. With the growth pattern classified and the fruit posture recognized, the end-effector can pick the fruit rapidly and accurately in a suitable way. The findings can lay a foundation for the non-destructive and efficient picking of apples.
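The two segmentation metrics used in the comparison can be sketched as follows. This is a minimal illustration that assumes the standard confusion-matrix definitions of mean pixel accuracy and mean intersection over union; the function names, the three-class labeling (0 = background, 1 = apple, 2 = calyx), and the toy pixel labels are assumptions made for the example and are not data from the study.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def mean_pixel_accuracy(cm):
    """Average over classes of (correct pixels of the class / all pixels of the class)."""
    per_class = []
    for c, row in enumerate(cm):
        total = sum(row)
        if total:  # skip classes that do not occur in the ground truth
            per_class.append(row[c] / total)
    return sum(per_class) / len(per_class)


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom:  # skip classes absent from both truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Flattened toy label maps (assumed classes: 0 background, 1 apple, 2 calyx).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
mpa = mean_pixel_accuracy(cm)
miou = mean_iou(cm)
```

For these toy labels the per-class pixel accuracies are 1/2, 1, and 1/2, and the per-class IoU values are 1/3, 2/3, and 1/2, so the two means come out at 2/3 and 1/2.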