Abstract:
Mechanized fruit harvesting is a growing trend in smart agriculture. However, picking quality is limited by the unstructured planting commonly used for fruit trees, in which robots can easily collide with dense branches and leaves during harvesting. The spatial distribution of orchard obstacles is random and strongly affected by variation in lighting conditions, which makes real-time reconstruction of obstacles very challenging. In this study, a real-time reconstruction method for orchard obstacles was proposed that combines semantic segmentation with 3D point cloud analysis. Firstly, taking a guava orchard as an example, images were acquired with an Intel RealSense D435i depth camera at a resolution of 640×480 and a distance of about 15-50 cm from the guava trees, in the orchard of Haiou Island Jiashuo Farm, Guangzhou City, Guangdong Province, China. Data augmentation was performed to improve the generalization and robustness of the model, using mirroring, brightness enhancement, brightness attenuation, Gaussian noise, and salt-and-pepper noise. A total of 1 250 samples were obtained and randomly divided into training, validation, and test sets at a ratio of 6:3:1, giving 750, 375, and 125 samples, respectively. Then, a real-time semantic segmentation model was built on DeepLabV3+ with MobileNetV3-large as the backbone network. The CoT module was introduced on the low-level and high-level feature maps of the encoder to strengthen the detail information of the feature maps and suppress useless information, which improved the segmentation accuracy for thin obstacles such as twig groups. Furthermore, an edge-assisted loss function was proposed using the Sobel operator. 
A Laplace edge loss function was used to further increase the accuracy of semantic segmentation, particularly along image edges. Next, the intrinsic parameters of the depth camera were used to transform the semantic map of obstacles into a 3D point cloud. Statistical analysis was applied to remove outliers. Voxelization was used to approximate the tree trunks and branches. Moreover, Euclidean clustering was employed to identify individual fruits in 3D, and the minimum bounding box of each fruit served as its reconstruction. Finally, the pose transformation matrix between the camera and the robot was obtained by hand-eye calibration with a chessboard pattern. The minimum bounding boxes of the fruits and the voxelized cubes of the branches and tree trunks were then transformed into the workspace of the picking robot. The experimental results show that the MIoU (mean intersection over union) of obstacles for MobileNetV3-large-DeepLabV3+ improved from 69.7% to 74.3% after the introduction of the CoT block, an increase of 4.6 percentage points. The Sobel operator considers both horizontal and vertical gradient information and provides some noise resistance. The MIoU of obstacles was further improved from 74.3% to 77.2%, an increase of 2.9 percentage points. The overall MIoU of the improved model was 7.0, 0.6, and 1.0 percentage points higher than those of Xception-DeepLabV3+, BranchNet, and LRASPP, respectively. 
The overall MIoU was, however, 1.7 percentage points lower than that of BiSeNetV3. The Acc was 3.1, 0.5, and 0.6 percentage points higher than those of Xception-DeepLabV3+, BranchNet, and LRASPP, respectively, but 2.2 percentage points lower than that of BiSeNetV3. The improved model ran at 78.8 frames per second (FPS), faster than the compared models. It had 4.4 M parameters, which is 8% and 31% of the parameter counts of Xception-DeepLabV3+ and BiSeNetV3, respectively, while BranchNet and LRASPP have 68% and 73% as many parameters as the improved model, respectively. In addition, the IoU (intersection over union) values of the voxelized cubes for the fruit and branches were 65.7% and 56.6%, with voxelization times of 0.34 s and 0.84 s, respectively. The robustness and real-time performance can meet the obstacle avoidance requirements of guava picking robots. The real-time semantic segmentation of irregular obstacles could be further optimized by using more sophisticated attention modules or backbone networks. The findings can provide better ways to improve obstacle reconstruction.
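The geometric part of the pipeline described above (semantic mask and depth to a 3D point cloud, voxelization, and transfer into the robot workspace) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, the function names and the intrinsic values in the usage note are hypothetical, and outlier removal and Euclidean clustering are omitted for brevity.

```python
from math import floor


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to a camera-frame point,
    assuming a pinhole model with focal lengths (fx, fy) and principal point (cx, cy)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def mask_to_points(mask, depth, fx, fy, cx, cy, label=1):
    """Collect 3D points for pixels whose semantic label matches `label`
    and whose depth reading is valid (greater than zero)."""
    points = []
    for v, (mask_row, depth_row) in enumerate(zip(mask, depth)):
        for u, (pixel_label, d) in enumerate(zip(mask_row, depth_row)):
            if pixel_label == label and d > 0:
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points


def voxelize(points, voxel_size):
    """Return the set of occupied voxel indices for a cubic grid of side `voxel_size`."""
    return {tuple(floor(c / voxel_size) for c in p) for p in points}


def transform(point, T):
    """Apply a 4x4 homogeneous transform (nested lists, row-major) to a 3D point,
    e.g. a camera-to-robot matrix obtained from hand-eye calibration."""
    x, y, z = point
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))
```

As a usage illustration with made-up values: calling `mask_to_points(mask, depth, fx, fy, cx, cy)` on a label image and an aligned depth image gives camera-frame points, `voxelize(points, 0.05)` gives the occupied 5 cm cells that approximate branches and trunks, and `transform(p, T_cam_to_robot)` maps any point or voxel corner into the robot workspace, where `T_cam_to_robot` stands for the hand-eye calibration result.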