Abstract:
Manual picking cannot fully meet the demands of large-scale apple harvesting at present, owing to its high labor intensity, high cost, and low efficiency. Fruit-picking robots have therefore drawn much attention in recent years as a route to automatic picking and yield estimation. The vision system largely determines the efficiency and stability of a picking robot, and is required to recognize fruit at high speed and with high accuracy in complex natural environments; it can thus be expected to accurately recognize the fruits on the tree. This study aimed to identify and locate apples in natural environments, particularly under interference factors such as occluded fruit, ambient light, and viewing distance, with the apple taken as the test object. Traditional vision was improved to accurately segment the contour of the target fruit. Instance segmentation and localization were proposed using an improved BlendMask model. The original backbone network was replaced with the high-resolution network HRNet (High-Resolution Net), in order to alleviate the decreasing resolution of feature maps in deep networks. A convolutional block attention module (CBAM) was also introduced into the fusion mask layer of the instance segmentation model, thereby improving the quality of the instance mask. Ablation experiments were carried out to compare a variety of popular instance segmentation backbone networks, and HRNet was selected as the backbone. The BlendMask model was used to achieve a better balance between real-time performance and segmentation accuracy. At the same time, fruit recognition and localization were implemented with recognition accuracy considered in real time. Therefore, the improved model was suitable for fruit target recognition and localization. Instance segmentation was designed to efficiently extract the surface point cloud of each instance.
The instance mask was matched with the depth map to obtain the 3D surface point cloud of the apple target instance. Tangential and outlier noise was removed from the point cloud using uniform downsampling and statistical filtering. The center coordinates of each apple in 3D space were then estimated using the least-squares method (LSM), with target-center localization achieved by linearizing the sphere equation. Other geometric indicators could also be used in the localization framework to realize center localization for different kinds of fruit. The experimental results show that the average segmentation accuracy of the improved BlendMask model was 96.65%, and the detection speed reached 34.51 frames/s. The precision, recall, and average precision were improved by 5.48%, 1.25%, and 6.59%, respectively. Compared with the current new instance segmentation models SparseInst, FastInst, and PatchDCT, the average precision of the model lagged slightly behind by 0.29%, 0.04%, and 1.94%, respectively, whereas the detection speed was ahead by 6.11, 3.84, and 20.08 frames/s, respectively. The improved BlendMask model thus combined high segmentation accuracy with real-time performance. The findings can provide a technical solution for the vision system of apple-picking robots.
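As a rough illustration of the localization step, the sphere equation (x-a)^2 + (y-b)^2 + (z-c)^2 = r^2 can be linearized to 2ax + 2by + 2cz + (r^2 - a^2 - b^2 - c^2) = x^2 + y^2 + z^2 and solved with ordinary least squares. The sketch below is a minimal plain-NumPy version, assuming the fruit surface point cloud is an N×3 array; the brute-force statistical outlier filter stands in for the paper's filtering step, and all function names and parameter values here are our own illustration, not taken from the paper.

```python
import numpy as np

def statistical_filter(points, k=20, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors
    exceeds the global mean by std_ratio standard deviations
    (brute-force stand-in for statistical outlier removal)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)  # column 0 is distance to self
    thresh = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= thresh]

def fit_sphere_center(points):
    """Estimate sphere center and radius by linear least squares:
    solve A @ [a, b, c, r^2 - a^2 - b^2 - c^2] = x^2 + y^2 + z^2."""
    A = np.c_[2.0 * points, np.ones(len(points))]
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius
```

Because the camera sees only the front surface of the fruit, the fit typically uses a partial (roughly hemispherical) point cloud; the linear formulation still recovers the center, which is why a sphere model is a convenient choice for apples.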