Abstract:
An improved YOLOv4 model was proposed to increase the speed and accuracy of orange fruit picking. A lighter structure with a smaller weight file was also designed for easy migration to mobile terminals. The two-dimensional coordinates of the fruit centroid were obtained from the color image captured by a RealSense depth camera. After registration of the depth and color maps, the depth value of the centroid point was extracted from the corresponding depth map, in order to realize three-dimensional spatial positioning of the fruit. In the structure of the improved YOLOv4 model, MobileNet v2 was taken as the backbone network, and depthwise separable convolutions were used to replace the ordinary convolutions in the neck, in order to further reduce the model weight for higher detection speed. A comparison experiment was carried out on the detection performance of the improved YOLOv4, the original YOLOv4, YOLOv4-tiny, GhostNet-YOLOv4, and classical convolutional neural networks, such as Faster RCNN and SSD, in order to verify the effectiveness and superiority of the improved model. The results showed that the precision, recall, F1 score, and average precision of the improved model were 97.57%, 92.27%, 94.85%, and 97.24%, respectively, which were close to the high level of the original YOLOv4 model, while the average detection time and the model size were reduced by 11.39 ms and 197.5 MB, respectively. The average precision of detection was improved by 2.69 and 3.11 percentage points, compared with the Faster RCNN and SSD models, while the model size decreased by 474.5 and 44.1 MB, respectively. The recall of the proposed detection model increased by 4.22 percentage points, compared with the YOLOv4-tiny model. Additionally, the average detection time of the improved model was 7.15 ms shorter than that of GhostNet-YOLOv4.
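The lightweighting step above rests on replacing ordinary 3×3 convolutions with depthwise separable ones. As a minimal sketch (not the paper's code; layer names, activation choice, and channel sizes are illustrative assumptions), the parameter saving can be shown in PyTorch:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3
    convolution followed by a 1x1 pointwise convolution. Hyperparameters
    here are illustrative assumptions, not the paper's configuration."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter count versus an ordinary 3x3 convolution with the same channels
ordinary = nn.Conv2d(256, 512, kernel_size=3, padding=1, bias=False)
separable = DepthwiseSeparableConv(256, 512)
n_ord = sum(p.numel() for p in ordinary.parameters())
n_sep = sum(p.numel() for p in separable.parameters())
print(n_ord, n_sep)  # the separable version has far fewer parameters
```

For these channel sizes the ordinary convolution holds 256 × 512 × 9 weights, while the separable pair holds only 256 × 9 depthwise plus 256 × 512 pointwise weights, which is the source of the weight and speed reductions reported above.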
Besides, an investigation was made into the influence of occlusion degree on the detection accuracy of the improved model. The average precision of the improved model on the severe-occlusion test set was 95.37%, which was 3.98 percentage points lower than that on the slight-occlusion set, indicating that severe occlusion reduced the detection accuracy. Nevertheless, the improved model still retained high detection performance under the interference of occlusion. Three-dimensional spatial positioning of orange fruit was then performed to verify the proposed location algorithm, which combined the improved YOLOv4 model and the RealSense camera. The improved YOLOv4 model was applied to recognize and locate a total of 78 fruits in an actual orchard environment, in order to verify the effectiveness of the proposed location algorithm. The results showed a 98.72% success rate in two-dimensional fruit recognition. The mean absolute errors in the horizontal and vertical directions were 0.91 and 1.26 pixels, respectively, and the mean absolute percentage errors were both within 1 percentage point. Moreover, the success rate of three-dimensional fruit positioning reached 96.15%. The mean absolute error of the depth information was 3.48 cm, and the mean absolute percentage error was 2.72%. Consequently, the prediction errors in the three directions all remained within a small range, fully meeting the need for accurate positioning of picking manipulators. In conclusion, the findings can provide a target-location approach with strong robustness, excellent real-time performance, and high accuracy for picking operations in complex scenes.
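The three-dimensional positioning described above reads the aligned depth at the detected centroid and back-projects it into camera coordinates. A minimal sketch of that final step, using the standard pinhole camera model (the intrinsics below are placeholder values, not the RealSense calibration from the paper):

```python
def pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with aligned depth Z (metres) into
    camera-frame 3D coordinates via the pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Illustrative intrinsics for a 640x480 color stream (assumed, not calibrated)
fx = fy = 615.0
cx, cy = 320.0, 240.0

# Hypothetical fruit centroid at pixel (400, 260) with 0.85 m aligned depth
point = pixel_to_point(400.0, 260.0, 0.85, fx, fy, cx, cy)
print(point)
```

In practice the aligned depth would be sampled from the registered depth map at the centroid pixel; the RealSense SDK also provides equivalent deprojection utilities using the camera's own calibrated intrinsics.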