Abstract:
Abstract: Automatic fruit picking has widely been popular for the intelligent estimation of orchard economic harvest in the field of smart agriculture. Object detection using deep learning has presented broad application prospects for spring-see citrus detection and counting. But, it is still a very challenging task to detect and count spring-see fruits, due mainly to the small size of spring-see citrus, the high density of fruit on a single spring-see tree, the similar shape and color of fruits, and the tendency to be heavily shaded by foliage. In this study, a YOLOv4 network model was proposed using a recursive fusion of features (FR-YOLOv4) for spring-see citrus detection. CSPResNext50 network with a smaller receptive field was also selected to detect small targets with higher accuracy. As such, the feature extraction in the original YOLOv4 object detection network model was replaced with the CSPResNext50 network, particularly for the small size of spring-see citrus. Therefore, the difficulty was reduced to greatly improve the detection accuracy of small-scale spring-see citrus, where the feature map of small-scale objects was easily transmitted to the object detector. In addition, the Recursive Feature Pyramid (RFP) network was used to replace the original YOLOv4 Path Aggregation Network (PANet), because of the fuzzy and dense distribution of spring-seeing citrus. Correspondingly, RFP networks significantly enhanced the feature extraction and characterization capabilities of the entire YOLOv4 network at a small computational cost. More importantly, the detection accuracy of the YOLOv4 network was improved for spring-see citrus in a real orchard environment. Additionally, a dataset was collected using images and videos of spring-see citrus captured in various environments in spring-see orchards. Subsequently, three data augmentation operations were performed using OpenCV tools to add Gaussian noise, luminance variation and rotation. The dataset was obtained with a total of 3 600 images, of which the training dataset consisted of 2 520 images and the test dataset consisted of 1 080 images. The experimental results on this test dataset showed that the average detection accuracy of FR-YOLOv4 was 94.6% for spring-see citrus in complex orchard environments, and the frame rate of video detection was 51 frames/s. Specifically, the average detection accuracy increased by 8.9 percentage points, whereas, the frame rate of video detection was only 6 frames/s lower, compared with YOLOv4 before the improvement. Consequently, the FR-YOLOv4 presented an average detection accuracy of 29.3, 14.1, and 16.2 percentages higher than that of Single Shot Multi-Box Detector (SSD), CenterNet, and Faster Region-Convolutional Neural Networks (Faster-RCNN), respectively. The frame rate of video detection was 17 and 33 frames/s higher than SSD and Faster-RCNN, respectively. Anyway, the YOLOv4 network model using a recursive fusion of features (FR-YOLOv4) can widely be expected to detect and count spring citrus suitable for actual complex environments in orchards, indicating a higher detection accuracy with real-time performance.