Abstract
Accurate detection of immature green citrus fruits on trees is one of the most critical steps for production decisions, such as early yield prediction, precise water and fertilizer management, and regulation of fruit load. However, rapid and accurate detection of immature citrus remains difficult, because the green fruits closely resemble the canopy background, and large detection models are hard to deploy. In this study, an improved model, YOLO-GC (you only look once-green citrus), was proposed to detect green citrus fruits on trees based on YOLOv5s, and was deployed on edge mobile devices to achieve real-time, convenient detection. Firstly, since the YOLOv5s network was large and difficult to deploy, the original backbone was replaced with a lightweight GhostNet backbone. To compensate for the accuracy loss caused by the lightweight design and to strengthen attention to green citrus features, a Global Attention Mechanism (GAM) was embedded in the backbone network and the feature fusion layer, so that fruit features could be extracted in complex environments. Secondly, a BiFPN (Bi-directional Feature Pyramid Network) architecture was introduced into the feature fusion layer for multi-scale weighted feature fusion, in order to improve the detection of dense and small-target fruits. Finally, the GIoU (Generalized Intersection over Union) loss function combined with Soft-NMS (Soft Non-Maximum Suppression) was used to optimize bounding box regression, in order to reduce the missed detections caused by occlusion and overlapping of fruits and branches. The experimental results showed that the weight file size of YOLO-GC was reduced by 53.9% compared with YOLOv5s.
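The bounding-box regression step mentioned above optimizes GIoU rather than plain IoU; GIoU adds a penalty based on the smallest enclosing box, so even non-overlapping predicted and ground-truth boxes receive a useful gradient. A minimal sketch of the GIoU loss in plain Python, assuming boxes in (x1, y1, x2, y2) corner format (an illustration of the metric, not the authors' exact implementation):

```python
def giou_loss(box_a, box_b):
    """GIoU loss between two corner-format boxes; returns a value in [0, 2]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union area
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box; its excess area penalizes distant boxes
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

For identical boxes the loss is 0; for disjoint boxes it exceeds 1, growing with the distance between them, which is what makes GIoU trainable where plain IoU is flat at zero.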
The number of parameters and the average inference time were reduced by 55.2% and 46.2%, respectively, whereas the average precision (AP0.5) was improved by 1.2 percentage points. Fewer fruits were missed or misdetected in a variety of complex natural environments. The comprehensive performance of the YOLO-GC model was superior to that of seven commonly used detection networks: YOLOv5s, YOLOv7, YOLOX, YOLOv8, CenterNet, Faster R-CNN, and RetinaNet. The average precision of YOLO-GC was higher by 1.2, 1.3, 1.4, 0.9, 1.7, 4.6, and 3.9 percentage points, respectively, with a weight file of only 6.69 MB, achieving 97.6%, 90.3%, 97.8%, and 97.0% for precision, recall, average precision, and F1 score, respectively. The YOLO-GC model was then deployed as an Android mobile App for testing. The detection accuracy reached 97.2%, which was 2.4 percentage points higher than that of YOLOv5s, and the inference time (1 038 ms) was reduced by 85.8%. The YOLO-GC model fully met the requirements of high-accuracy recognition and real-time inference of green citrus on Android phones. These findings can provide technical support for detecting green-on-green fruits in complex environments with edge intelligent devices.
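The reduction in missed detections under occlusion comes from replacing hard NMS with Soft-NMS, which down-weights rather than deletes overlapping candidate boxes, so a fruit partially hidden behind another can still be kept. A minimal sketch of Gaussian Soft-NMS in plain Python, assuming corner-format boxes; the parameter values (sigma, score_thresh) are illustrative defaults, not the paper's settings:

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding boxes."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        # Take the current highest-scoring box
        m = max(range(len(scores)), key=lambda i: scores[i])
        best_box, best_score = boxes.pop(m), scores.pop(m)
        if best_score < score_thresh:
            break
        keep.append((best_box, best_score))
        # Gaussian penalty: the more a remaining box overlaps the kept one,
        # the more its score is suppressed (but it is never hard-removed)
        for i, b in enumerate(boxes):
            scores[i] *= math.exp(-(iou(best_box, b) ** 2) / sigma)
    return keep

detections = soft_nms(
    boxes=[(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)],
    scores=[0.9, 0.8, 0.7],
)
```

Here hard NMS would drop the second, heavily overlapping box outright; Soft-NMS instead retains it with a decayed score, which is why overlapping fruits are less likely to be lost.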