Abstract:
Xinmei is a European plum variety known as the "paradise medicine fruit" for its rich medicinal, health-care, and cosmetic value. Today, Xinmei is widely planted in Kashgar and Yili in the Xinjiang Uygur Autonomous Region, China, and picking robots have been used to harvest it in recent years. However, rapid and accurate visual detection remains challenging in complex orchard environments with overlap and occlusion. Because the fruit targets are small and the branches and leaves are dense, a large number of fruits are blocked by leaves and trunks, so the key feature information cannot be extracted; given the high economic value of Xinmei, such missed detections are costly. In addition, the more the fruits overlap, the larger the overlapping area among them becomes. In this study, a detection model, SFF-YOLOv5s, was proposed for Xinmei in the complex environment. A dataset of Xinmei images was constructed in a real orchard environment, and the YOLOv5s model was adopted as the network base. Firstly, a Coordinate Attention (CA) mechanism was introduced into the C3 module of the Backbone network to extract the key feature information when fruits were blocked by leaves; this modification also reduced the number of parameters, facilitating deployment on mobile devices. Secondly, a weighted bidirectional feature pyramid network (BiFPN) was introduced into the Neck layer to enhance the fusion among different feature layers and to improve the recognition of mutually occluding fruits. Thirdly, the SIoU loss function replaced the CIoU loss function of the original model to accelerate convergence while maintaining high accuracy. Test results showed that the SFF-YOLOv5s model achieved a precision of 93.4%, a recall of 92.9%, and a mean average precision (mAP) of 97.7%, with a model weight of only 13.6 MB and an average detection time of 12.1 ms per image.
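The weighted fusion that BiFPN performs in the Neck can be sketched as follows. This is a minimal NumPy illustration of the standard fast-normalized-fusion formulation, not the paper's code; the function name and example feature maps are hypothetical.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted feature fusion in the BiFPN style: each input feature map
    gets a learnable weight, clipped to be non-negative (ReLU) and then
    normalized so the weights sum to approximately 1."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU on the weights
    w = w / (w.sum() + eps)                                # fast normalization
    return sum(wi * f for wi, f in zip(w, features))

# Example: fuse a top-down feature map with a same-scale lateral one
# (hypothetical maps standing in for two feature layers of the Neck)
p_td = np.ones((64, 64))    # upsampled top-down map
p_in = np.zeros((64, 64))   # lateral input map
fused = fast_normalized_fusion([p_td, p_in], weights=[1.0, 1.0])
```

With equal weights the two maps contribute equally, so `fused` is close to 0.5 everywhere; during training the weights are learned, letting the network emphasize whichever feature layer is more informative for occluded fruits.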
After the CA mechanism was added to the C3 module, the mAP of the improved model increased by 0.2 percentage points, while the number of parameters was reduced from 7.02 million to 6.41 million. With the weighted BiFPN, the mAP of the model reached 97.6%, 0.5 percentage points higher than that of the original YOLOv5s. When SIoU was used as the loss function, the precision of the model improved by 2 percentage points over the original. Compared with the Faster R-CNN, YOLOv3, YOLOv4, YOLOv5s, YOLOv7, and YOLOv8s models, the mAP was improved by 3.6, 6.8, 13.1, 0.6, 0.4, and 0.5 percentage points, respectively, and the proposed model also had the lowest computation cost and model weight together with a high detection speed. Therefore, the SFF-YOLOv5s model can fully meet the requirements for real-time detection of Xinmei in the complex orchard environment, providing technical support for the visual perception of Xinmei-picking robots.
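For context on the loss-function change: the CIoU baseline that SIoU replaces combines the IoU term with a center-distance penalty and an aspect-ratio consistency term, while SIoU additionally introduces an angle-aware distance cost between box centers. A minimal NumPy sketch of the CIoU baseline is given below; the function name and layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss between two boxes in (x1, y1, x2, y2) format:
    1 - IoU + center-distance penalty + aspect-ratio penalty."""
    # Intersection-over-union term
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter + eps)
    # Squared center distance over squared diagonal of the enclosing box
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (cxa - cxb) ** 2 + (cya - cyb) ** 2
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    v = (4 / np.pi ** 2) * (np.arctan(wb / hb) - np.arctan(wa / ha)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

Because CIoU's distance penalty ignores the direction of the center offset, regression can wander during training; SIoU's angle cost is designed to steer the predicted box along one axis first, which is consistent with the faster convergence reported above.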