Abstract:
Safflower has attracted much attention in the field of intelligent harvesting because of its economic value and the difficulty of harvesting it. Safflower grown in natural environments often exhibits large scale variations and complex occlusions, which places higher demands on object detection algorithms. However, traditional object detection models often suffer from missed or false detections under these conditions, seriously affecting picking efficiency and accuracy. In this study, a YOLO-SSAR object detection algorithm based on multi-scale feature extraction was proposed by optimizing the original YOLOv5 model, and the effectiveness and rationality of the improved algorithm were verified through ablation experiments, model comparison experiments, and analysis of detection results. Firstly, the lightweight ShuffleNet v2 structure replaced the feature extraction network in the backbone to reduce the number of model parameters and computations, using efficient channel shuffling and depthwise separable convolutions to improve the efficiency of feature extraction from the input. Secondly, a Scale-Aware RFE module based on dilated convolution and shared weights was added to the neck to improve the model's ability to extract multi-scale feature information. This module shared the weights of its main branch with the other branches, lowering the number of model parameters, and fused residual connections to reduce the risk of overfitting, so that objects of different scales were transformed uniformly with the same representational capacity. Finally, to address intra-class and inter-class occlusion in object detection, the repulsion loss function was introduced in the head to replace the original loss function, reducing the missed and false detections caused by an improperly chosen non-maximum suppression (NMS) threshold and improving the detection of overlapping, occluded targets in dense scenes. The experimental results showed that the precision, recall,
and mean average precision of the YOLO-SSAR algorithm on the test set were 90.1%, 88.5%, and 93.4%, respectively, improvements of 5.9, 9.2, and 7.7 percentage points over the original YOLOv5 model; the inference speed reached 115 frames per second and the model size was 9.7 MB, demonstrating efficiency and lightweight advantages in practical applications. Compared with the mainstream algorithms YOLOv4, YOLOv7, YOLOv8s, Faster R-CNN, and SSD, the detection accuracy of the YOLO-SSAR algorithm was the highest; relative to the two-stage detector Faster R-CNN and the multi-scale detector SSD, gains of 5.5 times and 3.6 times were achieved, respectively, while the model size was only 4% of Faster R-CNN's and 10% of SSD's. With the smallest number of parameters among the compared models, YOLO-SSAR has good prospects on mobile devices with limited computing resources. Its precision was 6.8, 7.2, 6.3, 16.2, and 10.8 percentage points higher, its recall 9.4, 10.3, 9.5, 17.3, and 59.4 percentage points higher, and its mean average precision 8.8, 8.2, 8.1, 14.9, and 19.4 percentage points higher than those of the comparison algorithms, respectively. The results suggested that the YOLO-SSAR algorithm could improve overall detection performance while reducing computational complexity. The findings can provide an algorithmic reference for research on the intelligent harvesting of safflower.
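For illustration, the following is a minimal PyTorch-style sketch of the kind of shared-weight, dilated-convolution block described above (parallel dilation rates reusing one kernel, fused with a residual connection). The class name ScaleAwareRFE, the dilation rates, and the 1x1 fusion layer are assumptions for illustration, not the authors' exact implementation.

```python
# Sketch of a scale-aware receptive-field enhancement block: several dilated
# convolutions that share one set of weights, fused and added back residually.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleAwareRFE(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        # One shared 3x3 kernel reused by every branch, so the parameter count
        # does not grow with the number of dilation rates.
        self.shared_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.dilations = dilations
        # 1x1 convolution to fuse the concatenated multi-scale branches.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        branches = []
        for d in self.dilations:
            # Reuse the same kernel with different dilation rates; padding=d keeps
            # the spatial size unchanged so the branches can be concatenated.
            branches.append(F.conv2d(x, self.shared_conv.weight, padding=d, dilation=d))
        out = self.fuse(torch.cat(branches, dim=1))
        # Residual connection, as described, to stabilize training and limit overfitting.
        return self.act(self.bn(out) + x)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 40, 40)        # dummy neck feature map
    print(ScaleAwareRFE(64)(feat).shape)     # torch.Size([1, 64, 40, 40])
```

Sharing a single kernel across dilation rates keeps the parameter count independent of the number of branches, which matches the stated goal of covering multiple receptive fields without enlarging the model.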
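The repulsion idea used in the head can likewise be sketched as follows. Only a RepGT-style term (penalizing overlap between a prediction and the non-matched ground-truth box it overlaps most) is shown; the box format, the matching strategy, and the sigma value are simplifying assumptions rather than the paper's exact loss.

```python
# Sketch of the repulsion term from repulsion loss (Wang et al., CVPR 2018):
# predictions are pushed away from neighbouring, non-matched ground-truth boxes.
import math
import torch


def smooth_ln(x: torch.Tensor, sigma: float = 0.5) -> torch.Tensor:
    # Piecewise penalty: logarithmic below sigma, linear above, so very large
    # overlaps do not dominate the gradient.
    return torch.where(
        x <= sigma,
        -torch.log1p(-x.clamp(max=1 - 1e-6)),
        (x - sigma) / (1 - sigma) - math.log(1 - sigma),
    )


def iog(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # Intersection over the ground-truth area for paired (x1, y1, x2, y2) boxes.
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    gt_area = (gt[:, 2:] - gt[:, :2]).clamp(min=0).prod(dim=1)
    return inter / gt_area.clamp(min=1e-6)


def rep_gt_loss(pred_boxes: torch.Tensor, rep_gt_boxes: torch.Tensor) -> torch.Tensor:
    # rep_gt_boxes[i] is the non-matched ground-truth box that pred_boxes[i]
    # overlaps most, i.e. the box the prediction should be repelled from.
    return smooth_ln(iog(pred_boxes, rep_gt_boxes)).mean()
```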