Abstract:
Safflower has drawn much attention in the field of intelligent harvesting due to its economic value. Safflower harvesting places high demands on object detection, owing to the large scale variations and complex occlusion in natural environments. Moreover, missed and false detections often occur with traditional object detectors, seriously reducing picking efficiency and accuracy. In this study, a YOLO-SSAR object detection model was proposed, optimizing the original YOLOv5 model with multi-scale feature extraction. The effectiveness and rationality of the improved algorithm were verified through ablation experiments, model comparisons, and detection-effect analysis. Firstly, the lightweight ShuffleNet v2 structure replaced the feature extraction network of the backbone layer, in order to reduce the number of model parameters and computations; efficient channel shuffling and depthwise separable convolution improved the efficiency of input feature extraction. Secondly, a Scale-Aware RFE module was added to the neck layer, using dilated convolution and shared weights to extract multi-scale features. The weights of the main branch were shared with the remaining branches, lowering the number of model parameters, and residual connections were fused to reduce the risk of overfitting, allowing objects of different scales to be transformed into a uniform representation. Finally, a repulsion loss function replaced the original loss function in the head layer, in order to mitigate intra-class and inter-class occlusion in object detection. This reduced the missed and false detections caused by improper selection of the non-maximum suppression (NMS) threshold and improved the detection rate of overlapping, occluded targets in dense scenes.
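The parameter savings from depthwise separable convolution can be illustrated with a simple count. The sketch below is not taken from the paper; the layer shape (3x3 kernel, 32 input and 64 output channels) is a hypothetical example, chosen only to show the factorization used in ShuffleNet-style backbones.

```python
# Parameter-count comparison: standard convolution vs. the depthwise
# separable factorization used in ShuffleNet-style lightweight backbones.
# The layer shape below is illustrative, not from the paper.

def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv across channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64                        # hypothetical layer
    std = standard_conv_params(k, c_in, c_out)        # 18432
    sep = depthwise_separable_params(k, c_in, c_out)  # 2336
    print(std, sep, round(std / sep, 1))              # ~7.9x fewer weights
```

For this hypothetical layer the factorized form needs roughly an eighth of the weights, which is the kind of reduction that makes the backbone swap attractive on constrained hardware.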
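The multi-scale effect of dilated convolution with shared weights can be seen from the effective kernel extent, k + (k - 1)(d - 1) for dilation d. The branch dilations below are illustrative assumptions; the abstract does not specify the exact rates used in the Scale-Aware RFE module.

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective extent of a k x k convolution with dilation d:
    k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (d - 1)

# Parallel branches with dilations 1, 2, 3 over the same 3x3 kernel
# (illustrative rates) cover 3x3, 5x5, and 7x7 receptive fields; with
# shared weights, only one set of 3x3 kernel parameters is stored.
branches = [effective_kernel(3, d) for d in (1, 2, 3)]
print(branches)  # [3, 5, 7]
```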
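A minimal sketch of the repulsion idea, assuming the standard formulation in which a prediction is penalized for overlapping a non-target ground-truth box via Intersection-over-Ground-truth (IoG) passed through a smoothed-ln penalty. The box coordinates and the sigma value are illustrative; the abstract does not give the paper's exact hyperparameters.

```python
import math

def box_area(b):
    x1, y1, x2, y2 = b
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection(a, b):
    x1 = max(a[0], b[0]); y1 = max(a[1], b[1])
    x2 = min(a[2], b[2]); y2 = min(a[3], b[3])
    return box_area((x1, y1, x2, y2))

def iog(pred, gt):
    """Intersection over Ground-truth: how much of a *non-target*
    ground-truth box the prediction covers."""
    return intersection(pred, gt) / box_area(gt)

def smooth_ln(x, sigma=0.5):
    """Smoothed -ln(1 - x) penalty: logarithmic below sigma,
    linear beyond it to keep the gradient bounded."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

# A prediction half-covering a neighboring ground-truth box (illustrative)
pred = (0.0, 0.0, 2.0, 2.0)
neighbor_gt = (1.0, 0.0, 3.0, 2.0)
print(smooth_ln(iog(pred, neighbor_gt)))  # nonzero repulsion penalty
```

Pushing this penalty down during training repels predictions from neighboring targets, which is what reduces NMS-threshold sensitivity in dense, overlapping scenes.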
The experimental results showed that the precision, recall, and mean average precision of the YOLO-SSAR algorithm on the test set were 90.1%, 88.5%, and 93.4%, respectively, improvements of 5.9, 9.2, and 7.7 percentage points over the original YOLOv5 model. The inference speed reached 115 frames per second with a model size of 9.7 MB, indicating high efficiency and a lightweight design for practical applications. Compared with the mainstream algorithms YOLOv4, YOLOv7, YOLOv8s, Faster R-CNN, and SSD, the detection accuracy of the YOLO-SSAR algorithm was in a leading position. Compared with the two-stage Faster R-CNN and the multi-scale SSD detectors, the improved model ran 5.5 and 3.6 times faster, respectively, while its model size was only 4% that of Faster R-CNN and 10% that of SSD. The small number of parameters gives the model great prospects on mobile devices with limited computing resources. Its precision was 6.8, 7.2, 6.3, 16.2, and 10.8 percentage points higher, its recall 9.4, 10.3, 9.5, 17.3, and 59.4 percentage points higher, and its mean average precision 8.8, 8.2, 8.1, 14.9, and 19.4 percentage points higher than those of the mainstream algorithms, respectively. The YOLO-SSAR algorithm thus improved detection performance with lower computational complexity. The findings provide an algorithmic reference for the intelligent harvesting of safflower.