Abstract:
Cotton is one of the most widely produced and consumed crops in China, and accurate detection of cotton pests is an important prerequisite for improving cotton quality. In this study, an ECSF-YOLOv7 pest detection model was proposed to address the high similarity between insect species and the severe background interference found in the natural environments of cotton fields. First, EfficientFormerV2 was adopted as the feature extraction network to strengthen feature extraction while keeping the number of model parameters small. In addition, the convolutional block attention module (CBAM) was embedded at the backbone output to enhance the extraction of small targets and suppress background interference. Second, GSConv was used to build a Slim-Neck structure, which reduced the number of model parameters while maintaining recognition accuracy. Finally, Focal EIOU loss was used as the bounding box regression loss to accelerate network convergence and improve detection accuracy. The dataset consisted of images of 17 insect types from cotton fields. Python scripts were used to augment the annotated images with random brightness adjustment, random flipping, mirror transformation, and Gaussian noise, which simulated a wider range of insect recognition scenes in natural environments and improved the robustness of the model. In total, 6 273 images were obtained for the cotton field insect dataset, with a sufficient sample size and a relatively balanced class distribution. Four groups of experiments were conducted to verify the performance of the improved model: ablation experiments, gradient-weighted class activation mapping (Grad-CAM) visualization of the attention mechanism, a comparison of loss functions, and a comparison with mainstream models. The ablation experiments showed that each improved module contributed positively to performance. Feature extraction also differed depending on where CBAM was embedded in the model.
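CBAM refines a feature map with channel attention followed by spatial attention. The following NumPy sketch shows one forward pass for a single (C, H, W) feature map, assuming the standard CBAM design: a shared two-layer MLP over average- and max-pooled channel descriptors, then a 7x7 convolution over the channel-wise mean and max maps. The weights are random placeholders rather than trained values, biases are omitted, and the function names are illustrative; this is not the ECSF-YOLOv7 implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Channel attention for a feature map x of shape (C, H, W).

    w1 (C // r, C) and w2 (C, C // r) are the weights of the shared MLP.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # shared MLP with ReLU

    avg = x.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))    # (C,) max-pooled descriptor
    return sigmoid(mlp(avg) + mlp(mx))  # (C,) weights in [0, 1]


def spatial_attention(x, kernel):
    """Spatial attention: a k x k convolution over [mean, max] along channels.

    kernel has shape (2, k, k); zero 'same' padding keeps H x W.
    """
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    h, w = desc.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # (H, W) weights in [0, 1]


def cbam(x, w1, w2, kernel):
    """Sequential CBAM: channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]


rng = np.random.default_rng(0)
C, r = 16, 4  # channels and MLP reduction ratio (illustrative values)
features = rng.standard_normal((C, 8, 8))
refined = cbam(features,
               rng.standard_normal((C // r, C)),
               rng.standard_normal((C, C // r)),
               rng.standard_normal((2, 7, 7)))
print(refined.shape)  # (16, 8, 8)
```

Because both attention maps lie between 0 and 1, CBAM can only rescale activations, which is how it down-weights background regions while keeping the feature map shape unchanged.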
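The parameter saving from GSConv can be illustrated with a simple weight count. This sketch assumes the layout of the original Slim-Neck reference code: a dense k x k convolution producing half of the output channels, a 5 x 5 depthwise convolution on those channels, concatenation, and a parameter-free channel shuffle. The 5 x 5 depthwise kernel, the 256-channel example, and the helper names are assumptions for illustration, and bias and BatchNorm parameters are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias and BatchNorm ignored)."""
    return k * k * (c_in // groups) * c_out


def gsconv_params(c_in, c_out, k, dw_k=5):
    """Weight count of a GSConv block under the assumed layout:
    dense k x k conv to c_out / 2 channels, then a depthwise dw_k x dw_k
    conv on those channels; concatenation and shuffle add no weights."""
    half = c_out // 2
    dense = conv_params(c_in, half, k)
    depthwise = conv_params(half, half, dw_k, groups=half)
    return dense + depthwise


standard = conv_params(256, 256, 3)
slim = gsconv_params(256, 256, 3)
print(standard, slim, round(slim / standard, 3))
```

For 256 input and output channels with k = 3, the GSConv count is roughly half that of a standard convolution (298,112 versus 589,824 weights), because the depthwise branch adds only 3,200 weights.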
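EIOU extends the IoU loss with penalties on the center distance and on the width and height differences, each normalized by the smallest enclosing box; Focal EIOU multiplies the result by IoU^γ so that well-overlapping (higher-quality) predictions contribute more to the regression loss. The sketch below computes the loss for one pair of (x1, y1, x2, y2) boxes in plain Python, assuming the formulation of the original Focal-EIOU paper with γ = 0.5 (the abstract does not state γ). A training implementation would operate on batched tensors instead.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIOU loss for one pair of boxes given as (x1, y1, x2, y2).

    L_EIOU       = 1 - IoU + rho^2(centers)/c^2 + (w - w_gt)^2/Cw^2 + (h - h_gt)^2/Ch^2
    L_Focal-EIOU = IoU**gamma * L_EIOU
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Smallest enclosing box: width, height and squared diagonal
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centers
    rho2 = (((px1 + px2) - (tx1 + tx2)) ** 2 + ((py1 + py2) - (ty1 + ty2)) ** 2) / 4

    eiou = (1 - iou
            + rho2 / (c2 + eps)
            + (pw - tw) ** 2 / (cw ** 2 + eps)
            + (ph - th) ** 2 / (ch ** 2 + eps))
    return iou ** gamma * eiou


print(focal_eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Setting `gamma=0` recovers the plain EIOU loss, which makes the effect of the focal weighting easy to compare on the same box pair.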
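Geometric augmentations must transform the bounding box annotations together with the pixels. The following NumPy sketch assumes that "mirror transformation" means a horizontal flip and "random flipping" a vertical flip, and uses illustrative settings (brightness factor in [0.7, 1.3], flip probability 0.5, noise standard deviation of 10 grey levels); none of these values comes from the study, and the `augment` function is hypothetical.

```python
import numpy as np


def augment(image, boxes, rng):
    """Randomly augment an H x W x 3 uint8 image and its boxes.

    boxes: array of shape (N, 4) in pixel (x1, y1, x2, y2) format.
    Returns a new image and a new box array; the inputs are not modified.
    """
    h, w = image.shape[:2]
    img = image.astype(np.float32)      # working copy in float
    boxes = boxes.astype(np.float32)    # astype returns a copy

    # Random brightness: scale all pixels by one factor
    img *= rng.uniform(0.7, 1.3)

    # Mirror transformation (horizontal flip): x -> w - x, swapping x1 and x2
    if rng.random() < 0.5:
        img = img[:, ::-1]
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]

    # Random (vertical) flip: y -> h - y, swapping y1 and y2
    if rng.random() < 0.5:
        img = img[::-1, :]
        boxes[:, [1, 3]] = h - boxes[:, [3, 1]]

    # Additive Gaussian noise
    img += rng.normal(0.0, 10.0, size=img.shape)

    return np.clip(img, 0, 255).astype(np.uint8), boxes


rng = np.random.default_rng(42)
image = np.full((40, 60, 3), 128, dtype=np.uint8)
boxes = np.array([[5, 10, 20, 30]], dtype=np.float32)
new_image, new_boxes = augment(image, boxes, rng)
print(new_image.shape, new_boxes)
```

Brightness and noise change only pixel values, so the boxes are left untouched by them; the flips mirror and swap each coordinate pair so that x1 < x2 and y1 < y2 still hold afterwards.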
Grad-CAM was used to generate detection heat maps. When CBAM was embedded at the backbone output, the regions of interest in the heat maps were closer to the actual pest areas and less affected by background interference. Five bounding box loss functions were compared: DIOU, EIOU, MPDIOU, CIOU, and Focal EIOU. Because Focal EIOU incorporates the Focal loss idea to automatically adjust the loss weights of different types of samples, it achieved the best overall performance and the highest detection accuracy. The results showed that the mean average precision (mAP) of the ECSF-YOLOv7 model was 95.71%, which was 1.43, 9.08, 1.94, and 1.52 percentage points higher than that of the mainstream object detection models YOLOv7, SSD, YOLOv5l, and YOLOX, respectively. The improved model had only 20.82 M parameters, 44.15%, 12.26%, 55.25%, and 17.9% fewer than those models, respectively. The ECSF-YOLOv7 model achieved an average detection speed of 69.47 frames per second (FPS), 5.26 FPS faster than YOLOv7. High detection accuracy was also maintained under insect overlap, high similarity between species, small targets, and background interference. In summary, the ECSF-YOLOv7 model offers high detection accuracy, fast detection speed, and a small parameter count, and can provide technical support for the rapid and accurate detection of cotton field pests.
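Grad-CAM weights each feature map of a chosen layer by the spatial average of the gradient of a target score with respect to that map, sums the weighted maps, and keeps only the positive part. The sketch below assumes the activations and gradients have already been extracted from the network; for a detector, the target score could be the confidence of one detected box, which is an assumption here. Upsampling the map to the input resolution and overlaying it on the image are omitted, and `grad_cam` is a hypothetical helper name.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heat map from one convolutional layer.

    activations: (K, H, W) feature maps A^k of the chosen layer.
    gradients:   (K, H, W) gradients dy/dA^k of the target score y.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Channel weights alpha_k: global-average-pooled gradients
    alpha = gradients.mean(axis=(1, 2))  # (K,)
    # Weighted sum of feature maps; ReLU keeps only positive evidence
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


rng = np.random.default_rng(0)
heat = grad_cam(rng.standard_normal((4, 6, 6)), rng.standard_normal((4, 6, 6)))
print(heat.shape, heat.min() >= 0)
```

The ReLU step is what lets the heat map concentrate on regions that raise the target score, so background regions that lower it show up as zero rather than as negative evidence.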
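Besides EIOU and Focal EIOU, the comparison included DIOU, CIOU, and MPDIOU. For a single box pair these can be written as follows, assuming the commonly published definitions: DIOU adds a normalized center-distance penalty to 1 - IoU, CIOU further adds an aspect-ratio consistency term, and MPDIOU penalizes the distances between the top-left and bottom-right corners normalized by the input image size. Boxes are (x1, y1, x2, y2); the helper names and the eps guard are illustrative.

```python
import math


def _iou_terms(p, t, eps=1e-9):
    """IoU, squared center distance and squared enclosing-box diagonal."""
    iw = max(0.0, min(p[2], t[2]) - max(p[0], t[0]))
    ih = max(0.0, min(p[3], t[3]) - max(p[1], t[1]))
    inter = iw * ih
    union = (p[2] - p[0]) * (p[3] - p[1]) + (t[2] - t[0]) * (t[3] - t[1]) - inter
    iou = inter / (union + eps)
    rho2 = ((p[0] + p[2] - t[0] - t[2]) ** 2 + (p[1] + p[3] - t[1] - t[3]) ** 2) / 4
    c2 = ((max(p[2], t[2]) - min(p[0], t[0])) ** 2
          + (max(p[3], t[3]) - min(p[1], t[1])) ** 2)
    return iou, rho2, c2


def diou_loss(p, t, eps=1e-9):
    iou, rho2, c2 = _iou_terms(p, t, eps)
    return 1 - iou + rho2 / (c2 + eps)


def ciou_loss(p, t, eps=1e-9):
    iou, rho2, c2 = _iou_terms(p, t, eps)
    # Aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan((t[2] - t[0]) / (t[3] - t[1]))
                              - math.atan((p[2] - p[0]) / (p[3] - p[1]))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / (c2 + eps) + alpha * v


def mpdiou_loss(p, t, img_w, img_h, eps=1e-9):
    iou, _, _ = _iou_terms(p, t, eps)
    norm = img_w ** 2 + img_h ** 2
    d1 = (p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2  # top-left corners
    d2 = (p[2] - t[2]) ** 2 + (p[3] - t[3]) ** 2  # bottom-right corners
    return 1 - (iou - d1 / norm - d2 / norm)


pred, gt = (0, 0, 4, 2), (0, 0, 2, 4)
print(diou_loss(pred, gt), ciou_loss(pred, gt), mpdiou_loss(pred, gt, 10, 10))
```

When the predicted and ground-truth boxes share an aspect ratio, the CIOU term v vanishes and CIOU reduces to DIOU.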
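The mAP gains are absolute differences (percentage points), whereas the parameter comparison reads most naturally as relative reductions, since a parameter count is not itself a percentage. The sketch below shows the arithmetic, assuming each parameter reduction is expressed as a percent of the corresponding baseline's parameter count. The baseline values it prints are only implied by the abstract's figures and are not separately reported here; the dictionary and function names are illustrative.

```python
# Figures reported for ECSF-YOLOv7: mAP (%), parameters (M), speed (FPS)
ECSF_MAP, ECSF_PARAMS, ECSF_FPS = 95.71, 20.82, 69.47

# Reported mAP gains (percentage points) and parameter reductions
# (assumed to be relative, in percent of each baseline's parameter count)
map_gain_pp = {"YOLOv7": 1.43, "SSD": 9.08, "YOLOv5l": 1.94, "YOLOX": 1.52}
param_cut_pct = {"YOLOv7": 44.15, "SSD": 12.26, "YOLOv5l": 55.25, "YOLOX": 17.9}


def implied_baseline_map(gain_pp):
    """Baseline mAP implied by an absolute gain in percentage points."""
    return ECSF_MAP - gain_pp


def implied_baseline_params(cut_pct):
    """Baseline parameter count implied by a relative reduction:
    cut = (P_base - P_new) / P_base  =>  P_base = P_new / (1 - cut)."""
    return ECSF_PARAMS / (1 - cut_pct / 100)


for name in map_gain_pp:
    print(f"{name:8s} implied mAP {implied_baseline_map(map_gain_pp[name]):.2f}%  "
          f"implied params {implied_baseline_params(param_cut_pct[name]):.2f} M")

print(f"YOLOv7 speed implied by the 5.26 FPS gain: {ECSF_FPS - 5.26:.2f} FPS")
```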