A YOLOv5 crop pest recognition model incorporating a global response normalization attention mechanism

    YOLOv5 model integrated with GRN attention mechanism for insect pest recognition

    • Abstract: To address the problems that the detection performance of the YOLOv5 (you only look once version five) model on dense crop pest targets cannot meet practical requirements, and that the model converges slowly during training, this study proposes a YOLOv5 crop pest recognition model incorporating a global response normalization (GRN) attention mechanism (YOLOv5-GRNS). An encoder (convolution three, C3) module fused with the GRN attention mechanism was designed to improve recognition accuracy for dense targets; the shape intersection over union (SIoU) loss function was adopted to accelerate model convergence and improve recognition accuracy; and, based on the public dataset IP102 (insect pests 102), eight pest types that harm the major crops of Shaanxi Province were screened out to build a new dataset, IP8-CW (insect pests eight for corn and wheat). The improved model was comprehensively validated on both the new IP8-CW dataset and the full IP102 dataset. On IP8-CW, the mean average precision (mAP) reached 72.3% at mAP@.5 and 47.0% at mAP@.5:.95. Class activation map analysis of the YOLOv5-GRNS model further verified its excellent recognition of crop pests, especially dense targets, not only in terms of recognition accuracy but also from the perspective of interpretability. In addition, the model has fewer parameters and a lower computational cost, giving it good prospects for deployment on embedded devices.
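
For readers unfamiliar with the GRN operation, a minimal PyTorch sketch is given below, following the global response normalization defined in ConvNeXt V2 and adapted here to channels-first (NCHW) feature maps as used by YOLOv5. The class name, the epsilon term, and the channels-first adaptation are illustrative assumptions; the exact placement of GRN inside the paper's C3-GRNS module is not reproduced here.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization (ConvNeXt V2 style) for NCHW feature maps."""
    def __init__(self, channels: int, eps: float = 1e-6):
        super().__init__()
        # Learnable per-channel affine parameters, initialized to zero so the
        # layer starts out as an identity mapping (output = x).
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # G(x): per-channel global response, an L2 norm over the spatial dims
        gx = torch.norm(x, p=2, dim=(2, 3), keepdim=True)    # (N, C, 1, 1)
        # N(G(x)): divisive normalization across channels (relative importance)
        nx = gx / (gx.mean(dim=1, keepdim=True) + self.eps)  # (N, C, 1, 1)
        # Recalibrate the input features and keep a residual connection
        return self.gamma * (x * nx) + self.beta + x
```

GRN first aggregates a per-channel global response, normalizes it across channels, and then recalibrates each channel with learnable affine parameters plus a residual connection, which is what allows it to act as a lightweight channel attention layer inside the C3 module.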

       

      Abstract: Automatic, rapid, and accurate detection is required to monitor pests over large areas in the field. In this study, the YOLOv5 (you only look once version five) model was used to detect crop pests. The standard YOLOv5 was extended with a global response normalization attention mechanism (YOLOv5-GRNS) to achieve accurate detection of targets in images with complex backgrounds and high pest density, and to accelerate convergence during training. Firstly, the Global Response Normalization (GRN) operation was introduced into the encoder module, Convolution Three (C3), yielding a C3 module that incorporates the GRN attention mechanism. This module exchanges information across channels to suppress background interference at the channel level, thereby improving the detection accuracy for dense targets. Secondly, the Shape Intersection over Union (SIoU) loss function was adopted to improve the convergence speed and detection accuracy of the improved model. In addition, eight types of pests that harm the major crops of Shaanxi Province were screened out from the public dataset IP102 (insect pests 102); the data were then revised and expanded to obtain a new dataset, IP8-CW (insect pests eight for corn and wheat). Extensive experiments with the YOLOv5-GRNS model were conducted on both the new IP8-CW dataset and the existing IP102 dataset. On IP8-CW, the mean average precision (mAP) reached 72.3% at mAP@0.5 and 47.0% at mAP@0.5:0.95, increases of 1.3% and 1.6%, respectively, over the standard YOLOv5. The best performance was also achieved on the larger IP102 dataset with its 96-class classification task, while keeping lower complexity and fewer parameters. Ablation experiments were then conducted on the IP8-CW dataset to explore the influence of different factors on YOLOv5-GRNS performance. The results showed that with the SIoU loss function the predicted box followed a more regular path when fitting the ground-truth box, and convergence was accelerated by about 30 epochs compared with the other two loss functions. The performance of the improved model was clearly better when the GRN operation was used in a sandwich structure serving as both the normalization layer and the channel attention layer, and was also higher than that of structures in which GRN played only one of these roles. The ablation experiments further showed that the other attention mechanisms brought only marginal improvements to the standard YOLOv5, whereas the model with the GRN operation achieved the best detection performance with the lowest model complexity and the fewest parameters. Class Activation Maps (CAM) use heat maps to mark, in red, the key locations the model focuses on; this property was used to verify the effectiveness of the improved attention mechanism in the YOLOv5-GRNS model. Visualizations on three datasets showed that the YOLOv5-GRNS model focused tightly and accurately on the target regions rather than on complex backgrounds or neighbouring dense targets. In summary, YOLOv5-GRNS can be expected to serve as a robust solution for pest detection. Its strong performance on small and dense targets was verified on different datasets, indicating good generalization and interpretability with fewer parameters and lower computational complexity. The improved model can also be deployed on embedded and mobile devices.
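
The abstract does not spell out the SIoU formulation, so the sketch below follows the commonly used form of the SIoU bounding-box loss (Gevorgyan, 2022): 1 − IoU plus averaged distance and shape costs, with the distance cost re-weighted by an angle cost. The function name, the corner-coordinate box convention, and the epsilon handling are assumptions for illustration only and may differ from the authors' implementation.

```python
import math
import torch

def siou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """SIoU bounding-box regression loss for boxes given as (x1, y1, x2, y2)."""
    # Widths, heights, and centres of predicted and ground-truth boxes
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    cx1, cy1 = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx2, cy2 = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2

    # Plain IoU
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Smallest enclosing box, used to normalize the centre offsets
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])

    # Angle cost: pushes the prediction toward the nearest axis of the target centre
    s_cw, s_ch = cx2 - cx1, cy2 - cy1
    sigma = torch.sqrt(s_cw ** 2 + s_ch ** 2) + eps
    sin_alpha = torch.min(torch.abs(s_cw), torch.abs(s_ch)) / sigma
    angle_cost = torch.cos(2 * torch.arcsin(sin_alpha) - math.pi / 2)

    # Distance cost, re-weighted by the angle cost
    rho_x, rho_y = (s_cw / (cw + eps)) ** 2, (s_ch / (ch + eps)) ** 2
    gamma = angle_cost - 2
    distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)

    # Shape cost (theta = 4 as in the original SIoU paper)
    omega_w = torch.abs(w1 - w2) / (torch.max(w1, w2) + eps)
    omega_h = torch.abs(h1 - h2) / (torch.max(h1, h2) + eps)
    shape_cost = (1 - torch.exp(-omega_w)) ** 4 + (1 - torch.exp(-omega_h)) ** 4

    return 1.0 - iou + 0.5 * (distance_cost + shape_cost)
```

Because the angle cost first drives the predicted centre toward the nearest axis of the ground-truth centre before closing the remaining distance, the regression path is more regular, which is consistent with the faster convergence reported for the SIoU loss above.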

       
