Citation: Zhang Shanwen, Xu Xinhua, Qi Guohong, Shao Yu. Detecting the pest disease of field crops using deformable VGG-16 model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(18): 188-194. DOI: 10.11975/j.issn.1002-6819.2021.18.022

Detecting the pest disease of field crops using deformable VGG-16 model

    • Abstract: Field pests come in many species and vary widely in size, shape, posture, color, and position, and their surroundings in the field are complex, so traditional field pest detection methods perform poorly; existing crop pest detection methods based on convolutional neural networks use modules with fixed geometric structures and therefore cannot be applied effectively to detecting the highly variable pests in the field. Building on the VGG-16 model, this study constructed a deformable VGG-16 model (DVGG-16) and applied it to field crop pest detection. In DVGG-16, introducing deformable convolution allows the network to adapt to pest images with geometric deformations such as different shapes, states, and sizes, improving the feature representation of deformed images; a single global average pooling layer then replaces the three fully connected layers of VGG-16 to speed up model training. Comparative experiments between DVGG-16 and VGG-16 showed that DVGG-16 improves adaptability to geometric deformations such as the shape and size of field pest images and extracts features from irregular field pest images without changing their spatial resolution, achieving a detection accuracy of 91.14% on a database of real field pest images. The experimental results show that DVGG-16 strengthens the feature representation ability of VGG-16 for diverse pest images, has a certain ability to adapt to image deformation, can detect pests of widely varying shapes in the field fairly accurately, and can provide technical support for crop pest detection systems in complex field environments.
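
    The architecture described in the abstract above can be summarized in code. The following is a minimal PyTorch sketch, assuming torchvision's DeformConv2d and the layer counts stated in the abstracts (six standard convolutional layers, four deformable convolutional layers, five pooling layers, and one global average pooling layer); the channel widths, the positions of the deformable layers, the ReLU activations, and the single linear classifier after global average pooling are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableBlock(nn.Module):
    """3x3 deformable convolution whose (dy, dx) offsets are predicted by a
    parallel standard convolution and learned end-to-end by backpropagation."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # 2 offsets per sampling point of the 3x3 kernel -> 18 offset channels.
        self.offset_conv = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        offset = self.offset_conv(x)
        return self.relu(self.deform_conv(x, offset))


def conv_block(in_ch, out_ch):
    """Standard VGG-style 3x3 convolution + ReLU."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))


class DVGG16(nn.Module):
    """Sketch of a DVGG-16-style network: 6 standard conv layers, 4 deformable
    conv layers, 5 pooling layers, and global average pooling instead of the
    three fully connected layers of VGG-16."""

    def __init__(self, num_classes: int = 10):  # num_classes is a placeholder
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64), conv_block(64, 64), nn.MaxPool2d(2),
            conv_block(64, 128), conv_block(128, 128), nn.MaxPool2d(2),
            conv_block(128, 256), conv_block(256, 256), DeformableBlock(256, 256), nn.MaxPool2d(2),
            DeformableBlock(256, 512), DeformableBlock(512, 512), nn.MaxPool2d(2),
            DeformableBlock(512, 512), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),       # global average pooling replaces the FC layers
            nn.Flatten(),
            nn.Linear(512, num_classes),   # single linear classifier (an assumption)
        )

    def forward(self, x):
        return self.head(self.features(x))


if __name__ == "__main__":
    model = DVGG16(num_classes=10)
    print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 10])
```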


      Abstract: Detecting crop pests has long been one of the major challenges in modern agriculture, because pests in the field vary widely within and across classes in color, size, shape, posture, and position, and appear against complex backgrounds. Convolutional Neural Networks (CNNs) have shown excellent performance in detecting and recognizing complex images, but current CNN models cannot adapt to the geometric deformation of pests. In this study, a deformable VGG-16 (DVGG-16) model was constructed and applied to the detection of crop pests in the field. The network consisted of six convolutional layers, four deformable convolutional layers, five pooling layers, and one global average pooling layer. Four convolutional layers of VGG-16 were replaced by deformable convolutional layers to improve the feature representation ability of the network and its adaptability to the deformation of insect images, and the three fully connected layers of VGG-16 were replaced by a single global average pooling layer to reduce the number of training parameters, accelerate training, and avoid over-fitting. In the deformable convolution unit of DVGG-16, an offset was added to each sampling point; the offsets were computed by a parallel standard convolution unit and learned end-to-end through gradient backpropagation. The size and position of the deformable convolution kernels were thereby adjusted dynamically according to the image content of the crop pests, making the model suitable for objects with different shapes, sizes, and other geometric deformations. Data augmentation was performed on the original dataset to increase the number of training samples and improve the generalization ability and robustness of the model, including bilinear interpolation, cropping and rotating the images, and adding salt-and-pepper noise. Because a parallel convolutional layer in DVGG-16 learns the offsets corresponding to the input feature map, the regular-grid constraint of standard convolution is relaxed: an offset is added at each sampling point, so sampling can be performed at arbitrary positions around the original location. The deformable convolution therefore makes DVGG-16 better suited to insect images with different shapes, states, and sizes. The model was evaluated on a database of real field pest images and compared with two feature-extraction methods and two deep learning models: image-based Orchard Insect Automated Identification (IIAI), Local Mean Color Feature and Support Vector Machine (LMCFSVM), an Improved Convolutional Neural Network (ICNN), and VGG-16. The detection accuracy of DVGG-16 was 91.14%, which was 28.60 and 26.97 percentage points higher than that of IIAI and LMCFSVM, and 7.72 and 9.01 percentage points higher than that of ICNN and VGG-16, respectively. The training time of DVGG-16 was 7.98 h longer than that of ICNN, because the deformable convolution is realized by bilinear interpolation, which increases the computational complexity and thus the training time relative to ICNN. The test time of DVGG-16 was 0.02 and 0.17 s shorter than that of ICNN and VGG-16, respectively. Overall, DVGG-16 is effective and feasible for detecting the highly variable pests in the field: it extracts features from irregular field insect images without changing their spatial resolution and can provide a useful reference for pest detection against complex field backgrounds.
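
    Because the abstract attributes the extra training cost of DVGG-16 to the bilinear interpolation used by deformable convolution, the following standalone Python sketch illustrates that sampling step for a single output location of a 3x3 deformable kernel. All function and variable names here are hypothetical, and the loop-based form is for illustration only; production implementations (e.g. torchvision's DeformConv2d) vectorize this computation.

```python
import torch


def bilinear_sample(feat, y, x):
    """Sample a C x H x W feature map at the fractional position (y, x)."""
    C, H, W = feat.shape
    y = max(0.0, min(float(y), H - 1.0))   # clamp to the map (sketch simplification)
    x = max(0.0, min(float(x), W - 1.0))
    y0, x0 = int(y), int(x)                # top-left integer neighbour
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0                # fractional parts act as interpolation weights
    return ((1 - wy) * (1 - wx) * feat[:, y0, x0]
            + (1 - wy) * wx * feat[:, y0, x1]
            + wy * (1 - wx) * feat[:, y1, x0]
            + wy * wx * feat[:, y1, x1])


def deformable_conv_point(feat, weight, offsets, p0):
    """One output value of a 3x3 deformable convolution centred at p0 = (y, x).

    feat:    C x H x W input feature map
    weight:  C_out x C x 3 x 3 kernel
    offsets: 3 x 3 x 2 learned (dy, dx) offsets, one pair per kernel sampling point
    """
    out = torch.zeros(weight.shape[0])
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            # regular grid position plus learned offset -> fractional sampling position
            y = p0[0] + (i - 1) + float(dy)
            x = p0[1] + (j - 1) + float(dx)
            out += weight[:, :, i, j] @ bilinear_sample(feat, y, x)
    return out


if __name__ == "__main__":
    feat = torch.randn(64, 32, 32)
    weight = torch.randn(128, 64, 3, 3)
    offsets = 0.5 * torch.randn(3, 3, 2)   # stand-in for offsets from a parallel conv layer
    print(deformable_conv_point(feat, weight, offsets, (16, 16)).shape)  # torch.Size([128])
```

    The four interpolation reads per sampling point, compared with a single integer-indexed read in standard convolution, are what make the deformable layers more expensive to train, consistent with the longer training time reported for DVGG-16.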
