Abstract:
Detection of crop pests has long been one of the major challenges in modern agriculture, owing to the intra- and inter-class variation of field pests in color, size, shape, posture, and position, as well as the complex field backgrounds. Convolutional Neural Networks (CNNs) have shown excellent performance in the detection and recognition of complex images. However, current CNN models cannot adapt to the geometric deformation of pests. In this study, a deformable VGG-16 (DVGG-16) model was constructed and applied to the detection of crop pests in the field. The framework consisted of six convolutional layers, four deformable convolutional layers, five pooling layers, and one global average pooling layer. Four convolutional layers in VGG-16 were replaced by four deformable convolutional layers, in order to improve the feature expression ability of the network and the adaptability of VGG-16 to insect image deformation. Moreover, a global average pooling layer was used instead of the three fully connected layers of VGG-16, in order to reduce the number of training parameters and accelerate network training while avoiding over-fitting. The offset was added in the deformable convolution unit as part of the DVGG-16 structure; a parallel standard convolution unit was used to calculate the offsets, which were then learned end-to-end through gradient backpropagation. The size and position of the deformable convolution kernels were thus adjusted dynamically according to the image content of the crop pests, making them particularly suitable for objects with different shapes, sizes, and other geometric deformations. Moreover, data augmentation was performed on the original dataset to increase the number of training samples.
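As a rough illustration of the mechanism described above, the following NumPy sketch shows a single-channel 3x3 deformable convolution: each of the nine regular sampling points is shifted by a learned (dy, dx) offset, and the feature map is read at the resulting fractional positions via bilinear interpolation. This is a minimal sketch, not the authors' implementation; all function names are hypothetical, and the offsets are passed in as an array rather than produced by the parallel offset-learning convolution.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feature map `feat` (H x W) at fractional (y, x)."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0
    def px(r, c):
        # zero padding outside the feature map
        if 0 <= r < H and 0 <= c < W:
            return feat[r, c]
        return 0.0
    return ((1 - wy) * (1 - wx) * px(y0, x0) + (1 - wy) * wx * px(y0, x1)
            + wy * (1 - wx) * px(y1, x0) + wy * wx * px(y1, x1))

def deformable_conv2d(feat, kernel, offsets):
    """Single-channel 3x3 deformable convolution (hypothetical sketch).

    offsets has shape (H, W, 9, 2): for each output position, a (dy, dx)
    shift for each of the 9 regular grid sampling points. In DVGG-16 these
    offsets would come from a parallel standard convolution and be learned
    end-to-end; here they are simply an input array.
    """
    H, W = feat.shape
    out = np.zeros((H, W))
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k, (dy, dx) in enumerate(grid):
                oy, ox = offsets[i, j, k]
                # sample at the regular grid point plus its learned offset
                acc += kernel[dy + 1, dx + 1] * bilinear_sample(
                    feat, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets set to zero, this reduces to an ordinary zero-padded 3x3 convolution, which is why the deformable layer can be seen as a generalization of the standard one.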
A series of operations was also included to improve the generalization ability and robustness of the model, such as bilinear interpolation, cropping and rotating images, and adding salt-and-pepper noise to the images. A parallel convolution layer was used in DVGG-16 to learn the offsets corresponding to the input feature map. The regular-grid constraint of normal convolution was thus easily broken: an offset was added at the corresponding position of each sampling point, allowing arbitrary sampling around the regular sampling location. More importantly, the deformable convolution contributed greatly to making the DVGG-16 model better suited to insect images with various shapes, states, and sizes. The model was evaluated on an image database of actual field pests against two feature-extraction methods and two deep learning models: image-based Orchard Insect Automated Identification (IIAI), Local Mean Color Feature and Support Vector Machine (LMCFSVM), Improved Convolutional Neural Network (ICNN), and VGG-16. Specifically, the detection accuracy of DVGG-16 was 91.14%, which was 28.60 and 26.97 percentage points higher than that of IIAI and LMCFSVM, and 7.72 and 9.01 percentage points higher than that of the ICNN- and VGG-16-based models, respectively. The training time of DVGG-16 was 7.98 h longer than that of ICNN, because the deformable convolution operation was realized by bilinear interpolation, which increased the computational complexity and training time of DVGG-16 compared with ICNN. The test time of the DVGG-16-based model was 0.02 and 0.17 s faster than those of the ICNN- and VGG-16-based models, respectively. Consequently, DVGG-16 was effective and feasible for detecting variable pests in the field. The findings can provide a strong reference for the effective detection of pests against complex field backgrounds, and for the feature extraction of irregular field insect images without changing the spatial resolution.
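The augmentation operations mentioned above (cropping, rotating, and salt-and-pepper noise) can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (grayscale images in [0, 255], 90-degree rotations); the helper names are hypothetical and do not reproduce the paper's actual pipeline.

```python
import numpy as np

def add_salt_pepper(img, amount=0.05, rng=None):
    """Corrupt a copy of `img` (H x W, values in [0, 255]) with
    salt (255) and pepper (0) noise at roughly the given pixel fraction."""
    rng = np.random.default_rng(0) if rng is None else rng
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < amount / 2] = 0          # pepper pixels
    out[mask > 1 - amount / 2] = 255    # salt pixels
    return out

def random_crop(img, size, rng=None):
    """Crop a random `size` x `size` window from `img`."""
    rng = np.random.default_rng(0) if rng is None else rng
    H, W = img.shape[:2]
    top = rng.integers(0, H - size + 1)
    left = rng.integers(0, W - size + 1)
    return img[top:top + size, left:left + size]

def augment(img, rng=None):
    """Yield simple augmented variants: two rotations, a crop, a noisy copy."""
    rng = np.random.default_rng(0) if rng is None else rng
    yield np.rot90(img, 1)
    yield np.rot90(img, 2)
    yield random_crop(img, min(img.shape[:2]) - 2, rng)
    yield add_salt_pepper(img, rng=rng)
```

Each original training image then contributes several perturbed variants, which is the standard way such augmentation enlarges the effective training set and improves robustness.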