Convolutional neural network combining dilated convolution and global pooling for recognizing crop seedlings and weeds

Sun Jun; He Xiaofei; Tan Wenjun; Wu Xiaohong; Shen Jifeng; Lu Hu

doi:10.11975/j.issn.1002-6819.2018.11.020

Recognition of crop seedlings and weeds based on dilated convolution and global pooling in CNN

Abstract

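To address the large parameter count and single feature scale of the traditional AlexNet model, this paper proposes a multi-scale feature-fusion convolutional neural network recognition model that combines dilated convolution with global pooling. The kernels of the initial convolutional layer are dilated to enlarge their receptive field without changing the number of parameters to compute, and global pooling replaces the traditional fully connected layers to reduce the model's parameters. By varying the dilation rate of the initial convolutional layer's kernels, the type of global pooling layer, and the batch size, 8 improved models were obtained and trained to recognize 12 species of crop seedlings and weeds, and the optimal model was selected from among them. Compared with the traditional AlexNet model, the optimal improved model reaches a recognition accuracy above 90% after only 4 training epochs; its average test recognition accuracy reaches 98.80%, its classification success index reaches 96.84%, and its memory requirement is reduced to 4.20 MB. In actual field prediction, the accuracy for both charlock and brome seedlings is about 75%, which indicates that the optimal model recognizes seedlings well under normal conditions; for sugar beet seedlings against complex dark backgrounds, however, the accuracy is 60%, so recognition under adverse backgrounds still needs improvement. Because the model uses a wider network structure, adds multi-scale fusion of feature maps, and maintains invariance to spatial transformations of the input, it recognizes different crop seedlings and weeds well under normal conditions. The improved model achieves a high average recognition accuracy and classification success rate, and can lay a foundation for further study of weed recognition against complex field backgrounds and for the development of weed and seedling recognition devices.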
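
The claim that dilating a kernel enlarges its receptive field without adding parameters can be checked with a little arithmetic. The sketch below uses the standard relation for the span of a dilated kernel along one axis, k + (k - 1)(d - 1). The 3×3 kernel size and the channel counts are hypothetical values chosen for illustration; the abstract does not state the actual layer dimensions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def kernel_weights(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a 2-D convolutional layer (bias ignored).

    Dilation does not appear here: it only spaces the taps further apart.
    """
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical layer: 3x3 kernels, 3 input channels, 64 output channels.
for dilation in (1, 2, 4):
    span = effective_kernel_size(3, dilation)
    weights = kernel_weights(3, 3, 64)
    print(f"dilation={dilation}: span={span}x{span}, weights={weights}")
```

With these assumed values the span grows from 3 to 5 to 9 as the dilation rate goes from 1 to 2 to 4, while the weight count stays the same.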

Abstract: Weeds in the field damage crop seedlings and can seriously impair their photosynthesis. To prevent weeds from affecting the growth of crop seedlings, it is of great significance to identify crop seedlings and weeds accurately. This paper proposes a weed identification model based on an improved convolutional neural network (CNN). To address the large number of parameters and the single feature scale of the traditional AlexNet model, we adjusted its network structure by combining dilated convolution with global pooling, extending the single convolution kernel into multi-scale convolution kernels whose outputs are then fused. This can shorten training time while achieving high accuracy. We computed the batch mean and variance of the input to each convolutional layer and applied batch normalization, while also reducing the number of feature maps in some layers. We also used global pooling in place of the last fully connected layer. The model consists of 7 convolutional layers, 1 fusion layer and 4 pooling layers. In the image preprocessing phase, to prevent the trained model from being biased by the unbalanced distribution of sample numbers, we randomly zoomed, flipped and rotated the original images to obtain an augmented dataset, and used 80% of the images as the training set and the rest as the test set. The images were resized to 256×256 pixels for CNN training, and both the original dataset and the augmented dataset were used to train models. To find the optimal dilation rate, dilated convolution kernels with dilation rates of 2 and 4 were each used in the first convolutional layer. In addition, we compared global average pooling with global max pooling.
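
The two pooling variants compared above can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: feature maps are represented as nested Python lists, and the function names and the layer shape passed to `fc_weights` are hypothetical. It also shows why replacing a fully connected layer saves parameters: global pooling has no weights at all.

```python
def global_avg_pool(feature_maps):
    """Reduce each H x W channel to its mean: one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def global_max_pool(feature_maps):
    """Reduce each H x W channel to its maximum: one value per channel."""
    return [max(max(row) for row in channel) for channel in feature_maps]


def fc_weights(channels, height, width, n_classes):
    """Weights a fully connected layer would need on the flattened maps."""
    return channels * height * width * n_classes


# Two toy 2x2 feature maps (channels).
maps = [
    [[1.0, 2.0], [3.0, 6.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_avg_pool(maps))  # [3.0, 2.0]
print(global_max_pool(maps))  # [6.0, 8.0]

# Hypothetical shape: 256 channels of 6x6 mapped to 12 classes.
print(fc_weights(256, 6, 6, 12))  # 110592 weights; global pooling needs 0
```

The second channel shows the difference in behaviour: a single strong activation dominates the max-pooled value, while the average reflects the whole map.
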
The results showed that global average pooling performed better: average pooling reduces the variance of the estimate caused by the limited neighborhood size and retains more of the image's background information, which helps in extracting key features. Max pooling, by contrast, preserves more texture information but easily loses the deep feature information already extracted. By varying these parameters, 8 models with different dilation rates and pooling types were designed. To further optimize the model and improve the average recognition accuracy, we also compared batch sizes of 64, 128 and 256. Within a reasonable range, increasing the batch size raises memory utilization and improves the parallel efficiency of matrix multiplication; the number of iterations needed for one epoch (a full pass over the dataset) falls, and the same amount of data is processed faster. In this way the optimal model was obtained. Model performance was evaluated using the average recognition accuracy and a visualization of the confusion matrix. Compared with the traditional AlexNet model, the optimal model reached a recognition accuracy of more than 90% after only 4 training epochs, its memory requirement was far lower, and its average test recognition accuracy reached 98.80%. We attribute this to the wider network structure and the use of global pooling in the improved model, which may increase the multi-scale fusion of feature maps, strengthen the correspondence between feature maps and categories, and preserve invariance to spatial transformations of the input, so that the model recognizes different crop seedlings and weeds more reliably.
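
The relation between batch size and iterations per epoch mentioned above is simple ceiling division. In the sketch below the training-set size of 4000 images is a hypothetical figure for illustration only; the abstract does not give the dataset size.

```python
import math


def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of mini-batches needed for one full pass over the data."""
    return math.ceil(n_samples / batch_size)


N_TRAIN = 4000  # hypothetical training-set size, not taken from the paper

for batch_size in (64, 128, 256):
    print(batch_size, iterations_per_epoch(N_TRAIN, batch_size))
```

Under this assumption, going from a batch size of 64 to 256 cuts the iterations per epoch from 63 to 16, which is the effect the abstract describes.
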
The confusion matrix shows that the optimal model classifies well, and the model can serve as a reference for the development of intelligent devices for identifying weeds and seedlings.
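
A confusion matrix and the accuracy figures derived from it can be computed as in the following sketch. The labels are toy data with 3 classes rather than the 12 species in the study, and the helper names are hypothetical. The classification success index reported in the Chinese abstract is not reproduced here, since its definition is not given in this text.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Share of each true class that was predicted correctly."""
    return [
        matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0
        for i in range(len(matrix))
    ]


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, 3)
print(cm)                    # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(overall_accuracy(cm))  # 4 correct out of 6
print(per_class_recall(cm))  # [0.5, 1.0, 0.5]
```

Off-diagonal entries show which classes are confused with which, information that a single accuracy number hides.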
