Field road scene recognition in hilly regions based on improved dilated convolutional networks

Li Yunwu (李云伍); Xu Junjie (徐俊杰); Liu Dexiong (刘得雄); Yu Yao (于尧)

doi:10.11975/j.issn.1002-6819.2019.07.019

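Summary: Autonomous navigation based on machine vision is one of the main navigation methods for intelligent agricultural machinery. The complex field road scenes of hilly and mountainous regions make autonomous navigation and obstacle avoidance on field roads considerably difficult for intelligent agricultural machinery. Based on the image features of field roads in hilly regions, this paper divides field road scene objects into 11 categories (background, road, pedestrian, vegetation, sky, building, livestock, obstacle, pond, soil and pole) and builds an image semantic segmentation model for field road scenes based on a dilated convolutional neural network. The model consists of a front-end module and a context module: the front-end module is an improved VGG-16 structure that incorporates dilated convolution, and the context module is a cascade of dilated convolution layers with different dilation rates. The model is trained with a two-stage training method. Using the CAFFE deep learning framework, the improved network models were compared with the classical FCN-8s network model, and their adaptability to road shadows was tested. The semantic segmentation results show that the Front-end + Large network achieved the highest pixel accuracy, mean class accuracy and mean intersection over union, while the FCN-8s network achieved the lowest. The mean intersection over union of the Front-end + Large network was 73.4% and 73.2% on the unshadowed-road and shadowed-road training sets, respectively, indicating good adaptability to shadow interference. This work achieves pixel-level prediction of field road scenes in hilly regions and can lay a foundation for machine-vision-based autonomous navigation and obstacle avoidance of intelligent agricultural machinery on field roads.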

Abstract: Accurate acquisition of drivable-area and obstacle information on field roads is an important research topic for machine-vision-based automatic navigation of intelligent agricultural machinery. To accurately identify field roads and their surrounding environment, an image semantic segmentation model for field road scenes was proposed based on a DCN (dilated convolutional neural network). Field roads in hilly regions are often winding and undulating, and are occluded by various crops along both sides and by many kinds of obstacles on the road. Based on an analysis of the image features of field roads in hilly regions, the field road scenes were divided into 11 categories: background, road, pedestrian, vegetation, sky, building, livestock, obstacle, pond, soil and pole. Starting from a traditional FCN (fully convolutional network) with the VGG-16 structure, the front-end module and the context aggregation module of the DCN were constructed by removing the parts that were not conducive to pixel prediction and restructuring a front-end module with higher prediction accuracy. The front-end module was an improved version of VGG-16: the pooling 4 and pooling 5 layers were removed, the three convolutions in Conv-5 were replaced by dilated convolutions with a dilation rate (expansion coefficient) of 2, and the convolutional Fc6 layer was changed to a dilated convolution with a dilation rate of 4 so that the receptive field remained unchanged. The padding operations in VGG-16 were also removed. The context module was a cascade of dilated convolution layers with different dilation rates; its first six layers were dilated convolutions with dilation rates of 1, 1, 2, 4, 8 and 16, respectively. Two context module structures, Basic and Large, were proposed. The parameters of the constructed DCN could be initialized from the traditional VGG-16 network, and the DCN produced higher-resolution output.
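The effect of cascading dilated convolutions can be illustrated with a small receptive-field calculation. The following is a minimal sketch, not the authors' code: it assumes 3x3 kernels with stride 1 (the abstract does not state the kernel size), and the function name `receptive_field` is hypothetical. Under those assumptions, each layer with dilation rate d widens the receptive field by (k - 1) * d pixels.

```python
def receptive_field(dilations, kernel_size=3):
    """Return the receptive-field width after each layer of a stack of
    stride-1 dilated convolutions.

    Assumption (not stated in the abstract): every layer uses the same
    square kernel, 3x3 by default, with stride 1.
    """
    field = 1  # a single pixel before any convolution
    widths = []
    for d in dilations:
        # A kernel of size k with dilation d spans (k - 1) * d + 1 pixels,
        # so it widens the receptive field by (k - 1) * d.
        field += (kernel_size - 1) * d
        widths.append(field)
    return widths


# Dilation rates of the first six context-module layers, as listed in the abstract.
context_dilations = [1, 1, 2, 4, 8, 16]
print(receptive_field(context_dilations))  # [3, 5, 9, 17, 33, 65]
```

With these assumed kernel sizes, the receptive field roughly doubles at each layer once the dilation rates start doubling, while the number of parameters per layer stays constant. This is the usual motivation for aggregating context with dilated convolutions instead of pooling.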
A two-stage training method was then adopted to address long training times and difficult convergence. The improved network models were built in the CAFFE (convolutional architecture for fast feature embedding) deep learning framework and compared with the classical FCN-8s network model. Four network models were tested: FCN-8s, a model built with only the front-end module, and models built with the front-end module plus the Basic or the Large context module. The model built with the front-end module and the Large context module (Front-end + Large) had the highest PA (statistical pixel accuracy), MPA (category average accuracy) and MIoU (mean intersection over union), while the FCN-8s network model had the lowest. The Front-end + Large model was therefore used as the semantic segmentation model for field road scenes, and its adaptability to shadowed road images was verified. Its MIoU was 73.4% on the unshadowed road test dataset and decreased by only 0.2 percentage points on the shadowed road test dataset. Moreover, its PA and MPA were almost the same on the two test datasets. The results showed that the improved model had good adaptability to shadow disturbance in field road scenes in hilly regions. The proposed model has good generalization and robustness, achieves pixel-level prediction of field road images in hilly regions, and provides basic support for autonomous navigation and obstacle avoidance of agricultural machines on field roads.
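The three evaluation indices named in the abstract (PA, MPA and MIoU) are commonly computed from a class confusion matrix. The sketch below uses the standard definitions; the abstract does not give the authors' exact formulas, so treat this as an assumption. The function name `segmentation_metrics` and the toy 3-class matrix are hypothetical and are not data from the paper.

```python
def segmentation_metrics(confusion):
    """Compute (PA, MPA, MIoU) from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. Classes with no ground-truth pixels are
    skipped when averaging (an assumption of this sketch).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    pa = correct / total  # pixel accuracy over all pixels

    class_acc, class_iou = [], []
    for i in range(n):
        truth = sum(confusion[i])                           # pixels truly in class i
        predicted = sum(confusion[j][i] for j in range(n))  # pixels predicted as class i
        if truth == 0:
            continue  # class absent from the ground truth
        hit = confusion[i][i]
        class_acc.append(hit / truth)
        class_iou.append(hit / (truth + predicted - hit))   # intersection / union

    mpa = sum(class_acc) / len(class_acc)   # mean per-class accuracy
    miou = sum(class_iou) / len(class_iou)  # mean intersection over union
    return pa, mpa, miou


# Toy 3-class confusion matrix with made-up counts, for illustration only.
toy = [
    [50, 10, 0],
    [5, 30, 5],
    [0, 0, 20],
]
pa, mpa, miou = segmentation_metrics(toy)
print(f"PA={pa:.3f}  MPA={mpa:.3f}  MIoU={miou:.3f}")
```

MIoU penalizes both missed pixels and false positives for each class, which is why it is typically lower than PA and MPA and is the index most often quoted when comparing segmentation models.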

Field road scene recognition in hilly regions based on improved dilated convolutional networks