Field road scene recognition in hilly regions based on improved dilated convolutional networks
-
-
Abstract
Accurate acquisition of drivable-area and obstacle information on field roads is an important research topic for the machine-vision-based automatic navigation of intelligent agricultural machinery. To accurately identify field roads and their surrounding environment, an image semantic segmentation model for field road scenes was proposed based on a DCN (dilated convolutional neural network). Field roads in hilly regions are often winding and undulating, and they are occluded by different types of crops along both sides and by many kinds of obstacles on the roads. Based on an analysis of the image features of field roads in hilly regions, the field road scenes were divided into 11 categories in this paper: background, road, pedestrian, vegetation, sky, construction, livestock, obstacle, pond, soil and pole. Starting from a traditional FCN (fully convolutional neural network) with the VGG-16 structure, the front-end module and the context aggregation module of the DCN were constructed by removing the parts that were not conducive to pixel-level prediction and restructuring the front-end module for higher prediction accuracy. The front-end module was an improved version of VGG-16. The pooling 4 and pooling 5 layers in VGG-16 were removed, the three convolutions in Conv-5 were replaced by dilated convolutions with a dilation factor of 2, and the convolutional Fc6 layer was changed to a dilated convolution with a dilation factor of 4 to keep the receptive field unchanged. At the same time, the padding operation in VGG-16 was removed. The context module was a cascade of dilated convolution layers with different dilation factors; its first six layers were dilated convolutions with dilation factors of 1, 1, 2, 4, 8 and 16, respectively. Two context module structures, namely Basic and Large, were proposed. The parameters of the constructed DCN could be initialized from the traditional VGG-16 network, and the DCN produced higher-resolution output.
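The effect of the dilation (expansion) coefficients 1, 1, 2, 4, 8 and 16 in the context module can be illustrated with a short receptive-field calculation. This is only a minimal sketch: it assumes 3x3 kernels and stride 1 for every layer, neither of which is stated in this abstract, and the function name `receptive_field_sizes` is hypothetical.

```python
def receptive_field_sizes(dilations, kernel_size=3):
    """Return the receptive field (in pixels, one axis) after each layer.

    Assumes stride-1 convolutions stacked in sequence. A k x k kernel with
    dilation d spans (k - 1) * d + 1 pixels, so each layer widens the
    receptive field by (k - 1) * d.
    """
    rf = 1  # a single input pixel before any convolution
    sizes = []
    for d in dilations:
        rf += (kernel_size - 1) * d
        sizes.append(rf)
    return sizes


# Dilation coefficients of the first six context-module layers (from the text).
context_dilations = [1, 1, 2, 4, 8, 16]
print(receptive_field_sizes(context_dilations))
```

Under these assumptions the receptive field roughly doubles with each layer while the number of parameters per layer stays constant, which is why a cascade of dilated convolutions can aggregate wide context without pooling.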
A two-stage training method was then adopted to address the long training time and the difficulty of convergence. The improved network models were built in the CAFFE (convolutional architecture for fast feature embedding) deep learning framework and compared with the classical FCN-8s network model. Four models were tested: the FCN-8s model, the model built with the front-end module only, and the models built with the front-end module plus the context module in its Basic and Large structures, respectively. The model built with the front-end module and the Large context module adapted best to shadowed road images: its PA (pixel accuracy), MPA (mean per-category pixel accuracy) and MIoU (mean intersection over union) were the highest, whereas the FCN-8s model scored lowest on these evaluation indices. The model with the front-end module and the Large context module was therefore used as the semantic segmentation model for field road scenes. Its MIoU was 73.4% on the unshadowed road test dataset and decreased by only 0.2 percentage points on the shadowed road test dataset. Moreover, its PA and MPA were almost the same on the unshadowed and shadowed road test datasets. The results showed that the improved model in this paper adapted well to the shadow disturbance in field road scenes in hilly regions. The proposed model generalizes well and is robust; it achieves pixel-level prediction for field road images in hilly regions and provides basic support for the autonomous navigation and obstacle avoidance of agricultural machines on field roads.
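The three evaluation indices named above (PA, MPA and MIoU) are commonly computed from a per-class confusion matrix. The following is a minimal sketch using those standard definitions; the abstract does not give the paper's exact formulas, so the function name `segmentation_metrics`, the matrix layout, and the choice to skip classes that never occur are assumptions made for illustration.

```python
def segmentation_metrics(confusion):
    """Compute (PA, MPA, MIoU) from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and whose
    predicted class is j (assumed layout). Classes with no ground-truth
    pixels are skipped in MPA, and classes with an empty union are skipped
    in MIoU (an illustrative choice, not taken from the paper).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    pa = correct / total  # fraction of all pixels labelled correctly

    class_acc, class_iou = [], []
    for i in range(n):
        tp = confusion[i][i]
        true_i = sum(confusion[i])                       # pixels truly in class i
        pred_i = sum(confusion[j][i] for j in range(n))  # pixels predicted as class i
        union = true_i + pred_i - tp
        if true_i > 0:
            class_acc.append(tp / true_i)
        if union > 0:
            class_iou.append(tp / union)

    mpa = sum(class_acc) / len(class_acc)
    miou = sum(class_iou) / len(class_iou)
    return pa, mpa, miou


# Toy two-class example (hypothetical numbers, not results from the paper).
toy = [[8, 2],
       [1, 9]]
pa, mpa, miou = segmentation_metrics(toy)
print(f"PA={pa:.3f} MPA={mpa:.3f} MIoU={miou:.3f}")
```

Because MIoU penalizes both missed pixels and false positives for every category, it is usually the strictest of the three indices, which is consistent with it being the headline figure reported for the shadowed and unshadowed test sets.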
-
-