Sun Jun, Tan Wenjun, Wu Xiaohong, Shen Jifeng, Lu Bing, Dai Chunxia. Real-time recognition of sugar beet and weeds in complex backgrounds using multi-channel depth-wise separable convolution model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(12): 184-190. DOI: 10.11975/j.issn.1002-6819.2019.12.022

    Real-time recognition of sugar beet and weeds in complex backgrounds using multi-channel depth-wise separable convolution model

• Abstract: Mechanical weeding can reduce pesticide use and is of great significance for ensuring high crop yields. Real-time, accurate identification of crops is a key technical problem that must be solved in mechanical weeding equipment. Because of the subjectivity of the feature-extraction process in weed recognition, traditional methods achieve low accuracy in actual field environments. In recent years, weed identification based on convolutional neural networks has been widely studied; although accuracy has improved markedly, such models still suffer from large parameter counts and poor real-time performance. To address these problems, a four-channel input image was constructed from near-infrared and visible images of sugar beet collected in the field, and a lightweight convolutional neural network based on an encoder-decoder structure was proposed. Sugar beet and weed images collected from a farm in Bonn, Germany, in 2016 were used as the data set, covering different growth stages of sugar beet; 226 images were randomly selected as the training set and the remaining 57 images were used as the test set. Each sample consisted of a three-channel visible-light image and a one-channel near-infrared image, merged into a four-channel image by pixel-level stacking, and depth-wise separable convolution was used in the deep model. First, the input feature map was convolved with a 2D depth-wise kernel and the number of channels was expanded; then a 1×1 pointwise convolution combined features across channels and compressed the channel count, enhancing the nonlinear mapping ability of the model. To avoid vanishing gradients, a residual block connected the input and output of the depth-wise separable convolution. Finally, an encoder-decoder structure was designed in which shallow features were combined with deep features to refine the segmentation. Because the pixel proportions of soil, crops, and weeds are imbalanced, a weighted loss function was used to optimize the model. Segmentation accuracy, parameter count, and operating efficiency were compared across different input resolutions and width factors to evaluate the model. With a width factor of 1, segmentation accuracy increased with input resolution, and the four-channel model was more accurate than the model using only the original visible image, showing that near-infrared features can compensate for the limitations of ordinary RGB images to some extent and make the model better suited to low-light environments. At the same input resolution, the models with a width factor of 2 or 4 performed better than the model with a width factor of 1, but the parameter count grew rapidly with the width factor. The amount of computation depends on the input image size, so the frame rate decreased as the input size increased. The experimental results show that the optimal model is the four-channel input model with a width factor of 2: its mean intersection over union is 87.58%, mean pixel accuracy is 99.19%, parameter count is 525 763, and frame rate is 42.064 frames/s. The model achieves high segmentation and recognition accuracy with good real-time performance, and can provide a theoretical basis for the development of intelligent mechanized weeding equipment.
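As a reading aid, the building block described in the abstract (a depth-wise spatial convolution that expands channels, a 1×1 pointwise convolution that compresses them, and a residual connection around the pair) can be sketched in a few lines. The PyTorch code below is a hypothetical reconstruction based only on the abstract, not the authors' released implementation; the class name DepthwiseSeparableResBlock, the expansion factor of 2, the stem width of 32, and the batch-normalization placement are all assumptions.

import torch
import torch.nn as nn

class DepthwiseSeparableResBlock(nn.Module):
    def __init__(self, channels, expand=2):
        super().__init__()
        # Depth-wise 3x3: `expand` filters per input channel
        # (groups=channels), so the channel count grows by the
        # expansion factor.
        self.depthwise = nn.Conv2d(channels, channels * expand, 3,
                                   padding=1, groups=channels, bias=False)
        self.bn1 = nn.BatchNorm2d(channels * expand)
        # Pointwise 1x1: combines features across channels and
        # compresses back to the input width so the residual addition
        # is shape-compatible.
        self.pointwise = nn.Conv2d(channels * expand, channels, 1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.depthwise(x)))
        out = self.bn2(self.pointwise(out))
        # Residual connection to ease gradient flow.
        return self.relu(out + x)

# Four-channel input: RGB and near-infrared stacked pixel-wise.
rgb = torch.rand(1, 3, 224, 224)
nir = torch.rand(1, 1, 224, 224)
x = torch.cat([rgb, nir], dim=1)          # (1, 4, 224, 224)
stem = nn.Conv2d(4, 32, 3, padding=1)     # stem width is illustrative
features = DepthwiseSeparableResBlock(32)(stem(x))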
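The refinement step, in which shallow encoder features are fused with upsampled deep features, can be sketched the same way. This is one common way to realize the skip connections the abstract describes, not the paper's actual design: the stage name DecoderStage and the bilinear upsampling strategy are assumptions.

class DecoderStage(nn.Module):
    # Upsamples deep features and fuses them with shallow encoder
    # features of matching spatial size to refine the segmentation.
    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)
        self.fuse = nn.Conv2d(deep_ch + skip_ch, out_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, deep, skip):
        deep = self.up(deep)                    # 2x spatial upsampling
        fused = torch.cat([deep, skip], dim=1)  # channel-wise fusion
        return self.relu(self.fuse(fused))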
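Finally, the weighted loss that compensates for the soil/crop/weed pixel imbalance maps directly onto a standard weighted cross-entropy for per-pixel classification. The three weight values below are illustrative only; the abstract does not state the weights the authors used.

# Class weights counteract the dominance of soil pixels over sugar
# beet and weed pixels; the values are illustrative, not from the paper.
class_weights = torch.tensor([0.1, 1.0, 2.0])    # soil, beet, weed
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(1, 3, 224, 224)             # per-pixel class scores
target = torch.randint(0, 3, (1, 224, 224))      # per-pixel labels
loss = criterion(logits, target)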