Zhu Lixue, Wu Rongda, Fu Genping, Zhang Shiang, Yang Chenyu, Chen Tianci, Huang Peichen. Segmenting banana images using the lightweight UNet of multi-scale serial dilated convolution[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(13): 194-201. DOI: 10.11975/j.issn.1002-6819.2022.13.022

    Segmenting banana images using the lightweight UNet of multi-scale serial dilated convolution

Efficient and accurate fruit detection is one of the most important tasks for agricultural robots in field operation. However, it remains a great challenge in orchards because of complex backgrounds, the uncertainty of the environment, and the similarity or occlusion between fruit and branches. The traditional UNet network cannot fully meet the needs of a banana recognition and picking system, owing to its low real-time performance, large number of parameters, and loss of spatial information after down-sampling. In this study, a lightweight UNet using multi-scale serial dilated convolution was proposed for banana image segmentation. First, the feature maps of the UNet encoder (coding layers) were visualized. The visualization showed that the UNet network repeatedly extracted similar features when identifying the edge textures and color features of the targets. Therefore, a lightweight feature-extraction backbone module, the Concat Depth Wise Block (CDWBlock), was constructed. Within this module, effective feature maps were obtained using cheaper operations, so that the number of model parameters and the amount of computation were reduced significantly without losing the feature extraction ability of the network. A banana orchard also has some specific characteristics, including complex backgrounds, large banana fruit clusters, small fruit stalks, and low color contrast. In a neural network, a filter with a wide receptive field identifies large objects more easily, whereas a filter with a narrow receptive field identifies small objects more easily. However, with a single type of receptive-field filter, it is difficult for a UNet segmentation network to handle large objects (such as the banana fruit string) and small objects (such as the banana fruit stalk or the irregular edges of the fruit string) at the same time. In addition, information about small objects is easily lost during the down-sampling and up-sampling operations.
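The abstract does not give the internal layout of the CDWBlock, only that it obtains effective feature maps with cheaper operations. The sketch below counts weights under an assumed GhostNet-style layout: a 1x1 pointwise convolution produces half of the output channels and a cheap 3x3 depthwise convolution derives the other half from them, after which the two halves are concatenated. The function names are hypothetical, and biases and batch-norm parameters are ignored; the point is only to show why such a block needs far fewer parameters than a dense 3x3 convolution.

```python
# Hypothetical parameter-count comparison. The "concat depthwise" layout is an
# ASSUMPTION (GhostNet-style), not the authors' published CDWBlock design.
# Biases and batch-norm parameters are ignored.


def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a dense k x k convolution mapping c_in -> c_out channels."""
    return c_in * c_out * k * k


def concat_depthwise_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of an assumed concat-depthwise block.

    1. A 1x1 pointwise conv maps c_in -> c_out // 2 "primary" channels.
    2. A k x k depthwise conv (one filter per channel) derives c_out // 2
       "cheap" channels from the primary ones.
    3. Both halves are concatenated to give c_out channels.
    """
    primary = c_out // 2
    pointwise = c_in * primary          # 1x1 kernels
    depthwise = primary * k * k         # one k x k kernel per primary channel
    return pointwise + depthwise


if __name__ == "__main__":
    for c_in, c_out in [(32, 64), (64, 128), (128, 256)]:
        dense = standard_conv_params(c_in, c_out)
        cheap = concat_depthwise_params(c_in, c_out)
        print(f"{c_in:>3}->{c_out:<3} dense={dense:>7} cheap={cheap:>6} "
              f"ratio={dense / cheap:.1f}x")
```

For a 32-to-64-channel layer this gives 18432 dense weights versus 1312 under the assumed layout, which illustrates how a model of this kind can stay in the sub-million parameter range.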
To enlarge the receptive field and increase the sensitivity of the network to the data, a group of serial dilated convolutions with dilation (expansion) rates of 2, 1, 2, arranged like a sawtooth wave, was proposed. The banana segmentation dataset consisted of 3000 images, divided into 2400, 300, and 300 images for the training, validation, and test sets, respectively. The learning rate was adjusted dynamically during training: whenever the loss value did not decrease for 10 epochs, the learning rate was reduced by a factor of 10. The loss function combined the Dice loss and the binary cross-entropy loss. Experiments showed that the network had 0.45 million parameters and reached a recognition and segmentation speed of 41.0 frames/s, while the mean pixel accuracy (MPA) and mean intersection over union (MIoU) reached 97.32% and 92.57%, respectively. The dilation rates of 2, 1, 2 were selected because they gave excellent segmentation performance on both the edges and the stalks of banana fruit. Compared with the other models, the improved model achieved higher precision with fewer parameters and a better balance between precision and speed. It therefore gave better recognition and response speed in the banana orchard, even though the dataset contained relatively few images. The findings can provide visual recognition technology support for intelligent banana-picking robots. The improved model can also be easily transferred to subsequent applications, such as 3D reconstruction of agricultural targets, 3D positioning of banana fruit, and motion planning of agricultural picking robots.
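Why the dilation rates 2, 1, 2 rather than a constant rate can be illustrated with a 1-D receptive-field count. This is a minimal sketch assuming stride-1 3x3 convolutions stacked in series; for square kernels the 2-D footprint is the Cartesian product of the 1-D result on each axis. The helper names are hypothetical, and the "holes" analysis (often called the gridding effect) is a common explanation for sawtooth rate patterns, not a claim taken from the paper.

```python
from itertools import product


def receptive_field(dilations, k=3):
    """Width of the receptive field of stacked stride-1 k-tap convolutions."""
    return 1 + sum((k - 1) * d for d in dilations)


def reachable_offsets(dilations, k=3):
    """Sorted 1-D input offsets that actually feed one output pixel.

    A layer with dilation d samples offsets {-d*h, ..., 0, ..., d*h}
    (h = k // 2); stacking layers sums one offset chosen from every layer.
    """
    half = k // 2
    taps = [[d * t for t in range(-half, half + 1)] for d in dilations]
    return sorted({sum(combo) for combo in product(*taps)})


if __name__ == "__main__":
    for rates in ([1, 1, 1], [2, 2, 2], [2, 1, 2]):
        rf = receptive_field(rates)
        used = reachable_offsets(rates)
        print(f"rates={rates}: receptive field={rf}, "
              f"positions used={len(used)}, holes={rf - len(used)}")
```

Under these assumptions, rates 2, 1, 2 give an 11-pixel receptive field with every position used, whereas 2, 2, 2 spans 13 pixels but only reaches every second position, so thin structures such as fruit stalks could fall between the sampled pixels.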
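The 2400/300/300 split of the 3000 images can be reproduced with a seeded shuffle. The abstract does not say how images were assigned to each subset, so a uniform random split is an assumption here, and `split_dataset` and the file names are hypothetical.

```python
import random


def split_dataset(items, n_train=2400, n_val=300, n_test=300, seed=0):
    """Shuffle `items` reproducibly and cut them into train/val/test lists.

    ASSUMPTION: a uniform random split; the paper's assignment method is not stated.
    """
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    names = [f"banana_{i:04d}.jpg" for i in range(3000)]  # hypothetical file names
    train, val, test = split_dataset(names)
    print(len(train), len(val), len(test))
```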
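The learning-rate rule (divide by 10 when the loss has not decreased for 10 epochs) is a plateau schedule. The framework-free sketch below assumes an initial learning rate of 1e-3, which the abstract does not report; `PlateauLR` is a hypothetical name. PyTorch offers a comparable built-in, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=10)`, although it counts bad epochs slightly differently (it reduces once more than `patience` epochs have passed without improvement).

```python
class PlateauLR:
    """Multiply the learning rate by `factor` once the monitored loss has not
    decreased for `patience` consecutive epochs (hypothetical helper)."""

    def __init__(self, lr=1e-3, factor=0.1, patience=10):
        self.lr = lr              # ASSUMED initial learning rate
        self.factor = factor      # 0.1 -> "reduced by a factor of 10"
        self.patience = patience  # epochs without a decrease before reducing
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauLR()
    losses = [1.0, 0.8, 0.7] + [0.7] * 12  # synthetic loss curve that stalls
    for epoch, loss in enumerate(losses, start=1):
        print(f"epoch {epoch:2d}: loss={loss:.2f} lr={sched.step(loss):.0e}")
```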
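The combined loss can be sketched in NumPy for a binary banana/background mask. The abstract names the two terms but not their weighting or smoothing, so the equal 1:1 weights, the Dice smoothing constant of 1.0, and the clipping epsilon below are all assumptions; the function names are hypothetical.

```python
import numpy as np


def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


def dice_loss(prob, target, smooth=1.0):
    """1 - soft Dice coefficient; `smooth` (ASSUMED value) keeps it defined for empty masks."""
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(prob) + np.sum(target) + smooth))


def dice_bce_loss(prob, target, w_dice=1.0, w_bce=1.0):
    """Weighted sum of Dice and BCE losses (equal weights are an ASSUMPTION)."""
    return w_dice * dice_loss(prob, target) + w_bce * bce_loss(prob, target)


if __name__ == "__main__":
    mask = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(dice_bce_loss(mask, mask))                    # near-perfect prediction
    print(dice_bce_loss(np.full_like(mask, 0.5), mask))  # uninformative prediction
```

The BCE term scores every pixel independently, while the Dice term depends on overlap with the whole foreground region, which is one common reason for pairing the two when the foreground is small relative to the background.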
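The reported MPA and MIoU can be computed from a pixel confusion matrix. This sketch assumes the standard definitions averaged over two classes (background and banana) and accumulated over all pixels; the paper may average differently, for example per image. The function names are hypothetical.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes=2):
    """Pixel counts with rows = ground-truth class and columns = predicted class."""
    idx = target.astype(int) * num_classes + pred.astype(int)
    counts = np.bincount(idx.ravel(), minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_pixel_accuracy(cm):
    """Average over classes of (correct pixels of class c) / (true pixels of class c)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = np.diag(cm) / cm.sum(axis=1)
    return float(np.nanmean(per_class))  # classes absent from the ground truth are skipped


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN)."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = tp / union
    return float(np.nanmean(per_class))


if __name__ == "__main__":
    target = np.array([[0, 0, 1, 1]])  # 0 = background, 1 = banana
    pred = np.array([[0, 1, 1, 1]])
    cm = confusion_matrix(pred, target)
    print(cm, mean_pixel_accuracy(cm), mean_iou(cm))
```

In this toy example one background pixel is misclassified as banana, giving an MPA of 0.75 and an MIoU of 7/12 (about 0.583).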