Lightweight UNet banana image segmentation based on multi-scale serial dilated convolution

朱立学; 伍荣达; 付根平; 张世昂; 杨尘宇; 陈天赐; 黄沛琛

doi:10.11975/j.issn.1002-6819.2022.13.022

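Abstract: To address the poor real-time performance, large parameter count, and loss of spatial information after downsampling of the conventional UNet in banana bunch recognition systems, this study proposes a lightweight segmentation network based on the UNet model. A lightweight backbone feature-extraction module is constructed that reduces the model's parameter count and computation while strengthening the network's ability to extract features. A multi-scale serial combination of dilated convolutions with a sawtooth-shaped pattern of dilation rates of 2, 1, 2 enlarges the receptive field while retaining sensitivity to detail. Experiments on a self-built banana bunch dataset show that, with 0.45 M parameters, the network segments banana bunches at 41.0 frames/s, with a mean pixel classification accuracy of 97.32% and an intersection over union of 92.57%. Compared with other models, it offers high accuracy and a small parameter count and achieves a good balance between accuracy and speed. The algorithm recognizes banana bunches well in natural growing environments and can provide visual recognition support for applications such as intelligent banana picking.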
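The effect of the 2, 1, 2 dilation pattern on the receptive field can be checked with a little arithmetic. The sketch below is only an illustration: it assumes three stacked 3x3 convolutions with stride 1 (the abstract does not state kernel size or stride), works in one dimension, and uses the hypothetical helper names `receptive_field` and `covered_offsets`. It compares the sawtooth pattern with plain convolutions (1, 1, 1) and with a constant dilation (2, 2, 2), which leaves gaps in the input positions it samples.

```python
from itertools import product


def receptive_field(dilations, kernel=3):
    """1-D receptive field of stacked stride-1 convolutions (assumed setup)."""
    return 1 + sum((kernel - 1) * d for d in dilations)


def covered_offsets(dilations, kernel=3):
    """Input offsets (1-D) that can contribute to a single output position."""
    half = kernel // 2
    taps = range(-half, half + 1)  # tap indices of one odd-sized kernel
    return {
        sum(t * d for t, d in zip(combo, dilations))
        for combo in product(taps, repeat=len(dilations))
    }


for rates in [(1, 1, 1), (2, 2, 2), (2, 1, 2)]:
    rf = receptive_field(rates)
    used = len(covered_offsets(rates))
    print(rates, "receptive field:", rf, "input positions used:", used)
```

Under these assumptions, the 2, 1, 2 stack spans 11 positions and samples all of them, whereas 2, 2, 2 spans 13 positions but samples only every second one, which is one way to read the claim that the sawtooth pattern widens the receptive field without losing detail.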

Abstract: Efficient and accurate fruit detection is one of the most important tasks in the field operation of agricultural robots. However, complex backgrounds and the uncertainty of the orchard environment remain a great challenge, because fruit and branches resemble or occlude one another. The traditional UNet network cannot fully meet the requirements of a banana recognition and picking system, owing to its poor real-time performance, large number of parameters, and loss of spatial information after downsampling. In this study, a lightweight UNet using multi-scale serial dilated convolution was proposed for banana image segmentation. The feature maps of the UNet encoding layers were first visualized. UNet extracts similar features many times in order to better identify the edge texture and color features of the targets. Therefore, a lightweight backbone feature-extraction module, the Concat Depth Wise Block (CDWBlock), was constructed, in which the effective feature maps are obtained using cheaper operations. As such, the number of model parameters and the amount of computation were reduced significantly without losing the feature extraction ability of the network. A banana orchard also has some specific characteristics, including a complex background, large fruit bunches, small fruit stalks, and low color contrast. In a neural network, a filter with a wide receptive field identifies large objects more easily, while a filter with a narrow receptive field identifies small objects more easily. However, it is difficult for the UNet segmentation network to handle large objects (such as the banana bunch) and small objects (such as the fruit stalk or the irregular edges of the bunch) at the same time, particularly with filters of a single receptive field. The information of small objects is easily lost in the down- and up-sampling operations.
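The abstract does not spell out the internals of CDWBlock. As a rough illustration of why depthwise operations are "cheaper", the following sketch compares the weight count of a standard convolution with that of a depthwise separable convolution (a depthwise convolution followed by a 1x1 pointwise convolution). This is a generic technique, not necessarily the authors' block, and the 3x3 kernel and the channel sizes 64 and 128 are assumptions chosen only for the example.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical channel sizes, not taken from the paper.
c_in, c_out = 64, 128
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)
print("standard:", std, "depthwise separable:", sep, "ratio:", round(std / sep, 2))
```

For these assumed sizes the separable layer needs roughly an eighth of the weights, which is the kind of saving a depthwise-based backbone relies on.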
Therefore, a group of dilated convolutions with a sawtooth-like pattern of dilation rates of 2, 1, 2 was proposed to enlarge the receptive field while retaining sensitivity to detail. The banana segmentation dataset consisted of 3000 images, divided into 2400, 300, and 300 images for the training, validation, and test sets, respectively. The learning rate was adjusted dynamically during training: once the loss had not decreased for 10 epochs, the learning rate was reduced by a factor of 10. The loss function combined the Dice loss and the binary cross-entropy loss. Experiments showed that, with 0.45 million parameters, the network reached a recognition and segmentation speed of 41.0 frames/s, while the mean pixel accuracy and mean intersection over union reached 97.32% and 92.57%, respectively. The dilation rates of 2, 1, 2 were selected for their excellent segmentation performance at both the edges and the stalks of banana fruit. The improved model achieved higher accuracy with fewer parameters than the other models, striking a better balance between accuracy and speed. Good recognition and response speed were therefore obtained in the banana orchard, even though the dataset contained only a small number of images. The findings can provide visual recognition support for intelligent banana-picking robots. The improved model can also be readily transferred to subsequent applications, such as 3D reconstruction of agricultural targets, 3D positioning of banana fruit, and motion planning of agricultural picking robots.
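The training strategy described above, a combined Dice and binary cross-entropy loss with a learning rate that drops tenfold after 10 epochs without improvement, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two loss terms are summed with equal weight, the Dice smoothing constant is 1, the monitored quantity is whatever loss value is passed to `step`, and the names `combined_loss` and `PlateauSchedule` as well as the initial learning rate are hypothetical.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss; the smoothing constant is an assumption."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets):
    """Dice + BCE with equal weights (the weighting is an assumption)."""
    return bce_loss(probs, targets) + dice_loss(probs, targets)


class PlateauSchedule:
    """Cut the learning rate by `factor` after `patience` epochs without a lower loss."""

    def __init__(self, lr, patience=10, factor=0.1):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


# Toy usage: a 4-pixel mask and a hypothetical initial learning rate.
print(combined_loss([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]))
schedule = PlateauSchedule(lr=1e-3)
for epoch_loss in [0.5] + [0.6] * 10:  # one improvement, then 10 stagnant epochs
    current_lr = schedule.step(epoch_loss)
print(current_lr)
```

In a deep learning framework the same behaviour is usually obtained from a built-in plateau scheduler; the hand-written class here only makes the rule explicit.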

Segmenting banana images using the lightweight UNet of multi-scale serial dilated convolution