Segmenting field rice panicle images using DBSE-Net

Song Yuqing; Yang Dongchuan; Xu Lizhang; Liu Zhe

doi:10.11975/j.issn.1002-6819.2022.13.023

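Summary: Accurate segmentation of rice panicles is the key to reliable estimation of rice yield. To segment panicles of different varieties and growth stages accurately under field conditions, this study proposed an attention-based panicle segmentation network, the Double Branch Squeeze-and-Excitation Network (DBSE-Net). First, a Double Branch Squeeze-and-Excitation (DBSE) attention module was proposed, which encodes the channel information of the input features using Global Average Pooling (GAP) and Global Max Pooling (GMP) simultaneously to infer channel attention more precisely. Then, to strengthen panicle features and suppress background features, the DBSE module was added to an encoder-decoder segmentation framework to build the DBSE-Net segmentation network. Finally, segmentation performance was tested on a self-collected rice panicle image dataset: DBSE-Net achieved a pixel accuracy, mean intersection over union, and F1 score of 94.32%, 87.59%, and 91.86%, respectively, which were 1.61, 2.56, and 1.20 percentage points higher than those of the second-best model, DeepLabv3+. Segmenting a single 256×256 pixel image took 0.03 s, 5.3 times the segmentation speed of DeepLabv3+. In a generalization test on a public rice panicle image dataset, DBSE-Net segmented the panicle regions effectively. The results show that DBSE-Net segments panicles of different varieties and growth stages efficiently and accurately, generalizes well, and can provide a reference for rice yield assessment.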
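
The three reported metrics can be made concrete with a small sketch for binary masks. This is an illustration only, not the authors' evaluation code: the function name is hypothetical, and it assumes that the mean intersection over union is averaged over the two classes (panicle and background) and that the F1 score refers to the panicle class.

```python
def binary_segmentation_metrics(pred, truth):
    """Return (pixel accuracy, mean IoU, F1) for two binary masks.

    pred and truth are equally sized 2D lists of 0 (background) / 1 (panicle).
    Assumption: mean IoU averages the panicle and background IoU values.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1

    total = tp + fp + fn + tn
    pixel_accuracy = (tp + tn) / total
    # A class that is absent from both masks counts as perfectly segmented.
    iou_panicle = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    mean_iou = (iou_panicle + iou_background) / 2
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return pixel_accuracy, mean_iou, f1


# Toy 2x4 masks: one background pixel is wrongly predicted as panicle.
pred = [[1, 1, 0, 0],
        [1, 0, 0, 0]]
truth = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
pixel_accuracy, mean_iou, f1 = binary_segmentation_metrics(pred, truth)
```

On the toy masks there are 2 true positives, 1 false positive, and 5 true negatives, so the pixel accuracy is 7/8, the mean IoU is (2/3 + 5/6)/2 = 0.75, and the F1 score is 0.8.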

Abstract: Rice yield is of great significance in crop breeding and cultivation. Automatic measurement of rice yield depends mainly on rapid and precise segmentation of panicles from rice images. However, current deep learning models achieve relatively low accuracy in panicle segmentation, primarily because background information interferes with feature extraction. An attention mechanism is an effective way to address this problem: it improves the segmentation performance of a Convolutional Neural Network (CNN) by focusing on important features and suppressing unnecessary ones. Channel attention is a popular and powerful tool in computer vision because of its simplicity and efficiency. Most efficient channel attention mechanisms use Global Average Pooling (GAP) to squeeze the spatial dimension of the input feature map. However, GAP cannot fully express the input features, because channels with distinct semantic content can have the same or similar mean values. Consequently, a novel Double Branch Squeeze-and-Excitation (DBSE) attention module was proposed to segment field rice panicle images efficiently and accurately. GAP and Global Max Pooling (GMP) were used simultaneously to aggregate the spatial information of feature maps for channel attention. Moreover, the DBSE module captured inter-channel interaction with a one-dimensional convolution rather than the commonly used fully connected layer, which further reduced the parameter overhead. Experimental results demonstrated that the DBSE module was simple yet effective. The DBSE-Net segmentation network was then built with this attention module: a DBSE module was inserted into each encoder and decoder layer of an encoder-decoder segmentation framework, referred to as ED-Net.
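
The double-branch idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two branches are merged or whether they share convolution weights, so the sketch assumes a shared one-dimensional kernel, summation of the two branches, and a sigmoid gate. The function names and kernel values are hypothetical.

```python
import math


def conv1d_same(values, kernel):
    """1D cross-correlation with zero padding; kernel length must be odd."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def dbse_attention(feature_map, kernel):
    """Rescale each channel of feature_map by an attention weight in (0, 1).

    feature_map: list of channels, each a 2D list of floats.
    kernel: odd-length list of 1D convolution weights (hypothetical values).
    """
    # Squeeze: one descriptor per channel from each pooling branch.
    flat = [[v for row in channel for v in row] for channel in feature_map]
    gap = [sum(vals) / len(vals) for vals in flat]  # global average pooling
    gmp = [max(vals) for vals in flat]              # global max pooling

    # Excitation: 1D convolution across neighbouring channels in each branch.
    # Assumption: both branches share the same kernel.
    avg_branch = conv1d_same(gap, kernel)
    max_branch = conv1d_same(gmp, kernel)

    # Assumption: the branches are summed before the sigmoid gate.
    weights = [
        1.0 / (1.0 + math.exp(-(a + m)))
        for a, m in zip(avg_branch, max_branch)
    ]
    scaled = [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return scaled, weights


# Two channels with the same mean (1.0) but different maxima (1.0 and 4.0).
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 0.0], [0.0, 0.0]],
]
scaled, weights = dbse_attention(features, kernel=[0.25, 0.5, 0.25])
```

In this toy input GAP alone cannot tell the two channels apart, while the GMP branch gives the second channel a larger weight. A one-dimensional kernel of size k adds only k weights, whereas a two-layer fully connected bottleneck over C channels with reduction ratio r needs about 2C²/r.
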
ED-Net shares an analogous architecture with SegNet and U-Net. SegNet stores only the max-pooling indices, rather than the entire encoder feature maps, to preserve boundary information; this saves memory at the cost of a slight loss of accuracy. ED-Net differs from U-Net in how features are fused. Specifically, ED-Net adds the encoder feature maps to the up-sampled decoder feature maps, whereas U-Net concatenates them. Compared with concatenation, this fusion halves the number of input channels for each decoder layer. ED-Net therefore had fewer parameters and lower computational overhead, which contributed to efficient panicle segmentation. DBSE-Net was compared with K-means clustering, an unsupervised Bayesian method, Panicle-SEG, PanicleNet, FCN-8s, PSPNet, and DeepLabv3+. The results showed that DBSE-Net achieved a pixel accuracy of 94.38%, a mean intersection over union of 87.59%, and an F1 score of 91.86%, which were 1.61, 2.56, and 1.20 percentage points higher, respectively, than those of DeepLabv3+, the second-best method. DBSE-Net had 6.98 million parameters, and segmentation took only 0.03 s for an image with a resolution of 256×256 pixels. Generalization ability was validated by experiments on a public paddy rice image dataset, where the pixel accuracy, mean intersection over union, and F1 score of DBSE-Net were 88.56%, 79.76%, and 78.38%, respectively, indicating competitive performance. Results on the two datasets showed that DBSE-Net can efficiently and accurately segment panicles across different rice accessions and growth periods, with good generalization performance. This finding can serve as a reference for rice yield measurement.
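
The effect of additive fusion on decoder size can be checked with simple arithmetic. The sketch below is illustrative only: the layer width of 64 channels and the 3×3 kernel are hypothetical values, not the configuration reported for ED-Net, and the helper names are made up for this example.

```python
def conv_weight_count(in_channels, out_channels, kernel_size=3):
    """Number of weights in a standard 2D convolution, ignoring the bias."""
    return kernel_size * kernel_size * in_channels * out_channels


def fused_channels(encoder_channels, decoder_channels, mode):
    """Channel count seen by a decoder layer after skip-connection fusion."""
    if mode == "add":
        if encoder_channels != decoder_channels:
            raise ValueError("element-wise addition needs equal channel counts")
        return decoder_channels
    if mode == "concat":
        return encoder_channels + decoder_channels
    raise ValueError(f"unknown fusion mode: {mode}")


channels = 64  # hypothetical layer width
add_in = fused_channels(channels, channels, "add")        # addition keeps 64
concat_in = fused_channels(channels, channels, "concat")  # concatenation gives 128
add_weights = conv_weight_count(add_in, channels)         # 3 * 3 * 64 * 64
concat_weights = conv_weight_count(concat_in, channels)   # 3 * 3 * 128 * 64
```

With these assumed sizes, the first convolution after fusion has 36,864 weights under addition and 73,728 under concatenation, which is the halving described above.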
