Multi-crop classification and extraction based on improved spatial-coordinate attention UNet

Feng Xiang; Zhang Xuelin; Wang Jianxiong

doi:10.11975/j.issn.1002-6819.202305173

Abstract

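Summary: To address the insufficient refinement of current multi-crop classification and extraction, and to examine how datasets of different scales affect network accuracy, this study improved coordinate attention and added the resulting module to the UNet network, in order to verify the rationality and effectiveness of the improved spatial-coordinate attention UNet (SPCA-UNet). The results show that networks taking data at a resolution of 1500×1500 pixels as input achieved the highest extraction accuracy, and that the UNet and DeepLab v3+ models were insensitive to scale information. In the attention comparison experiments, the improved spatial-coordinate attention outperformed the SENet (squeeze-and-excitation networks), CBAM (convolutional block attention module), ECA (efficient channel attention), and CA (coordinate attention) modules in mean intersection over union, mean pixel accuracy, mean precision, and mean recall. Its mean intersection over union reached 92.20% and its mean pixel accuracy reached 95.97%, which are 1.16 and 0.76 percentage points higher, respectively, than those of the CA module. The improved spatial-coordinate attention preserves crop boundary information well; owing to its strong regularity and constraint, isolated islands are not evident and misclassification and omission are unlikely. On the multi-crop dataset, attention to spatial position information yields a larger accuracy gain, and the encoder-decoder structure, which concatenates and fuses multiple feature layers, is more effective for multi-crop extraction. Among the UNet, PSPNet, DeepLab v3+, and SPCA-UNet models, the UNet based on the improved spatial-coordinate attention performed best, with a mean intersection over union 1.3 percentage points higher than that of UNet, which had the highest mean intersection over union among the other three networks. These results can serve as a reference for the refined classification and extraction of multiple crops.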
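The accuracy figures quoted above are confusion-matrix metrics. The following is a minimal sketch of how mean intersection over union and mean pixel accuracy could be computed, assuming the commonly used definitions (per-class IoU = TP / (TP + FP + FN) and per-class pixel accuracy = TP / (TP + FN), each averaged over classes). The function name `segmentation_metrics` and the toy confusion matrix are illustrative and not taken from the paper, whose exact definitions may differ.

```python
def segmentation_metrics(confusion):
    """Return (MIoU, MPA) from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. The definitions used here are assumed,
    not confirmed by the paper.
    """
    n = len(confusion)
    ious, accuracies = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # true class k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp     # predicted k, true class otherwise
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        accuracies.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return sum(ious) / n, sum(accuracies) / n


# Hypothetical two-class example (values are made up for illustration).
toy_confusion = [[50, 10],
                 [5, 35]]
miou, mpa = segmentation_metrics(toy_confusion)
print(f"MIoU = {miou:.4f}, MPA = {mpa:.4f}")
```

A difference between two such percentages, for example between two attention modules, is reported in percentage points, that is, as a plain subtraction of the two percentage values.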

Abstract: Semantic segmentation is one of the most powerful image-processing techniques for complex agricultural field environments. In this study, a UNet model with improved spatial-coordinate attention (SPCA) was proposed to improve refined multi-crop semantic segmentation. The SPCA module was designed to combine a spatial attention mechanism with coordinate attention (CA). Specifically, the maximum and average values were first computed over all the feature points of each feature layer along the same direction. The maximum and average maps were then stacked and their channel number adjusted, and a weight feature layer over the spatial feature points was obtained through activation. Secondly, after this weight layer was multiplied with the feature layer on the main branch, a CA module extracted location information along the two directions. At the same time, the coefficient “r” was removed so that feature channel information was retained without any extra channel computation cost. Finally, the SPCA module was embedded in the UNet model and compared with the other attention modules to verify its segmentation performance.
The results show that the segmentation accuracy of the three semantic segmentation models differed markedly across scales. Firstly, all three models were much less accurate at the 512×512 scale. At the 1500×1500 scale, accuracy was about four percentage points higher than at the other two scales for all three networks, and the best combination was obtained with the UNet model. Secondly, SPCA achieved the highest accuracy among the five attention modules. Its mean intersection over union (MIoU), mean pixel accuracy (MPA), mean precision (MPrecision), and mean recall (MRecall) all exceeded those of the squeeze-and-excitation networks (SENet), convolutional block attention module (CBAM), efficient channel attention (ECA), and CA modules. Its MIoU of 92.20% and MPA of 95.97% were 1.16 and 0.76 percentage points higher, respectively, than the best MIoU and MPA among the other four modules. The CA module ranked second after SPCA, and the ECA module ranked last. The ECA module brought very little accuracy gain for crop segmentation after the replacement, because its 1D convolution was inferior to the fully connected layer. CBAM, which adds spatial attention to channel attention, achieved better segmentation accuracy, and CA captured location information much better than spatial attention did. Thirdly, across the five modules, flue-cured tobacco was segmented most accurately, followed by buildings, with corn last. The intersection over union (IoU) and pixel accuracy (PA) of SPCA were both superior to those of the other four modules. SPCA retained crop boundary information well, produced clear and complete segmentation maps, and showed few isolated islands. Owing to its strong robustness, misclassification and omission errors were reduced. Lastly, the UNet with SPCA performed best on all four accuracy metrics, with an MIoU 1.3 percentage points higher than that of UNet, which ranked second. PSPNet had the lowest values of the four evaluation metrics. SPCA-UNet generally achieved the highest segmentation accuracy in each category, with every value remaining above about 93%. DeepLab v3+ had the lowest accuracy for flue-cured tobacco, and PSPNet had the lowest accuracy in the other three categories.
Therefore, concatenating multiple feature layers between the encoder and the decoder provided the most valuable information for the task of extracting multiple crops. Given the excellent performance of the UNet with improved spatial-coordinate attention, these findings can serve as a strong reference for the accurate classification and extraction of multiple crops. Since training scenarios may involve mobile terminals, training time is expected to be reduced on mobile platforms without significant loss of accuracy.
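The two-stage attention described in the abstract (a spatial weight map built from maximum and average values, followed by coordinate attention along the two directions without the coefficient “r”) can be sketched as below. This is only one possible reading of the abstract, written in plain Python on nested lists shaped `[channel][row][column]`, and it is not the authors' implementation. All function and parameter names are hypothetical. The two scalar weights stand in for a learned convolution, the identity matrices stand in for learned channel-preserving transforms, and the ReLU nonlinearity and the omission of normalization are simplifying assumptions.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [sum(m[o][c] * v[c] for c in range(len(v))) for o in range(len(m))]


def spatial_gate(x, w_max=1.0, w_avg=1.0, bias=0.0):
    """Per-pixel max and mean across channels, mixed into one weight map.

    Assumption: two scalar weights replace the learned convolution that
    would normally mix the stacked max/mean maps.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    gate = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [x[c][i][j] for c in range(channels)]
            mixed = w_max * max(values) + w_avg * sum(values) / channels + bias
            gate[i][j] = sigmoid(mixed)
    return gate


def coordinate_gates(y, mix=None, w_h=None, w_w=None):
    """Direction-wise attention with no channel reduction (C -> C).

    Assumption: identity matrices are placeholders for learned weights.
    """
    channels, height, width = len(y), len(y[0]), len(y[0][0])
    mix = mix or identity(channels)
    w_h = w_h or identity(channels)
    w_w = w_w or identity(channels)
    # Average along the width gives one channel vector per row, and vice versa.
    pooled_h = [[sum(y[c][i]) / width for c in range(channels)]
                for i in range(height)]
    pooled_w = [[sum(y[c][i][j] for i in range(height)) / height
                 for c in range(channels)] for j in range(width)]

    def encode(vec, head):
        hidden = [max(0.0, v) for v in matvec(mix, vec)]  # shared transform + ReLU
        return [sigmoid(v) for v in matvec(head, hidden)]

    a_h = [encode(p, w_h) for p in pooled_h]  # indexed [row][channel]
    a_w = [encode(p, w_w) for p in pooled_w]  # indexed [column][channel]
    return a_h, a_w


def spca(x):
    """Spatial gate on the main branch, then coordinate attention on the result."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    gate = spatial_gate(x)
    y = [[[x[c][i][j] * gate[i][j] for j in range(width)]
          for i in range(height)] for c in range(channels)]
    a_h, a_w = coordinate_gates(y)
    return [[[y[c][i][j] * a_h[i][c] * a_w[j][c] for j in range(width)]
             for i in range(height)] for c in range(channels)]


# Hypothetical 2-channel, 2x3 feature map.
features = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
            [[0.5, 1.5, 2.5], [3.5, 4.5, 5.5]]]
attended = spca(features)
```

Because every gate is a sigmoid output, the module only rescales the input features; the output keeps the input's shape, so a block of this kind could be placed between existing layers of an encoder-decoder network without changing the surrounding layer sizes.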
