Extracting apple planting areas from GF-2 satellite imagery using an improved UNet++

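    • Abstract: Accurate information on the spatial distribution of apple planting is important for optimizing production management and adjusting industrial layout. In high-resolution remote sensing imagery, spectral confusion between apple orchards and other crops, together with the fragmented spatial distribution of orchards, leads to low classification accuracy and blurred boundaries. To address these problems, this study proposes MSDAW-UNet++, a semantic segmentation model for apple orchards based on an improved UNet++. First, GF-2 imagery was preprocessed and a dataset for apple planting area extraction was constructed. Second, a multi-scale dual attention (MSDA) module was designed at the key feature fusion nodes of UNet++ to enhance the multi-scale contextual and spatial information of apple planting areas. In addition, at the first feature fusion node of each layer of UNet++, the ordinary convolution in the original dense nested skip connections was replaced with wavelet transform convolution (WTConv) to build a wavelet-nested fusion block (WNFB), which introduces frequency-domain features of the imagery to improve the discrimination of spectrally similar land covers. The experimental results show that combining the MSDA and WNFB modules improves apple orchard recognition through the synergy of frequency-domain and spatial-domain features. MSDAW-UNet++ effectively improves the extraction accuracy of apple planting areas, reaching an F1-score of 96.63% and an IoU of 90.46%, which are 3.87 and 10.07 percentage points higher than the original UNet++; compared with other mainstream models, the F1-score and IoU are 2.35 to 9.55 and 6.67 to 19.54 percentage points higher, respectively. The results can serve as a reference for the fine-grained extraction of orchards from high-resolution remote sensing imagery.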
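
The frequency-domain idea behind WTConv can be illustrated with a single level of a 2D wavelet decomposition. The sketch below is not the authors' WNFB or a WTConv layer; it only shows the kind of sub-band split a wavelet-based convolution could operate on. The Haar basis and the function name `haar_dwt2` are assumptions made for illustration, since the abstract does not name the wavelet used.

```python
def haar_dwt2(img):
    """One level of the 2D orthonormal Haar transform (illustrative only).

    img: list of rows (lists of numbers) with even height and width.
    Returns four half-resolution sub-bands:
    (low, col_detail, row_detail, diag_detail).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, col_detail, row_detail, diag_detail = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_col, r_row, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_low.append((a + b + c + d) / 2)   # scaled local sum (low frequency)
            r_col.append((a - b + c - d) / 2)   # left column minus right column
            r_row.append((a + b - c - d) / 2)   # top row minus bottom row
            r_diag.append((a - b - c + d) / 2)  # diagonal difference
        low.append(r_low)
        col_detail.append(r_col)
        row_detail.append(r_row)
        diag_detail.append(r_diag)
    return low, col_detail, row_detail, diag_detail


# Hypothetical 2x2 tile containing a vertical edge (dark left, bright right).
tile = [[1, 5], [1, 5]]
low, col_detail, row_detail, diag_detail = haar_dwt2(tile)
```

In this toy tile the edge appears only in `col_detail`, while the overall brightness goes to `low`. Separating smooth content from edge content in this way is one plausible reason frequency-domain features help with spectrally similar land covers.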

       

      Abstract: Accurate extraction of apple planting areas from high-resolution remote sensing images is needed to optimize production management and industrial layout. This study addresses three problems in extracting apple planting areas from such images: spatial fragmentation, spectral confusion with other crops, and boundary blurring. To this end, an enhanced model, MSDAW-UNet++, was proposed based on the UNet++ architecture. A multi-scale dual attention (MSDA) module was incorporated at the key feature fusion nodes to enhance the multi-scale contextual and spatial information of apple planting areas, and a wavelet-nested fusion block (WNFB) was embedded at the first feature fusion node of each layer in UNet++. During data preparation, the GF-2 satellite images were systematically preprocessed, and a dataset for apple planting area extraction was constructed. The MSDA module integrates multi-scale feature extraction with multi-head self-attention (MHSA) and positional attention (PSA) mechanisms. Feature representations at four scales were first obtained using depthwise separable convolutions, and this multi-scale information was then fed into the MHSA and PSA branches separately. The MHSA mechanism builds long-range dependencies between apple planting areas in different regions and combines local and global information through the correlations among input sequence elements, so that the overall structure of an apple orchard can be captured. The conventional feed-forward network (FFN) was replaced with an enhanced E-FFN, which achieves more efficient feature interaction and multi-scale learning at lower computational cost.
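
The phrase "correlations among input sequence elements" refers to the attention weights at the core of self-attention. The following is a minimal sketch of single-head scaled dot-product self-attention. It assumes identity query/key/value projections to stay short, whereas an MHSA layer would use learned projections and several heads. The function name `self_attention` is hypothetical, and nothing here reproduces the MSDA module itself.

```python
import math


def self_attention(tokens):
    """Single-head scaled dot-product self-attention (illustrative only).

    tokens: list of n feature vectors, each a list of d numbers.
    Queries, keys and values are all the raw tokens (identity projections).
    Returns n output vectors, each a weighted average of all tokens.
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Correlation of this token with every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the value vectors.
        outputs.append([sum(wt * v[j] for wt, v in zip(weights, tokens))
                        for j in range(d)])
    return outputs


# Two hypothetical pixel embeddings from distant image regions.
out = self_attention([[0.0, 0.0], [2.0, 0.0]])
```

Because every output is a weighted average over all tokens, a pixel embedding can draw on similar pixels anywhere in the image. This is the sense in which attention links planting areas in different regions.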
Furthermore, the local perception of a convolutional neural network (CNN) was combined with the global modelling strength of transformers to improve the accuracy and efficiency of apple plantation extraction. The PSA generates location-aware attention maps through feature interaction and explicitly models the geometric constraints among pixels. It produces continuous energy responses in edge regions, captures spatial continuity by reinforcing the correlation between distant pixels so that transitions between adjacent pixels are smooth, and preserves local features to prevent the spatial disconnection caused by environmental complexity. As a result, the boundary consistency and regional integrity of the semantic segmentation were improved. Finally, the outputs of the PSA and MHSA mechanisms were combined to produce features carrying both multi-scale contextual and spatial information, allowing apple planting areas to be extracted in complex planting environments. In addition to the MSDA module, the WNFB module was designed around wavelet transform convolution (WTConv): the conventional convolution kernels were replaced with WTConv so that features are extracted jointly in the frequency and spatial domains, which improved the discrimination between spectrally similar land covers. Experiments showed that the MSDAW-UNet++ model performed best in extracting apple planting areas, with an F1-score of 96.63% and an IoU of 90.46%. Compared with the baseline UNet++ model, the improved model gained 3.87 percentage points in F1-score and 10.07 percentage points in IoU.
Compared with classic semantic segmentation models (FCN, UNet, and DeepLab v3+), current mainstream remote sensing semantic segmentation models (MCSNet, CMLFormer, and CMTFNet), and UNet-derived models (MAResU-Net, CM-UNet, and UNet3+), the F1-score was 2.35 to 9.55 percentage points higher and the IoU 6.67 to 19.54 percentage points higher. Ablation experiments were used to analyze the effectiveness of the MSDA and WNFB modules. The two modules effectively extracted the multi-scale contextual and spatial information, as well as the frequency-domain features, of apple planting areas, providing a more comprehensive feature representation for the fine extraction of apple planting areas in complex planting environments. The findings can serve as a reference for the fine extraction of orchards from high-resolution remote sensing images.
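
For readers less familiar with the two reported metrics, the sketch below computes the F1-score and IoU of the positive (apple) class from a pair of binary masks, using the standard definitions F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The function name `binary_f1_iou` and the toy masks are illustrative assumptions. The abstract does not state how the paper aggregates scores over classes or image tiles, so this is not necessarily the evaluation protocol behind the reported numbers.

```python
def binary_f1_iou(pred, truth):
    """Return (F1, IoU) of the positive class for two equal-size 0/1 masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1  # predicted apple, labelled apple
            elif p == 1 and t == 0:
                fp += 1  # predicted apple, labelled background
            elif p == 0 and t == 1:
                fn += 1  # missed apple pixel
    union = tp + fp + fn
    if union == 0:
        # No positive pixel in either mask: treat as a perfect match
        # (a convention chosen for this sketch).
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / union
    return f1, iou


# Hypothetical 2x3 prediction and ground-truth masks (1 = apple orchard).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
f1, iou = binary_f1_iou(pred, truth)
```

Here TP = 2, FP = 1, and FN = 1, giving an F1-score of about 0.667 and an IoU of 0.5. For a single mask pair, IoU is never larger than F1, which is consistent with the reported IoU gains being larger in percentage points than the F1 gains.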

       
