Rice extraction from multi-source remote sensing images based on HRNet and self-attention mechanism

    • Abstract: To improve the accuracy of rice extraction from combined optical and radar imagery, this study improved the deep learning network HRNet and proposed a multi-level feature fusion framework. The improved model, MSATT-HRNet, jointly used the dual-polarized synthetic aperture radar (SAR) imagery of Sentinel-1 and the multispectral optical imagery of Sentinel-2 from the ESA Copernicus programme to extract the rice planting areas in Wangcheng District, Changsha City, Hunan Province. The improvements to HRNet comprised two parts: 1) a convolution group composed of a channel attention mechanism and max pooling was designed to extract SAR features, and a self-attention module was embedded in the basic feature extraction module of HRNet to extract features from the multispectral optical imagery; 2) a feature fusion module composed of channel attention and spatial attention was designed to explore the intrinsic complementary relationship between the dual-modal features. Ablation experiments were conducted on the improved model, and MSATT-HRNet was compared with other commonly used deep learning methods (MCANet, Deeplabv3, and Unet). The results show that the proposed multi-source data fusion method can exploit the complementary advantages of different data sources. The overall accuracy and Kappa coefficient of the extracted rice planting areas reached 97.04% and 0.961, respectively; compared with MCANet, Deeplabv3, and Unet, the overall accuracy was higher by 6.90, 2.67, and 2.98 percentage points, and the Kappa coefficient by 0.055, 0.025, and 0.030, respectively. This confirms that the method can effectively improve the accuracy of rice discrimination. By coupling deep learning techniques with remote sensing imagery, the study provides a feasible option for rice mapping in the cloudy and rainy regions of southern China.
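    The abstract names the building blocks of the SAR branch (a channel attention mechanism combined with max pooling) but not their exact configuration. The following PyTorch sketch only illustrates one plausible form of such a convolution group: the class names (ChannelAttention, SARConvGroup), the squeeze-and-excitation-style attention, the layer sizes, and the 2-channel VV/VH input are assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption): one convolution group of the SAR branch, combining
# a squeeze-and-excitation-style channel attention with max pooling. Layer sizes,
# names, and structure are illustrative only, not the authors' implementation.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: global average pooling followed by a bottleneck MLP."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # B x C x H x W -> B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(self.pool(x))  # re-weight channels


class SARConvGroup(nn.Module):
    """Conv -> channel attention -> max pooling, applied to the SAR input."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.attn = ChannelAttention(out_channels)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # halve spatial size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.attn(self.conv(x)))


# Example: a dual-polarized (VV, VH) Sentinel-1 patch of 2 channels, 256 x 256 pixels.
sar_patch = torch.randn(1, 2, 256, 256)
sar_features = SARConvGroup(2, 64)(sar_patch)  # -> shape (1, 64, 128, 128)
```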

       

      Abstract: Rice is one of the most important crops in the world, and rice classification and mapping are fundamental to assessing food security. With the advances in remote sensing, multi-source satellite imagery can be used to map the rice distribution. How to effectively synthesize data from multiple sensors to improve the accuracy of crop extraction is a topical research question; in particular, optical and radar data can be combined to classify rice in cloudy and rainy regions. Previous studies on rice extraction from synthetic aperture radar (SAR) and optical images have achieved good results. However, feature fusion was usually performed by simple stacking, without considering the interaction between optical and SAR images, and mostly shallow features, such as texture and geometric features, were extracted and used for classification, rather than high-level semantic features. In this study, a multi-level feature fusion framework was proposed to improve an existing deep learning network (HRNet); the improved model was named MSATT-HRNet. The dual-polarized SAR imagery of Sentinel-1 and the multispectral optical imagery of Sentinel-2 from the ESA Copernicus programme were integrated to extract the rice cultivation areas in Wangcheng District, Changsha City, Hunan Province. The improvements to HRNet mainly included two parts: 1) a convolution group composed of a channel attention mechanism and max pooling was designed to extract the SAR features, and a self-attention module was embedded in the basic feature extraction module of HRNet to extract the features of the multispectral optical images; 2) a feature fusion module combining channel and spatial attention was designed to explore the intrinsic complementary relationship between the dual-modal features. Ablation experiments were carried out to verify the improved model. The rice extraction performance of MSATT-HRNet was compared with that of common deep learning networks (MCANet, Deeplabv3, and Unet) and of the original HRNet using only optical images. The results showed that the rice mapping scheme embedded with the improved feature extraction and fusion modules achieved significantly higher overall accuracy, and that the multi-source data fusion exploited the complementary strengths of the different data sources. The overall accuracy and Kappa coefficient of the MSATT-HRNet model reached 97.04% and 0.961, respectively. Compared with MCANet, Deeplabv3, Unet, and the original HRNet, the overall accuracy was higher by 6.90, 2.67, 2.98, and 4.85 percentage points, respectively, while the Kappa coefficient was higher by 0.055, 0.025, 0.030, and 0.041, respectively. The comparison confirmed that the improved model can effectively improve the accuracy of rice extraction. This study offers a feasible option for coupling deep learning techniques with remote sensing images for rice mapping in the cloudy and rainy regions of southern China. Nevertheless, some limitations remain in the application of the improved model due to the limited availability of optical images over the study area.
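      Similarly, the feature fusion module is described only as a combination of channel and spatial attention applied to the dual-modal features. A minimal CBAM-style sketch is given below, assuming the optical and SAR feature maps are concatenated and then re-weighted; the names (SpatialAttention, FusionModule), the 1x1 reduction convolution, and all sizes are illustrative assumptions rather than the paper's module.

```python
# Minimal sketch (assumption): a CBAM-style fusion module that concatenates the
# optical and SAR feature maps and refines them with channel attention followed by
# spatial attention. Names and layer sizes are illustrative, not the authors' code.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Spatial attention from channel-wise mean and max maps (assumed form)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)        # B x 1 x H x W
        max_map = x.max(dim=1, keepdim=True).values  # B x 1 x H x W
        weight = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * weight


class FusionModule(nn.Module):
    """Concatenate dual-modal features, then apply channel and spatial attention."""

    def __init__(self, opt_channels: int, sar_channels: int, out_channels: int, reduction: int = 16):
        super().__init__()
        self.reduce = nn.Conv2d(opt_channels + sar_channels, out_channels, kernel_size=1)
        self.channel_attn = nn.Sequential(           # squeeze-and-excitation-style weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, out_channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels // reduction, out_channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_attn = SpatialAttention()

    def forward(self, opt_feat: torch.Tensor, sar_feat: torch.Tensor) -> torch.Tensor:
        fused = self.reduce(torch.cat([opt_feat, sar_feat], dim=1))
        fused = fused * self.channel_attn(fused)     # emphasize informative channels
        return self.spatial_attn(fused)              # emphasize informative locations


# Example: fuse 64-channel optical features with 64-channel SAR features.
opt_feat = torch.randn(1, 64, 128, 128)
sar_feat = torch.randn(1, 64, 128, 128)
fused = FusionModule(64, 64, 64)(opt_feat, sar_feat)  # -> shape (1, 64, 128, 128)
```

      In a sketch of this kind, the channel weights decide which modality's channels to emphasize, while the spatial weights highlight locations where the two modalities give consistent responses; the actual MSATT-HRNet module may differ in structure and placement within the network.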

       
