Rice extraction from multi-source remote sensing images based on HRNet and self-attention mechanism
Abstract
Rice is one of the most important cash crops in the world, and rice classification and mapping are fundamental to assessing food security. With advances in remote sensing, multi-source satellite imagery can be used to map rice distribution. Effectively fusing data from multiple sensors to improve the accuracy of crop extraction is an active research topic. In particular, optical and radar data can be combined to classify rice in cloudy and rainy regions. Previous studies on rice extraction from synthetic aperture radar (SAR) and optical images have achieved good results. However, feature fusion usually relied on simple stacking, without considering the interaction between optical and SAR images. Moreover, mostly shallow features, such as texture and geometric features, were extracted and used for classification, rather than high-level semantic features. In this study, a multi-level feature fusion framework was proposed to improve an existing deep learning network (HRNet). The improved model was named MSATT-HRNet. Dual-polarized SAR imagery from Sentinel-1 and multispectral optical imagery from Sentinel-2, both from the ESA Copernicus programme, were integrated to extract the rice cultivation areas in Wangcheng District, Changsha City, Hunan Province. The improvements to HRNet comprised two parts: 1) A convolution group composed of a channel attention mechanism and max pooling was designed to extract SAR features, and a self-attention module was embedded in the basic feature extraction module of HRNet to extract features from the multispectral optical images. 2) A feature fusion module with channel and spatial attention was designed to explore the intrinsic complementary relationship between the dual-modal features. Ablation experiments were carried out to verify the improved model.
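The idea of fusing dual-modal features with channel and spatial attention, rather than by simple stacking, can be illustrated with a small NumPy sketch. This is not the MSATT-HRNet fusion module: it has no learned weights, and the function names (`channel_attention`, `spatial_attention`, `fuse`), the pooling choices, and the feature shapes are all assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Reweight each channel of a (C, H, W) feature map.

    Assumption: the weight is a sigmoid of the channel's global average
    plus its global maximum (no learned layers in this toy version).
    """
    avg = feat.mean(axis=(1, 2))           # (C,)
    mx = feat.max(axis=(1, 2))             # (C,)
    weights = sigmoid(avg + mx)            # (C,), each in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """Reweight each pixel of a (C, H, W) feature map.

    Assumption: the weight is a sigmoid of the per-pixel mean plus the
    per-pixel maximum taken across channels.
    """
    avg = feat.mean(axis=0)                # (H, W)
    mx = feat.max(axis=0)                  # (H, W)
    weights = sigmoid(avg + mx)            # (H, W), each in (0, 1)
    return feat * weights[None, :, :]


def fuse(optical_feat, sar_feat):
    """Stack optical and SAR features, then apply both attentions."""
    stacked = np.concatenate([optical_feat, sar_feat], axis=0)
    return spatial_attention(channel_attention(stacked))


# Hypothetical feature maps: 4 optical channels and 2 SAR channels on an 8x8 grid.
rng = np.random.default_rng(0)
optical = rng.standard_normal((4, 8, 8))
sar = rng.standard_normal((2, 8, 8))
fused = fuse(optical, sar)
print(fused.shape)
```

In a trained network the attention weights would come from learned layers; the sketch only shows how channel-wise and pixel-wise reweighting lets one modality modulate the other after concatenation.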
The performance of MSATT-HRNet on rice extraction was compared with that of common deep learning models (MCANet, DeepLabv3, and U-Net) and with that of the original HRNet, before improvement, using only optical images. The results showed that the rice mapping scheme with the improved feature extraction and fusion modules achieved a significantly higher overall accuracy. The multi-source data fusion contributed greatly to balancing the various data sources. The overall accuracy and Kappa coefficient of the MSATT-HRNet model reached 97.04% and 0.961, respectively. Compared with MCANet, DeepLabv3, U-Net, and the original HRNet, the overall accuracy of the model was higher by 6.90, 2.67, 2.98, and 4.85 percentage points, respectively, while the Kappa coefficient was higher by 0.055, 0.025, 0.030, and 0.041, respectively. The comparison confirmed that the improved model can effectively improve the accuracy of rice extraction. The approach offers a feasible option for coupling deep learning techniques with remote sensing images for rice mapping in the cloudy and rainy southern region. Nevertheless, some limitations remain in the application of the improved model because of the limited availability of optical images of the area.
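For readers unfamiliar with the two reported metrics, the following minimal sketch computes overall accuracy and the Kappa coefficient from a confusion matrix using their standard definitions. The 2x2 matrix is hypothetical and is not data from this study; the function names are illustrative.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    p_expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical matrix: rows are reference classes (rice, non-rice),
# columns are predicted classes.
cm = [[45, 5],
      [10, 40]]
oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
print(f"overall accuracy = {oa:.2f}, kappa = {kappa:.2f}")
```

Because Kappa discounts chance agreement, it is lower than overall accuracy for the same matrix, which is why the two are usually reported together.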