Abstract:
In recent years, cropland fragmentation has posed significant challenges for the management of non-agricultural and non-grain land use in complex and dynamic agricultural landscapes. Accurate extraction of cropland plots is therefore required, particularly where cropland boundaries are unclear in complex scenes. In this study, a Multi-Feature Progressive Fusion Unet (MPFUnet) was proposed for cropland segmentation in remote sensing images. With Unet as the backbone network, both the spatial and the geometric edge information of cropland was used for feature extraction. A multi-feature attention module, comprising a spatial attention module and an edge feature enhancement module, was proposed to improve the perception of subtle local features beyond that of the plain Unet network. The spatial attention module was used to obtain the global features: the average-pooling and max-pooling maps were first concatenated along the channel dimension, and the spatial attention map, which indicates the spatial importance of each pixel, was then obtained through an activation function. The edge feature enhancement module was used to improve the perception of multi-scale spatial information by fusing several sets of receptive-field features obtained at different dilation rates. A hierarchical heterogeneous fusion strategy was implemented to multiply the attention map with the multi-scale feature maps, so that the multi-dimensional feature maps could be learned more effectively. The spatial attention features and the multi-scale edge information were thus aggregated at different resolution scales to obtain edge-enhanced spatial feature maps. A progressive feature enhancement (PFE) structure was also designed for the decoding part of the network to adapt dynamically to different feature scales. The spatial information of adjacent layers was embedded in each layer to further integrate the global semantic information of the high-level features with the detailed edge information of the low-level features.
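The attention and fusion steps described above can be illustrated with a minimal NumPy sketch. It is not the MPFUnet implementation: the function names, the fixed mixing weights that stand in for a learned convolution, the sigmoid activation, the unweighted 3x3 dilated averaging filter that stands in for learned dilated convolutions, and the dilation rates (1, 2, 4) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(feat, w_avg=0.5, w_max=0.5):
    """Return a (1, H, W) attention map for a (C, H, W) feature map."""
    avg_map = feat.mean(axis=0, keepdims=True)  # channel-wise average pooling
    max_map = feat.max(axis=0, keepdims=True)   # channel-wise max pooling
    pooled = np.concatenate([avg_map, max_map], axis=0)  # (2, H, W)
    # Assumption: fixed mixing weights replace a learned convolution here.
    weights = np.array([w_avg, w_max]).reshape(2, 1, 1)
    logits = (pooled * weights).sum(axis=0, keepdims=True)
    return sigmoid(logits)


def dilated_box_filter(feat, dilation):
    """Zero-padded 3x3 averaging filter whose taps are `dilation` pixels apart."""
    _, h, w = feat.shape
    d = dilation
    padded = np.pad(feat, ((0, 0), (d, d), (d, d)))
    out = np.zeros_like(feat, dtype=float)
    for dy in (-d, 0, d):
        for dx in (-d, 0, d):
            out += padded[:, d + dy:d + dy + h, d + dx:d + dx + w]
    return out / 9.0


def edge_enhanced_features(feat, dilations=(1, 2, 4)):
    """Multiply the attention map with the averaged multi-scale responses."""
    attn = spatial_attention(feat)
    multi_scale = sum(dilated_box_filter(feat, d) for d in dilations)
    multi_scale = multi_scale / len(dilations)
    return attn * multi_scale  # attention broadcasts over the channel axis


feat = np.random.default_rng(0).normal(size=(8, 16, 16))
out = edge_enhanced_features(feat)
print(out.shape)
```

In a trained network the mixing weights and the dilated kernels would be learned; the sketch only shows how a single-channel attention map can be broadcast over multi-scale feature maps of the same spatial size.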
Furthermore, a layer-by-layer integration approach was adopted to capture and fuse the features of adjacent scales. The layers were processed in order from the low to the high layers of the network, which maintained the attention consistency among the different feature extraction layers. Multi-source satellite images from JL-1, GF-1, GF-2, and GF-7 were used as the data sources in the experiments. A 2-meter high-resolution cropland dataset was collected from Dongpo District, Meishan City, Sichuan Province, China. The dataset was randomly divided into training, validation, and test sets in a ratio of 3:1:1. The experimental results show that MPFUnet achieved an accuracy of 92.54%, a recall of 94.08%, an average intersection over union (IoU) of 84.32%, and a Kappa coefficient of 87.47%, which were 8.23%, 7.01%, 10.02%, and 11.33% higher, respectively, than those of the baseline Unet model. MPFUnet also performed well on regular, fragmented, and complex scene patches, outperforming the other models compared, such as DeepLabv3+, YOLOv3, and Swin-Unet. Moreover, the spatial texture and edge features were effectively integrated to realize cropland segmentation in diverse scenarios. The model also showed a strong capacity to accurately identify plot boundaries and small areas, as well as a high degree of robustness against interference factors in complex scenarios. MPFUnet can therefore offer an effective and viable solution for extracting fragmented farmland in complex scenarios.
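The four reported metrics can be computed from a binary confusion matrix. The sketch below uses the standard definitions and is not taken from the study: the function name is hypothetical, the average IoU is assumed to be the mean over the cropland and background classes, and both classes are assumed to be present so that no denominator is zero.

```python
import numpy as np


def binary_segmentation_metrics(y_true, y_pred):
    """Accuracy, recall, mean IoU, and Cohen's Kappa for binary masks."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    y_pred = np.asarray(y_pred).astype(bool).ravel()

    tp = int(np.sum(y_true & y_pred))    # cropland predicted as cropland
    tn = int(np.sum(~y_true & ~y_pred))  # background predicted as background
    fp = int(np.sum(~y_true & y_pred))   # background predicted as cropland
    fn = int(np.sum(y_true & ~y_pred))   # cropland predicted as background
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)
    iou_cropland = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    # Assumption: "average IoU" means the mean over the two classes.
    mean_iou = (iou_cropland + iou_background) / 2

    # Expected agreement by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)

    return {
        "accuracy": accuracy,
        "recall": recall,
        "mean_iou": mean_iou,
        "kappa": kappa,
    }


metrics = binary_segmentation_metrics(
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
)
print(metrics)
```

For the toy masks above, the confusion matrix has 2 true positives, 4 true negatives, 1 false positive, and 1 false negative, giving an accuracy of 0.75 and a Kappa of 7/15.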