Abstract:
Semantic segmentation of images has become a key interdisciplinary application in image processing, computer vision, pattern recognition, and artificial intelligence. Among deep learning architectures, the Convolutional Neural Network for Image Semantic Segmentation (CNN-ISS) is widely used in digital image processing and machine vision. Compared with traditional remote sensing image classification, the CNN-ISS extracts additional features, such as texture and geometric features, and shows stronger transfer learning and generalization ability. It is therefore suitable for interpreting high-resolution remote sensing images, identifying complicated features, and mapping crops. For classification, a large remote sensing image must first be split into tiles, which serve as the input to the Convolutional Neural Network (CNN). However, this artificial tiling fragments ground objects along tile edges, which lowers the classification accuracy of pixels near the edges. This phenomenon is defined here as the edge effect of tiled images: the classification accuracy of pixels near the edge of a tile is lower than that of pixels in the central area. In this work, two indicators were designed to quantify the edge effect of tiled images processed by the CNN-ISS: the error rate with distance to tile edges (ERD) and the error rate of the whole image (ERW). In addition, offset positions (i, k) were set for the starting point of the sliding window, so that every pixel of the whole image falls in the central area of a tile under at least one offset setting. Five technical solutions that combine the class scores obtained from the multiple groups of tiles were then tested for minimizing the edge effect of tiled images.
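The two indicators and the offset idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the square tile size, the use of the distance to the nearest tile grid line, and the "most central offset" selection rule are all assumptions made for illustration. The five solutions evaluated in the paper are not specified in this abstract.

```python
import numpy as np

def edge_distance(height, width, tile, offset=(0, 0)):
    """Distance in pixels from each pixel to the nearest edge of its tile.

    Assumes square tiles of side `tile` whose grid starts at `offset`
    (row i, column k). A pixel on a tile border has distance 0.
    """
    oi, ok = offset
    r = (np.arange(height) - oi) % tile   # row position inside the tile
    c = (np.arange(width) - ok) % tile    # column position inside the tile
    dr = np.minimum(r, tile - 1 - r)
    dc = np.minimum(c, tile - 1 - c)
    return np.minimum(dr[:, None], dc[None, :])

def erd(pred, truth, dist):
    """ERD-style curve: error rate for each distance to the tile edge."""
    wrong = (pred != truth).astype(float).ravel()
    d = dist.ravel()
    n_bins = int(d.max()) + 1
    counts = np.bincount(d, minlength=n_bins)
    errors = np.bincount(d, weights=wrong, minlength=n_bins)
    return errors / np.maximum(counts, 1)  # avoid division by zero

def erw(pred, truth):
    """ERW-style value: error rate over the whole image."""
    return float(np.mean(pred != truth))

def most_central_offset(height, width, tile, offsets):
    """For each pixel, index of the offset that places it farthest from a tile edge.

    Hypothetical selection rule, shown only to illustrate how several
    offset tilings can put every pixel in a central area.
    """
    dists = np.stack([edge_distance(height, width, tile, o) for o in offsets])
    return dists.argmax(axis=0)
```

For example, with 4 x 4 tiles on an 8 x 8 image and offsets (0, 0) and (2, 2), a pixel that lies on a tile border under the first offset lies in the tile interior under the second, so a prediction map could be assembled by taking each pixel from the tiling selected by `most_central_offset`.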
Taking Tangshan as a typical rural land surface to be segmented, DeepLab V3 was selected as the core model of the CNN-ISS to analyze the edge effect in classification. The results showed that pixel classification accuracy was positively correlated with the distance from the pixel to the edge of the tile. The highest error rate, 6.93%, occurred along the tile edge, and the lowest, 3.52%, occurred at the tile center, which demonstrates an obvious edge effect of tiled images. With the edge effect elimination schemes, the overall classification accuracy improved significantly: the Kappa coefficient and mean Intersection over Union (mIoU) of the entire image increased by 0.012 2 and 1.97 percentage points, respectively. Taking the Kappa coefficient, one of the classic accuracy indices in remote sensing image interpretation, as an example, the accuracy ranking including the control group was: solution 2 (0.881 0) > solution 5 (0.878 9) > solution 3 (0.878 8) > solution 4 (0.877 7) > solution 1 (0.875 9) > control group (0.868 8). In addition, the benefit of the solutions depended mainly on the types of features in the tiled images. As a general rule, accuracy improved more for linear features and complex isomers (pit ponds, rural residential areas) than for base land or other agricultural land. Compared with the control group, the IoU improvements under solution 2 ranked as follows: roads (4.13 percentage points) > pit ponds (2.97 percentage points) > rivers and ditches (1.61 percentage points) > rural residential areas (0.65 percentage points) > other agricultural land (0.46 percentage points).
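For readers who want to reproduce accuracy indices of this kind, the sketch below computes the Kappa coefficient and per-class IoU from a confusion matrix using their standard definitions. The function names and the integer label encoding (classes 0 to n_classes - 1) are assumptions for illustration. The paper's own evaluation code is not given in this abstract.

```python
import numpy as np

def confusion_matrix(truth, pred, n_classes):
    """Rows are reference classes, columns are predicted classes."""
    idx = truth.ravel() * n_classes + pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def kappa(cm):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = cm.sum()
    po = np.trace(cm) / n                                    # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2    # chance agreement
    return (po - pe) / (1 - pe)

def iou_per_class(cm):
    """IoU = TP / (TP + FP + FN) for each class."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return tp / np.maximum(union, 1)  # classes absent from both maps get 0

def mean_iou(cm):
    """mIoU: unweighted mean of the per-class IoU values."""
    return float(iou_per_class(cm).mean())
```

An increase in IoU from, say, 0.60 to 0.62 corresponds to 2 percentage points in the units used above, whereas Kappa differences are reported as plain differences of the coefficient (for example, 0.881 0 - 0.868 8 = 0.012 2 for solution 2 against the control group).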
Without changing the core model of CNN semantic segmentation, the proposed schemes for eliminating the tile edge effect can effectively improve the accuracy of remote sensing image classification, especially for linear features and complex isomers.