Abstract: Automatic interpretation of Land Use/Land Cover (LULC) types from remote sensing data is a key problem in many related fields. Although a large number of image classification algorithms have been developed, most of them can hardly meet application requirements. Probabilistic topic models, represented by the Latent Dirichlet Allocation (LDA) model, have shown great success in natural language processing and image processing, as they can effectively bridge the gap between low-level features and high-level semantics. In recent years they have also been introduced into remote sensing image analysis, although most research has focused on high-resolution remote sensing images. Nonetheless, moderate-resolution remote sensing data remain one of the main sources for automatic LULC interpretation. This study analyzed the problems faced by traditional probabilistic topic models in moderate-resolution remote sensing image analysis, and pointed out that a low segmentation scale makes the image objects small, so that each contains few pixels. In effect, the objects, which are treated as image documents in current work, are sparse in moderate-resolution remote sensing images. This sparsity leads to poor stability when the standard LDA model is used to infer the semantics of such short documents. The Biterm Topic Model (BTM), by contrast, has shown the ability to infer the semantics of sparse documents: it learns topics by directly modeling the generation of word co-occurrence patterns over the whole corpus, making inference effective through rich corpus-level information. By segmenting the remote sensing image at two scales and regarding the image objects at the two levels as short documents and visual words respectively, BTM was introduced to the classification of moderate-resolution remote sensing imagery.
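The two-scale document/word mapping described above can be sketched as follows. This is an illustrative assumption, not the authors' implementation: fine-scale objects are assigned visual words by nearest-centre quantization against a visual dictionary, and the fine-scale objects inside one coarse-scale object form an "image document". Names such as `dictionary` and `to_visual_words`, and the toy dictionary size of 8, are hypothetical (the paper uses dictionary sizes of 400–480 on 4-band imagery).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy visual dictionary: 8 cluster centres in a 4-band feature space.
dictionary = rng.random((8, 4))

def to_visual_words(object_features, dictionary):
    """Map each fine-scale object's feature vector to the index of its
    nearest dictionary centre, i.e. its visual word."""
    # Squared Euclidean distance from every object to every centre,
    # via broadcasting: (n_objects, 1, 4) - (1, n_words, 4)
    d = ((object_features[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# One coarse-scale object ("document") containing 6 fine-scale objects,
# each described by a 4-band mean spectral feature.
doc_features = rng.random((6, 4))
doc_words = to_visual_words(doc_features, dictionary)
print(doc_words)  # 6 visual-word indices, one per fine-scale object
```

In a real pipeline the dictionary would come from clustering (e.g. k-means) over object features of the whole image rather than random draws.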
In BTM, a biterm is an unordered pair of words co-occurring in a document; biterms are extracted by sliding a small, fixed-size context window over the term sequence. However, image visual words do not follow the sequential pattern of text: the spatial relationship is the most important relationship among visual words. Spatially neighboring visual words can express the pattern of a given land use/land cover type and are more consistent with how humans observe an image. It was therefore proposed to use spatially adjacent visual-word pairs as the observations in BTM, a variant called S-BTM, which also reduces the number of observations. As with LDA, exact parameter estimation is intractable, so Gibbs sampling was used to infer the topics of visual words. Advanced Land Observing Satellite (ALOS) images, with 10 m spatial resolution and 4 bands, were used in the experiments. LDA, BTM, and S-BTM were compared on land use classification. Both BTM and S-BTM achieved higher classification accuracy than LDA. BTM and S-BTM reached their highest accuracy at visual dictionary sizes of 480 and 400, respectively. With the dictionary size fixed at 400, S-BTM was more effective than BTM across different numbers of topics, and both reached their highest accuracy with 20 topics. S-BTM used 33 562 biterms to infer the image documents' topics while BTM used 167 455, showing that S-BTM requires less computation. Finally, with the number of topics fixed at 20, both overall classification accuracy and the Kappa coefficient showed that S-BTM achieved better results than LDA and BTM.
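The contrast between the two kinds of observations can be sketched as below. This is a minimal illustration under assumed inputs, not the authors' code: standard BTM pairs all words that co-occur within a fixed-size sliding window over a word sequence, while S-BTM pairs only 4-neighbour spatially adjacent visual words on a toy 2-D grid of fine-scale objects. The window size, grid layout, and function names are assumptions for illustration.

```python
def window_biterms(words, win=3):
    """All unordered word pairs co-occurring within a sliding window
    of size `win` over a term sequence (standard BTM extraction)."""
    biterms = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + win, len(words))):
            biterms.append(tuple(sorted((words[i], words[j]))))
    return biterms

def spatial_biterms(grid):
    """Unordered pairs of 4-neighbour adjacent visual words on a 2-D
    grid of fine-scale objects (the S-BTM observation)."""
    h, w = len(grid), len(grid[0])
    biterms = []
    for r in range(h):
        for c in range(w):
            if c + 1 < w:  # right neighbour
                biterms.append(tuple(sorted((grid[r][c], grid[r][c + 1]))))
            if r + 1 < h:  # bottom neighbour
                biterms.append(tuple(sorted((grid[r][c], grid[r + 1][c]))))
    return biterms

seq = [5, 2, 5, 7, 2, 1]          # visual words as a flat sequence
grid = [[5, 2, 5], [7, 2, 1]]     # the same words on a 2 x 3 grid

print(len(window_biterms(seq)))   # 9 window pairs with win=3
print(len(spatial_biterms(grid))) # 7 adjacent pairs: fewer observations
```

Even on this toy input the spatial extraction yields fewer biterms than the window-based one, consistent with the reported drop from 167 455 biterms in BTM to 33 562 in S-BTM.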