Abstract: High-resolution remote sensing images (HRSI) provide abundant information on the textures and terrain structures of a scene. In recent years, scene classification methods based on mid-level feature learning have been increasingly used for scene-level land use classification with HRSI. However, effectively organizing and optimizing the spectral, texture and geometrical structure features remains a challenging task in scene-level land use classification. Because learning algorithms based on mid-level features can effectively represent the low-level features (e.g., spectrum, textures and geometrical structures) of HRSI, scene-level land use classification can then be carried out with a classifier such as the support vector machine (SVM). Nevertheless, the mid-level feature descriptors are not discriminative enough, because they are learned in an unsupervised way. Moreover, conventional approaches using this strategy consider only the geometrical structure features and neglect other meaningful low-level features of the images. To make the learned feature descriptors more discriminative and to better incorporate different low-level features, we propose a method that uses a vector-cascading model combining multi-feature soft probabilities to achieve scene-level land use classification. Firstly, local dense scale-invariant feature transform (DSIFT) features, spectral features (SF) and local binary pattern (LBP) features were extracted as the low-level features of the images. The spectral features were obtained by calculating the color histogram of the images. Then, for each type of low-level feature, a certain number of samples were randomly selected from each image and clustered by the K-means algorithm to generate the dictionary.
Secondly, based on the trained dictionaries of the different features, the local DSIFT, spectral and LBP features were encoded individually with locality-constrained linear coding (LLC) to obtain the sparse coefficients, and the spatial pyramid matching (SPM) model with max-pooling was used to obtain the mid-level feature descriptors. Finally, the mid-level feature descriptors of the three low-level features were each classified by an SVM classifier, and the soft probabilities of the three features were calculated. These soft probabilities were then vector-cascaded as the final feature representation of the image, and a second round of SVM classification produced the final classification result. We validated the proposed method in experiments on the public UC-Merced Land Use dataset. The experimental results show that: 1) the overall accuracy of the proposed method reached 88.6%, an improvement of 12.7% and 9.9% over the traditional classification methods ScSPM and LLC, respectively; 2) when the dictionary size and the number of training images were varied, the classification results proved more sensitive to the number of training images than to the dictionary size, and the average increase in classification accuracy was approximately 25.0% when the number of training images was increased to 60; 3) compared with other scene classification methods that extract a single low-level feature, the proposed algorithm classified easily confused land use types, such as dense residential and medium residential, more effectively, and it considerably improved the accuracy of scene-level land use classification with HRSI.
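
The encoding and pooling step described above can be illustrated with a minimal sketch. It assumes the commonly used k-nearest-neighbour approximation of LLC, since the abstract does not state which variant or parameters were used. The function names `llc_encode` and `spm_max_pool`, the parameters `k`, `reg` and `levels`, and the random stand-in data are all hypothetical and only serve the illustration.

```python
import numpy as np


def llc_encode(descriptors, dictionary, k=5, reg=1e-4):
    """Approximate LLC: code each descriptor over its k nearest dictionary atoms.

    Assumption: k-nearest-neighbour LLC approximation; k and reg are illustrative.
    """
    n, m = descriptors.shape[0], dictionary.shape[0]
    codes = np.zeros((n, m))
    # Squared Euclidean distance between every descriptor and every atom.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    for i in range(n):
        idx = nearest[i]
        z = dictionary[idx] - descriptors[i]   # shift the k atoms to the descriptor
        cov = z @ z.T                          # local covariance, shape (k, k)
        cov += (reg * np.trace(cov) + 1e-12) * np.eye(k)  # regularize for stability
        w = np.linalg.solve(cov, np.ones(k))
        codes[i, idx] = w / w.sum()            # enforce the sum-to-one constraint
    return codes


def spm_max_pool(codes, xy, levels=(1, 2, 4)):
    """Max-pool codes inside each cell of a spatial pyramid, then concatenate.

    xy holds descriptor positions normalized to [0, 1]; levels is an assumption.
    """
    pooled = []
    for g in levels:
        cell = np.minimum((xy * g).astype(int), g - 1)  # grid cell index per axis
        flat = cell[:, 1] * g + cell[:, 0]
        for c in range(g * g):
            mask = flat == c
            if mask.any():
                pooled.append(codes[mask].max(axis=0))
            else:
                pooled.append(np.zeros(codes.shape[1]))  # empty cell
    return np.concatenate(pooled)


# Random stand-ins for local descriptors, a K-means dictionary and positions.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))
dictionary = rng.normal(size=(64, 128))
xy = rng.random(size=(200, 2))

codes = llc_encode(descriptors, dictionary)
feature = spm_max_pool(codes, xy)
print(feature.shape)  # (1344,) = 64 atoms * (1 + 4 + 16) pyramid cells
```

In the pipeline the abstract describes, one such descriptor would be built per low-level feature type (DSIFT, SF, LBP), each with its own dictionary, before the per-feature SVM stage.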