Classification of land use in industrial and mining reclamation areas based on grid-search and random forest classifier

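Abstract: To improve the accuracy of land-use classification from remote-sensing images of industrial and mining reclamation areas, and to provide data support for land reclamation monitoring, this paper explores a land-use classification method for reclamation areas based on a random forest optimized by grid search (Grid-Search). Using GF-1 imagery, a DEM (digital elevation model) and field survey data, and taking the random forest classifier as the framework, the study tuned the algorithm's parameters with a grid search based on the OOB (Out-of-Bag) error. Combining spectral, topographic, texture and spatial information from the images, 33 feature variables were computed and selected, and 4 combined variable models were built for random forest classification experiments; their classification accuracies were 82.79%, 84.91%, 86.75% and 88.16%, respectively. To remove redundant information among the 33 feature variables, reduce the dimensionality of the image band variables and shorten classification run time while maintaining classification accuracy, feature selection was carried out with variable importance estimation and with ReliefF, respectively, and random forest classification was run again; the results were compared across the combined models and against other classification methods. The results show that the grid-search-optimized random forest reaches a classification accuracy of 88.16% in multi-feature image classification and keeps accuracy above 85% after dimensionality reduction with either method, outperforming SVM (support vector machine) and MLC (maximum likelihood classification) with the same feature variables. In terms of efficiency, random forest runs faster than SVM and handles high-dimensional feature variables better. Hence, classifying land-use information in industrial and mining reclamation areas with the grid-search-based random forest method achieves high accuracy, and remote-sensing image interpretation based on this method can provide technical support and a theoretical reference for land reclamation monitoring.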
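A minimal pure-Python sketch of an OOB-error grid search, offered only as an illustration of the idea. It assumes the grid covers the two usual random forest hyperparameters, the number of trees (`n_trees`) and the number of features tried at each split (`mtry`); the paper's actual grid, tree settings and software are not given here. The tree depth cap (`max_depth`) is an assumption added only to keep the example fast, whereas a standard random forest grows its trees fully. `toy_data` is a hypothetical stand-in for the per-pixel feature variables.

```python
import random
from collections import Counter
from itertools import product


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, idx, mtry, depth, max_depth, rng):
    """Grow a CART-style tree on rows `idx`, trying `mtry` random features at each split.

    Leaves are class labels; internal nodes are (feature, threshold, left, right)
    tuples, so labels must not themselves be tuples.
    """
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None  # (weighted Gini, feature, threshold, left rows, right rows)
    for f in rng.sample(range(len(X[0])), mtry):
        for t in sorted({X[i][f] for i in idx})[:-1]:  # drop the max so both sides are non-empty
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled feature is constant in this node
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree(X, y, left, mtry, depth + 1, max_depth, rng),
            build_tree(X, y, right, mtry, depth + 1, max_depth, rng))


def predict(tree, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def oob_error(X, y, n_trees, mtry, max_depth=3, seed=0):
    """Out-of-bag error: each row is voted on only by trees whose bootstrap sample missed it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        tree = build_tree(X, y, boot, mtry, 0, max_depth, rng)
        for i in set(range(n)) - set(boot):
            votes[i][predict(tree, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        return float("inf")
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


def grid_search(X, y, tree_grid, mtry_grid):
    """Evaluate every (n_trees, mtry) pair and return the one with the lowest OOB error."""
    results = {(nt, m): oob_error(X, y, nt, m) for nt, m in product(tree_grid, mtry_grid)}
    return min(results, key=results.get), results


def toy_data(n=60, seed=42):
    """Hypothetical data: 2 informative columns followed by 2 pure-noise columns."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.random(), rng.random()
        X.append([a, b, rng.random(), rng.random()])
        y.append(0 if a < 0.5 else (1 if b < 0.5 else 2))
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    best, results = grid_search(X, y, tree_grid=[5, 15], mtry_grid=[1, 2, 4])
    for params, err in sorted(results.items()):
        print(params, round(err, 3))
    print("selected (n_trees, mtry):", best)
```

The pair with the lowest OOB error is kept. Because each row is scored only by trees that never saw it during training, the OOB error serves as a built-in validation estimate, so the search needs no separate hold-out set.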

       

Abstract: In industrial and mining land reclamation areas, strong topographic relief and the diverse, fragmented, intermixed and scattered distribution of surface features, among other factors, make classification mapping from remote-sensing images difficult. To improve the land-use classification accuracy for industrial and mining reclamation areas and to provide data support for land reclamation monitoring and supervision, this article explored a classification method for reclamation areas based on grid search and the random forest algorithm. Satellite and auxiliary data, including GF-1 images, a DEM (digital elevation model) and field investigation data, were acquired in October 2016. The study area was Gulin County, Luzhou City, Sichuan Province. To obtain true surface reflectance and reduce atmospheric and environmental effects, the satellite images were pre-processed with FLAASH atmospheric correction and geometric correction in ENVI 5.3. The random forest algorithm, a machine learning method, was used because it facilitates the use of ancillary data in classification. Feature selection is an important preprocessing step in many machine learning applications: it selects the smallest subset of relevant features that still yields a robust learning model. In this paper, spectral, topographic, texture and spatial variables were included in feature selection. To differentiate built-up areas from farmland, the BCI (biophysical composition index) was calculated as one of the spectral features. Texture feature processing involved principal component analysis. Local Moran's I, which reflects spatial autocorrelation, and Local Getis-Ord Gi, which reflects hotspots, were selected to further improve the classification result. A grid-search method based on the OOB (Out-of-Bag) error was used to optimize the algorithm's parameters.
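The two spatial features can be sketched on a small raster as follows. Several choices here are assumptions rather than the paper's settings: a 3×3 queen neighbourhood, row-standardised weights for local Moran's I, binary weights for Getis-Ord, and the self-inclusive Gi* variant in place of Gi. The abstract also does not say which band or index the statistics were computed on.

```python
import math


def queen_neighbors(rows, cols, r, c, include_self=False):
    """Cells sharing an edge or a corner with (r, c) that lie inside the grid."""
    cells = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc or include_self) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                cells.append((r + dr, c + dc))
    return cells


def local_morans_i(grid):
    """Anselin's local Moran's I with row-standardised queen weights (grid must not be constant)."""
    rows, cols = len(grid), len(grid[0])
    vals = [v for row in grid for v in row]
    n = len(vals)
    mean = sum(vals) / n
    m2 = sum((v - mean) ** 2 for v in vals) / n
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nb = queen_neighbors(rows, cols, r, c)
            lag = sum(grid[i][j] - mean for i, j in nb) / len(nb)  # average neighbour deviation
            out[r][c] = (grid[r][c] - mean) / m2 * lag
    return out


def getis_ord_gi_star(grid):
    """Getis-Ord Gi* z-scores with binary queen weights that include the cell itself."""
    rows, cols = len(grid), len(grid[0])
    vals = [v for row in grid for v in row]
    n = len(vals)
    mean = sum(vals) / n
    s = math.sqrt(sum(v * v for v in vals) / n - mean ** 2)
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nb = queen_neighbors(rows, cols, r, c, include_self=True)
            w = len(nb)  # binary weights: sum(w_ij) == sum(w_ij ** 2) == w
            num = sum(grid[i][j] for i, j in nb) - mean * w
            den = s * math.sqrt((n * w - w * w) / (n - 1))
            out[r][c] = num / den
    return out


if __name__ == "__main__":
    grid = [[9, 9, 1, 1, 1],
            [9, 9, 1, 1, 1],
            [1, 1, 1, 1, 1],
            [1, 1, 1, 1, 1],
            [1, 1, 1, 1, 1]]
    print("local Moran's I at (0,0):", round(local_morans_i(grid)[0][0], 3))
    print("Gi* z-score at (0,0):", round(getis_ord_gi_star(grid)[0][0], 3))
```

In this toy grid the bright 2×2 block gets a positive local Moran's I (high values surrounded by high values) and a Gi* z-score of about 4.9, well above the 1.96 threshold usually read as a hotspot at the 5% level.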
Based on spectral, topographic, texture, spatial and other information from the images, 33 feature variables were derived in the feature selection step, and 4 combined models were constructed for random forest classification experiments; their classification accuracies were 82.79%, 84.91%, 86.75% and 88.16%, respectively. To eliminate redundant information in the 33 feature variables and reduce the dimensionality of the image bands, the study used variable importance estimation and the ReliefF algorithm to select the principal feature variables, which were then classified with the random forest algorithm. A comparison of the classification results of Model 2, Model 4, SVM (support vector machine) and MLC (maximum likelihood classification) indicates that the random forest algorithm with grid-search parameter optimization can achieve a classification accuracy of 88.16% with multiple feature variables. After the variables are reduced in dimension by different methods, the classification accuracy stays above 85%, which is higher than that of SVM and MLC with the same number of feature variables. The random forest classifier also runs faster than SVM and is better at handling multidimensional feature variables. The grid-search-based random forest method therefore achieves high precision in land-use classification of reclamation areas, and remote-sensing image interpretation based on this method can provide technical support and a theoretical reference for land reclamation monitoring and supervision.
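ReliefF scores a feature by how much it differs between a sample and its nearest neighbours from other classes (misses), relative to its nearest neighbours from the same class (hits). The sketch below follows the common formulation with range-normalised differences, `k` nearest neighbours and class-prior weighting of the misses. The values of `k` and of the number of sampled instances `m`, as well as the toy data, are assumptions and not the paper's settings.

```python
import random
from collections import Counter


def relieff(X, y, k=5, m=None, seed=0):
    """ReliefF feature weights for numeric features (higher = more relevant).

    Assumes every class has more than k instances; per-feature differences are
    range-normalised and instance distance is their sum (Manhattan distance).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    span = [(max(col) - min(col)) or 1.0 for col in zip(*X)]  # guard constant features

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):
        return sum(diff(a, i, j) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    m = m or n
    w = [0.0] * d
    for i in rng.sample(range(n), m):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        hits = [j for j in others if y[j] == y[i]][:k]
        for a in range(d):  # nearest same-class neighbours lower the weight
            w[a] -= sum(diff(a, i, h) for h in hits) / (m * k)
        for c, p in prior.items():
            if c == y[i]:
                continue
            misses = [j for j in others if y[j] == c][:k]
            scale = p / (1.0 - prior[y[i]])  # weight each other class by its prior
            for a in range(d):  # nearest other-class neighbours raise the weight
                w[a] += scale * sum(diff(a, i, j) for j in misses) / (m * k)
    return w


if __name__ == "__main__":
    rng = random.Random(7)
    X = [[rng.random() for _ in range(3)] for _ in range(80)]
    y = [int(row[0] > 0.5) for row in X]  # only column 0 carries class information
    weights = relieff(X, y, k=5)
    ranking = sorted(range(len(weights)), key=weights.__getitem__, reverse=True)
    print("weights:", [round(v, 3) for v in weights])
    print("features ranked by ReliefF:", ranking)
```

Features would then be ranked by weight and the top-ranked subset passed to the random forest; the abstract does not state how many variables were kept after selection.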

       
