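Wang Lijuan, Kong Yuru, Yang Xiaodong, Xu Yi, Liang Liang, Wang Shuguo. Classification of land use in farming areas based on feature optimization random forest algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(4): 244-250. DOI: 10.11975/j.issn.1002-6819.2020.04.029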

Classification of land use in farming areas based on feature optimization random forest algorithm

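Abstract: To improve the accuracy of land use classification in farming areas, Sentinel-2 data, which offer relatively high spatial resolution and rich spectral information, were used to generate four basic types of feature variables: spectral features, vegetation indices without red-edge bands, red-edge indices, and texture features. These variables were optimized and then ranked by feature importance, and 7 feature combination schemes were constructed from them. Land use information in the farming area was extracted with a random forest algorithm and a support vector machine, and the classification accuracies of the two were compared. The results show that the random forest algorithm with feature optimization extracted land use information best, with an overall accuracy of 88.24% and a Kappa coefficient of 0.84, outperforming the support vector machine classifier with the same feature variables. The method can effectively improve land use classification accuracy in farming areas and can provide technical support and a theoretical reference for land resource monitoring and management.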
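The feature-importance ranking described above can be illustrated with the mean decrease in accuracy (MDA) criterion, which the study names as its importance measure. The sketch below is a minimal, model-agnostic Python version of the permutation idea behind MDA, written under stated assumptions: Breiman's random forest MDA permutes each feature on every tree's out-of-bag samples, whereas this sketch shuffles each feature on a single evaluation set and accepts any `predict` callable. `mean_decrease_accuracy`, `toy_predict`, and the toy data are hypothetical stand-ins for a random forest trained on Sentinel-2 features.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_decrease_accuracy(
    predict: Callable[[List[List[float]]], List[int]],
    X: List[List[float]],
    y: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Permutation importance: average accuracy drop when one feature column
    is shuffled, which breaks its link to the class labels."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 determines the class, feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(rows: List[List[float]]) -> List[int]:
    """Hypothetical stand-in for a trained classifier (not a random forest)."""
    return [int(r[0] > 0.5) for r in rows]


imp = mean_decrease_accuracy(toy_predict, X, y)
ranking = sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)
print("importances:", imp)
print("ranking (most important first):", ranking)
```

Averaging over several shuffles (`n_repeats`) reduces the noise of any single permutation, and a feature the model ignores, such as feature 1 here, scores exactly zero. Sorting the scores yields an importance ranking of the kind the study reports for its four feature types.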

       

Abstract: Land use classification plays an important role in dynamic monitoring, planning and management, and rational land development and protection. With the gradual acceleration of urbanization in China, the area of construction land is increasing while that of cultivated land is decreasing, so obtaining accurate and timely land use classification information for farming areas is of great significance for the rational planning of agricultural land resources. In recent years, machine learning algorithms have been widely used in land use classification research. Among them, the random forest (RF) algorithm offers high classification accuracy, a strong ability to handle multi-dimensional feature variables, and fast training and prediction, and it is widely used in land use classification. However, feeding many feature variables into the classification leads to information redundancy and over-fitting of the RF, which reduces classification accuracy. Therefore, this study used Sentinel-2 data, which offer high spatial resolution and abundant spectral information, and applied RF with feature optimization to classify land use in agricultural areas. First, four basic types of feature variables were generated from the Sentinel-2 data: spectral features, vegetation indices without red-edge bands, red-edge indices, and texture features. Then, the spectral features were screened with the optimum index factor (OIF), and the vegetation indices and texture features were both selected by principal component analysis (PCA). Next, the mean decrease in accuracy (MDA) was used to evaluate the importance of these feature variables, and six feature combination schemes were constructed and classified with RF using field survey data.
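The optimum index factor used above to screen the spectral features is commonly defined, for a three-band combination, as OIF = (S_1 + S_2 + S_3) / (|R_12| + |R_13| + |R_23|), where S_i is the standard deviation of band i and R_ij is the correlation coefficient between bands i and j. High values favor combinations whose bands carry much variance but little redundancy. The pure-Python sketch below scores every three-band combination with that definition. It assumes the population standard deviation, and the band keys (`B2`, `B3`, `B4`, `B8`, following Sentinel-2 naming) hold random, hypothetical reflectances rather than the study's data; the exact way the study applied OIF is not stated in the abstract.

```python
import itertools
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length pixel vectors."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def oif(bands: Dict[str, List[float]], combo: Tuple[str, ...]) -> float:
    """OIF = sum of band standard deviations / sum of |pairwise correlations|."""
    std_sum = sum(statistics.pstdev(bands[name]) for name in combo)
    corr_sum = sum(
        abs(pearson(bands[p], bands[q])) for p, q in itertools.combinations(combo, 2)
    )
    return std_sum / corr_sum


def rank_band_combinations(
    bands: Dict[str, List[float]], k: int = 3
) -> List[Tuple[Tuple[str, ...], float]]:
    """Score every k-band combination and sort from highest to lowest OIF."""
    scores = [(c, oif(bands, c)) for c in itertools.combinations(sorted(bands), k)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical reflectance samples keyed by Sentinel-2 band names.
rng = random.Random(1)
n = 100
blue = [rng.uniform(0.02, 0.10) for _ in range(n)]
green = [b + rng.uniform(0.00, 0.05) for b in blue]  # deliberately tied to blue
red = [rng.uniform(0.02, 0.15) for _ in range(n)]
nir = [rng.uniform(0.15, 0.45) for _ in range(n)]
bands = {"B2": blue, "B3": green, "B4": red, "B8": nir}

ranked = rank_band_combinations(bands)
for combo, score in ranked:
    print(combo, round(score, 4))
```

With real imagery, each list would hold pixel values sampled from one Sentinel-2 band and the top-ranked combination would be retained. `B3` is generated from `B2` here so that combinations containing both have a large correlation term in the denominator.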
Finally, the best combination of feature variables was selected by comparing the accuracy of the six combination schemes, and the RF and support vector machine (SVM) classification results for that combination were compared to verify the practicability of RF for agricultural land use classification. The results were as follows: (1) To avoid the degradation of classification performance caused by the "curse of dimensionality", this study used OIF and principal component analysis to optimize the features; this proved effective and significantly improved the classification accuracy of land use types in agricultural areas. (2) Ranking the four basic types of feature variables by importance gave: red-edge indices > vegetation indices without red-edge bands > spectral features > texture features. (3) Comparing the classification results of the 7 experimental schemes showed that adding vegetation indices, texture features, and other information effectively improved land use classification accuracy. In addition, the RF algorithm with feature optimization achieved the highest classification accuracy, with an overall accuracy of 88.24% and a kappa coefficient of 0.84, better than the SVM classification results with the same feature variables. In summary, the feature-optimized RF proposed in this study provides a new method for effectively improving the accuracy of land use classification in farming areas, as well as technical support and a theoretical reference for land resource monitoring and management.
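The overall accuracy (OA) and Kappa coefficient reported above are both derived from a confusion matrix of validation samples. With n_ij the number of samples of reference class i labeled as class j and N the total, OA = p_o = sum_i n_ii / N, the chance agreement is p_e = sum_i (n_i+ * n_+i) / N^2 (products of row and column totals), and Kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes both; the 3 x 3 matrix is invented for illustration and is not the study's validation data.

```python
from typing import List, Tuple


def overall_accuracy_and_kappa(cm: List[List[int]]) -> Tuple[float, float]:
    """cm[i][j] counts validation samples of reference class i labeled as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n  # observed agreement = OA
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement expected from the marginal (row and column) totals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical 3-class confusion matrix: rows = reference, columns = predicted.
cm = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 4, 40],
]
oa, kappa = overall_accuracy_and_kappa(cm)
print(f"OA = {oa:.4f}, Kappa = {kappa:.4f}")  # prints: OA = 0.8667, Kappa = 0.7995
```

Kappa is lower than OA because it discounts the agreement that would occur by chance given the class proportions, which is why studies such as this one report both figures.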

       
