Classification of land use in farming areas based on feature optimization random forest algorithm
Abstract
Land use classification plays an important role in dynamic monitoring, planning and management, and the rational development and protection of land. With urbanization in China accelerating, the area of construction land is increasing while that of cultivated land is decreasing. Obtaining accurate and timely land use classification information for farming areas is therefore of great significance for the rational planning of agricultural land resources. In recent years, machine learning algorithms have been widely used in land use classification research. Among them, the random forest (RF) algorithm offers high classification accuracy, a strong ability to handle multi-dimensional data, and fast training and prediction, and it has been widely applied in this field. However, involving many feature variables in the classification can lead to information redundancy, over-fitting of the RF, and reduced classification accuracy. This study therefore used Sentinel-2 data, which have high spatial resolution and abundant spectral information, together with an RF based on feature optimization to classify land use in agricultural areas. First, Sentinel-2 data were used to generate four types of basic feature variables: spectral features, vegetation indices without the red-edge band, red-edge indices, and texture features. Then, the spectral features were screened with the optimum index factor (OIF), and the vegetation indices and texture features were each selected by principal component analysis. After that, the mean decrease in accuracy (MDA) method was applied to evaluate the importance of these feature variables, and six feature combination schemes were constructed and combined with field survey data for RF classification.
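The OIF screening step can be illustrated with a minimal sketch. It assumes the commonly used definition of the OIF for a three-band combination, namely the sum of the band standard deviations divided by the sum of the absolute pairwise correlation coefficients; the abstract does not state the exact formula used. The pixel values and the helper names (`oif`, `rank_band_triplets`) are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Sample standard deviation of one band."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson(a, b):
    """Pearson correlation coefficient between two bands."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def oif(bands, names):
    """Assumed OIF: sum of standard deviations / sum of |pairwise correlations|."""
    total_std = sum(std(bands[n]) for n in names)
    total_corr = sum(
        abs(pearson(bands[a], bands[b])) for a, b in combinations(names, 2)
    )
    return total_std / total_corr


def rank_band_triplets(bands):
    """Score every three-band combination and sort from highest to lowest OIF."""
    scores = {t: oif(bands, t) for t in combinations(sorted(bands), 3)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical reflectance samples for four Sentinel-2 bands (not data from the study).
bands = {
    "B2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B3": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with B2, hence redundant
    "B4": [5.0, 3.0, 6.0, 2.0, 7.0],
    "B8": [9.0, 1.0, 4.0, 8.0, 2.0],
}

ranking = rank_band_triplets(bands)
for triplet, score in ranking:
    print(triplet, round(score, 3))
```

A higher OIF indicates a combination with more variance and less redundancy between bands, so combinations that contain both of the perfectly correlated bands score lowest in this toy example.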
Finally, the accuracies of the six combination schemes were compared to select the best combination of feature variables, and the RF and support vector machine (SVM) classification results for that best combination were compared to verify the practicability of RF in agricultural land use classification. The results were as follows: (1) To avoid the degradation of classification performance caused by the "curse of dimensionality", this study used OIF and principal component analysis to optimize the features. The results showed that this method was effective and significantly improved the classification accuracy of land use types in agricultural areas. (2) Ranking the four types of basic feature variables by feature importance gave the following order: red-edge indices > vegetation indices without the red-edge band > spectral features > texture features. (3) Comparison of the classification results of the 7 experimental schemes showed that adding vegetation indices, texture features, and other information effectively improved the classification accuracy of land use. Moreover, the RF algorithm based on feature optimization achieved the highest classification accuracy, with an overall accuracy of 88.24% and a kappa coefficient of 0.84, which was better than the SVM classification results under the same feature variables. In summary, the feature-optimized RF proposed in this study provides a new method for effectively improving the accuracy of land use classification in farming areas, as well as technical support and a theoretical reference for land resource monitoring and management.
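For readers unfamiliar with the two accuracy measures reported above, the following minimal sketch shows how overall accuracy and the kappa coefficient are conventionally derived from a confusion matrix. The 2 x 2 matrix is a hypothetical toy example, not data from this study, and the function names are illustrative.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa_coefficient(cm):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement from the products of the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix: rows are reference classes, columns are predictions.
confusion = [
    [45, 5],
    [10, 40],
]

oa = overall_accuracy(confusion)
kappa = kappa_coefficient(confusion)
print(f"overall accuracy = {oa:.2%}, kappa = {kappa:.2f}")
```

Because kappa discounts chance agreement, it is lower than the overall accuracy for the same matrix, which is consistent with the pairing of 88.24% and 0.84 reported above.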