Data-driven placement of ground sample points for remote sensing classification of crops

吴清滢; 余强毅; 段玉林; 吴文斌

doi:10.11975/j.issn.1002-6819.202210229

Data-driven field sample location approach for crop classification using remote sensing

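Abstract (translated from the Chinese)

Ground sample points are the basis for training remote sensing crop classification models, and the number and quality of sample points are the two main factors affecting classification accuracy. This study developed a data-driven method for locating sample points: spectral bands, vegetation indices and other features of the images to be classified are used to build a stratification basemap, ground sample points are placed on it by stratified random sampling, and the effect of different sampling strategies on crop classification results is analyzed. A data-driven approach based on k-means cluster analysis was applied to 78 classification features extracted from six Sentinel-2 scenes to generate a cluster map with a single optimal k. Two sample allocation schemes (equal allocation and allocation proportional to area) were combined with five total sample sizes (25, 49, 100, 169 and 225). Ground sample information obtained under each sampling strategy was used to run supervised classification of the images with the same support vector machine (SVM) model, and the results were compared with a theoretical total sample size of 139 points and a conventional (manually selected) total sample size of 400 points to quantify the effect of sampling strategy on classification accuracy. The results show that: (1) sampling on the basemap generated by data-driven unsupervised clustering (area-proportional and equal stratified sampling) yields clearly better sample quality and classification accuracy than strategies that do not use the basemap (simple random and systematic sampling); (2) when the total sample size is below the theoretical total sample size, equal stratified sampling achieves higher classification accuracy than area-proportional stratified sampling. For example, with a theoretical sample size of 139, the mean accuracies of equal stratified sampling at total sample sizes of 25, 49 and 100 (75.5%, 80.5% and 86.0%) were all clearly higher than those of area-proportional stratified sampling (44.0%, 69.0% and 83.0%), whereas at total sample sizes of 169 and 225 the mean accuracies of both methods were around 90.0%; (3) for a given overall accuracy requirement, stratified sampling needs fewer samples than the theoretical sample size, which greatly improves sampling efficiency. For example, equal stratified sampling met an overall accuracy requirement of 85.0% with about 70% of the theoretical sample size, and it matched the accuracy of manual selection (97.5%) with only about 90% of the conventional sample size. The results also confirm the general understanding that classification accuracy and stability increase with total sample size, although the gain in accuracy slows once the total sample size exceeds a certain value. The method yields sample sets that are balanced between classes and diverse within classes, and provides a reference for locating ground sample points and for fast, efficient crop classification by remote sensing.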
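The two allocation schemes named in the abstract (equal allocation and allocation proportional to stratum area) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy stratum sizes and the largest-remainder rounding rule are all assumptions made for illustration, and the k-means clustering step is replaced by a ready-made list of stratum labels.

```python
import random
from collections import defaultdict


def equal_allocation(strata_sizes, total):
    """Split `total` samples as evenly as possible across strata (hypothetical helper)."""
    base, extra = divmod(total, len(strata_sizes))
    # Any remainder goes to the first strata in sorted order (an arbitrary rule).
    return {s: base + (1 if i < extra else 0)
            for i, s in enumerate(sorted(strata_sizes))}


def area_ratio_allocation(strata_sizes, total):
    """Split `total` samples in proportion to stratum area, using
    largest-remainder rounding (an assumed rounding rule)."""
    area = sum(strata_sizes.values())
    quotas = {s: total * n / area for s, n in strata_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining samples to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(labels, alloc, seed=0):
    """Draw a simple random sample of pixel indices within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_stratum[lab].append(idx)
    return {s: rng.sample(by_stratum[s], min(n, len(by_stratum[s])))
            for s, n in alloc.items()}


# Toy cluster map: four strata with very unequal areas (pixel counts are made up).
sizes = {0: 700, 1: 200, 2: 70, 3: 30}
labels = [s for s, n in sizes.items() for _ in range(n)]

equal = equal_allocation(sizes, 25)
by_area = area_ratio_allocation(sizes, 25)
print("equal allocation:     ", equal)
print("area-ratio allocation:", by_area)
print("points drawn per stratum:",
      {s: len(v) for s, v in stratified_sample(labels, equal).items()})
```

With a small total sample size, proportional allocation leaves the smallest strata with only one or two points, while equal allocation keeps every stratum represented. That is one plausible reading of why the abstract reports higher accuracy for equal allocation at 25, 49 and 100 points.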

Abstract: Field sample points are the basis for training crop classification models that use remote sensing, and the quantity and quality of sample points are the two main factors affecting classification accuracy. In this study, a data-driven approach to locating sample points was established: features such as spectral bands and vegetation indices of the images to be classified were used to build a basemap for stratification, field sample points were located on that basemap by stratified random sampling, and the influence of different sampling strategies on crop classification was analyzed. K-means unsupervised clustering was applied to 78 classification features extracted from six Sentinel-2 images to generate a clustering map with a single optimal K. The comparison experiments covered two sample allocation strategies (equal and area-ratio allocation), five total sample sizes (25, 49, 100, 169 and 225), one theoretical total sample size of 139 and one traditional (manually selected) total sample size of 400. The same Support Vector Machine (SVM) model was used to classify the images under every sampling strategy. The experimental results showed that: (1) sampling on the data-driven basemap generated by unsupervised clustering (area-ratio and equal stratified sampling) produced a better-quality sample set and significantly higher classification accuracy than sampling without the basemap (simple random and systematic sampling); (2) when the total sample size was smaller than the theoretical total sample size, equal stratified sampling performed better than area-ratio stratified sampling. For example, with a theoretical sample size of 139, the mean classification accuracies of equal stratified sampling at total sample sizes of 25, 49 and 100 (75.5%, 80.5% and 86.0%) were significantly higher than those of area-ratio stratified sampling (44.0%, 69.0% and 83.0%), while the mean accuracies of the two stratified methods at total sample sizes of 169 and 225 were all around 90.0%; (3) for a given overall accuracy requirement, the actual total sample size needed by stratified sampling was smaller than the theoretical sample size, indicating a large improvement in sampling efficiency. For example, equal stratified sampling required about 70% of the theoretical sample size to satisfy an overall accuracy requirement of 85.0%, and it matched the accuracy of manual selection (overall accuracy = 97.5%) with about 90% of the traditional sample size. The results also confirm that classification accuracy and stability increase with total sample size, but the gain in accuracy slows once the sample size exceeds a certain value. The approach yields a sample set that is balanced between classes and diverse within classes, and can serve as a reference for locating field sample points and for rapid, efficient crop classification using remote sensing.
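The abstract does not state how the theoretical total sample size of 139 was derived. As a hedged illustration only, the sketch below uses a widely used sample-size formula for estimating a proportion, n = z² p (1 − p) / d², where p is the expected accuracy, d the allowed margin of error and z the normal quantile for the chosen confidence level. The function name and the input values (p = 0.90, d = 0.05, z = 1.96) are assumptions, not values taken from the paper. Under these assumed inputs the formula happens to give 139, but the authors' actual calculation may differ.

```python
import math


def sample_size_for_proportion(p, d, z=1.96):
    """Sample size n = z^2 * p * (1 - p) / d^2, rounded up (hypothetical helper).

    p: expected proportion (here, expected overall accuracy)
    d: allowed margin of error (half-width of the confidence interval)
    z: standard normal quantile; 1.96 corresponds to 95% confidence
    """
    return math.ceil(z * z * p * (1.0 - p) / (d * d))


# Assumed inputs, NOT stated in the abstract: 90% expected accuracy, 5% margin.
n_theory = sample_size_for_proportion(p=0.90, d=0.05)
print("sample size under the assumed inputs:", n_theory)

# Sample sizes implied by the ratios reported in the abstract.
print("about 70% of 139:", round(0.70 * 139))
print("about 90% of 400:", round(0.90 * 400))
```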
