Abstract:
The unbiasedness of a sample in both feature space and geographical space is commonly used to evaluate how well the samples represent the study region. Although auxiliary variables are often used for mapping soil variables, the problems associated with categorical auxiliary variables are rarely addressed. This paper presents an unbiased soil sampling method that compromises between spreading in geographical space and spreading in feature space, for optimizing the sampling pattern based on multidimensional categorical auxiliary variables. In this method, the feature space is constructed from the categorical auxiliary variables, and an objective function that blends uniform coverage of the feature space and of the geographical space is minimized; the optimal pattern is obtained by simulated annealing. In addition, a Feature Deviation Index (FDI) is defined to measure the unbiasedness of samples in feature space. The method was tested in a case study of soil heavy metals in farmland in Shunyi District, Beijing. Sample data collected in the study area in 2007, 2008 and 2009 served as the basis data for the experiment. The original sample size was 1139, from which the optimal sample size calculated without considering spatial correlation was 450. With 100 as the lower limit and 450 as the upper limit, eight sample sizes were set: 100, 150, 200, 250, 300, 350, 400 and 450. After scale transformation, the corresponding sampling scales were 0.225, 0.239, 0.255, 0.275, 0.302, 0.337, 0.389 and 0.477. Two designs were compared with unbiased sampling: uniform sampling in feature space (feature uniform sampling) and uniform sampling in geographical space (geographical uniform sampling). Land type, soil texture and parent material were used as categorical auxiliary variables to represent the feature space of the soil variables.
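The abstract does not give the exact form of the objective function or of the FDI, so the following Python sketch is only one plausible reading, under explicit assumptions. Each candidate point is assumed to carry coordinates plus a tuple of the three categorical variables, and each distinct tuple is treated as one feature-space stratum. The feature term is an FDI-like total variation distance between sample and population stratum proportions (an equal-share target per stratum would be an equally plausible reading of "uniform"). The geographical term is the mean shortest distance from every candidate point to its nearest sample, scaled by the bounding-box diagonal. The two terms are blended with a weight `w`, and simulated annealing swaps one sample for one non-sample per step. All function names, the weight, and the cooling schedule are hypothetical.

```python
import math
import random
from collections import Counter

# Each candidate point is assumed to be (x, y, strata), where strata is a tuple
# such as (land_type, soil_texture, parent_material); each distinct tuple is
# treated as one cell of the categorical feature space.


def feature_deviation(sample, population):
    """FDI-like score (assumed form): total variation distance between the
    stratum proportions of the sample and of the population, in [0, 1]."""
    pop = Counter(p[2] for p in population)
    smp = Counter(p[2] for p in sample)
    n_pop, n_smp = len(population), len(sample)
    return 0.5 * sum(abs(smp[s] / n_smp - pop[s] / n_pop) for s in pop)


def mean_shortest_distance(sample, population):
    """Geographical spreading (assumed form): mean distance from every candidate
    point to its nearest sample point; smaller means better spatial coverage."""
    return sum(
        min(math.hypot(p[0] - s[0], p[1] - s[1]) for s in sample)
        for p in population
    ) / len(population)


def objective(sample, population, w, diag):
    """Weighted blend of both criteria; dividing the distance by the bounding-box
    diagonal keeps both terms roughly on a [0, 1] scale."""
    geo = mean_shortest_distance(sample, population) / diag
    return w * geo + (1 - w) * feature_deviation(sample, population)


def anneal(population, n, w=0.5, t0=0.05, cooling=0.995, iters=2000, seed=0):
    """Simulated annealing over n-point subsets; returns (indices, objective)."""
    rng = random.Random(seed)
    xs = [p[0] for p in population]
    ys = [p[1] for p in population]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    idx = list(range(len(population)))
    rng.shuffle(idx)
    chosen, unused = idx[:n], idx[n:]  # random initial design
    current = objective([population[k] for k in chosen], population, w, diag)
    best, best_val = list(chosen), current
    t = t0
    for _ in range(iters):
        i, j = rng.randrange(n), rng.randrange(len(unused))
        chosen[i], unused[j] = unused[j], chosen[i]  # swap a sample for a non-sample
        value = objective([population[k] for k in chosen], population, w, diag)
        delta = value - current
        # Always accept improvements; accept worse designs with probability
        # exp(-delta / t), which shrinks as the temperature t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = value
            if value < best_val:
                best, best_val = list(chosen), value
        else:
            chosen[i], unused[j] = unused[j], chosen[i]  # reject: undo the swap
        t *= cooling
    return best, best_val


# Demo on synthetic data (hypothetical categories, not the Shunyi data set).
demo_rng = random.Random(42)
population = [
    (demo_rng.uniform(0, 10), demo_rng.uniform(0, 10),
     (demo_rng.choice(["paddy", "dry land", "orchard"]),
      demo_rng.choice(["loam", "sandy loam"]),
      demo_rng.choice(["alluvium", "loess"])))
    for _ in range(120)
]
best, value = anneal(population, n=12, iters=800)
print(len(best), round(value, 4))
```

In this sketch `w` sets the trade-off: `w=1` keeps only the geographical term and `w=0` keeps only the feature term, with intermediate values compromising between the two.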
The three designs were compared in terms of global estimation, uniformity in feature space, and mapping accuracy. For global estimation, judged by the relative errors of the mean, range, coefficient of skewness and kurtosis, geographical uniform sampling gave the lowest precision; feature uniform sampling performed best when the sampling scale was larger than 0.275, whereas unbiased sampling performed better at smaller sampling scales. Geographical uniform sampling also produced the poorest representation in feature space. At larger sampling scales, feature uniform sampling gave the most representative samples in feature space; when the sampling scale is smaller than 0.302, unbiased sampling is recommended as a prudent strategy. Finally, for geostatistical mapping, the root mean square error (RMSE) between the basis data and the Kriging predictions was used, and unbiased sampling proved competitive, reproducing the study area with high accuracy. In summary, among sampling optimization methods based on auxiliary variables, feature uniform sampling is suitable for large sampling scales (more sample points) and is applicable only to global estimation, whereas unbiased sampling is a good compromise between spreading in geographical space and in feature space and achieves better results at small sampling scales (fewer sample points).
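The evaluation statistics are named above but their formulas are not given, so this Python sketch assumes conventional definitions: the relative error of each global statistic is |sample - basis| / |basis|, skewness is the moment coefficient m3 / m2^1.5, kurtosis is m4 / m2^2 (not excess kurtosis), and RMSE is computed over the basis locations. The Kriging step itself is not shown; the predicted values are assumed to come from whatever Kriging tool was used. The function names and toy values are hypothetical.

```python
import math
from statistics import fmean


def moments(values):
    """Mean, range, moment skewness and moment kurtosis (m4 / m2**2, not excess)."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return {"mean": mu, "range": max(values) - min(values),
            "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2}


def relative_errors(sample_values, basis_values):
    """Relative error |sample - basis| / |basis| for each global statistic.
    Note: unstable when a basis statistic (e.g. skewness) is close to zero."""
    s, b = moments(sample_values), moments(basis_values)
    return {k: abs(s[k] - b[k]) / abs(b[k]) for k in b}


def rmse(observed, predicted):
    """RMSE between basis (observed) values and predictions at the same locations."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


basis = [1.0, 2.0, 3.0, 10.0]  # toy basis values, not real data
sample = [1.0, 3.0, 10.0]
print(relative_errors(sample, basis))
print(rmse(basis, [1.5, 2.0, 2.5, 9.0]))
```

Using non-excess kurtosis keeps the denominator of its relative error away from zero; skewness has no such safeguard, so near-symmetric basis data can inflate its relative error.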