Predicting the spatial distribution of soil organic carbon using environmental similarity with limited samples
Abstract
Soil organic carbon (SOC) is one of the most important indicators of soil quality and agricultural sustainability. Commonly used models generally require a large number of soil samples as a training dataset to predict the spatial distribution of SOC. However, soil sampling is time-consuming, laborious, and costly. When only a limited number of soil samples is available, existing models such as multiple linear regression (MLR) and artificial neural networks (ANN) cannot reliably capture the relationship between environmental variables and SOC, which leads to unsatisfactory predictions. In this study, an environmental similarity model (ESM) was proposed, based on the assumption that the more similar the soil-forming environments of two sites are, the more similar their soil properties are. The ESM was designed in three major steps: (1) characterizing the soil-forming environment at the sampled and unvisited sites using the key environmental variables that affect the spatial distribution of SOC; (2) assessing the environmental similarity between the sampled and the unvisited sites; and (3) estimating the SOC content at the unvisited sites from this environmental similarity.
A case study was carried out to verify the feasibility of the proposed model. Taking Yunnan Province as the study area, a total of 64 soil samples were obtained from the World Soil Information Service (WoSIS) database. Three scenarios were then set up from these samples: (1) in the first scenario, 10 samples were randomly selected as the training set and the remaining 54 were used as the test set; (2) in the second scenario, 20 samples were randomly selected as the training set and the remaining 44 were used as the test set; (3) in the third scenario, 30 samples were randomly selected as the training set and the remaining 34 were used as the test set. In each scenario, the random selection was repeated 20 times, yielding 20 groups of training and test samples. Two indices, the mean absolute error (MAE) and the root mean square error (RMSE), were used to measure the prediction accuracy of the three models (ESM, MLR, and ANN), and analysis of variance (ANOVA) was applied to compare their prediction accuracy.
The results showed that the MAE of the ESM was 12.7, 11.7, and 11.1 g/kg for the first, second, and third scenarios, respectively, all significantly lower (P < 0.05) than those of MLR (72.6 g/kg with training sample size n = 10; 23.0 g/kg, n = 20; 16.7 g/kg, n = 30) and ANN (15.8 g/kg, n = 10; 14.9 g/kg, n = 20; 15.8 g/kg, n = 30). Therefore, the ESM achieved high prediction accuracy and strong robustness. These findings provide a new approach for predicting the spatial distribution of SOC when only limited samples are available.
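The abstract names the three ESM steps but does not give the similarity function or the estimation rule. The following is a minimal sketch of one plausible reading, under stated assumptions: each site is a row of environmental covariates (step 1), similarity per variable is Gaussian and the per-variable values are combined with the minimum (step 2), and SOC at an unvisited site is the similarity-weighted mean of the sampled SOC values (step 3). The function names `environmental_similarity` and `esm_predict` and the scale parameter `sd` are hypothetical, not the authors' implementation.

```python
import numpy as np


def environmental_similarity(env_samples, env_target, sd):
    """Similarity of one target site to every sampled site (step 2).

    Assumed rule (not stated in the abstract): a Gaussian similarity per
    environmental variable, combined with the minimum across variables.
    """
    diff = env_samples - env_target                   # (n_samples, n_vars)
    per_var = np.exp(-(diff ** 2) / (2.0 * sd ** 2))  # 1 = identical, near 0 = dissimilar
    return per_var.min(axis=1)                        # (n_samples,)


def esm_predict(env_samples, soc_samples, env_targets, sd=None):
    """Similarity-weighted SOC estimate at unvisited sites (step 3, assumed)."""
    # Step 1: each site is described by a row of environmental covariates.
    env_samples = np.asarray(env_samples, dtype=float)
    soc_samples = np.asarray(soc_samples, dtype=float)
    env_targets = np.atleast_2d(np.asarray(env_targets, dtype=float))
    if sd is None:
        # Scale each variable by its spread among the sampled sites.
        sd = env_samples.std(axis=0, ddof=1)
    sd = np.where(np.asarray(sd, dtype=float) > 0, sd, 1.0)  # avoid division by zero

    preds = np.empty(len(env_targets))
    for i, target in enumerate(env_targets):
        sim = environmental_similarity(env_samples, target, sd)
        total = sim.sum()
        # Weighted mean of sampled SOC; fall back to the plain mean if no sample is similar.
        preds[i] = sim @ soc_samples / total if total > 0 else soc_samples.mean()
    return preds
```

Calling `esm_predict(train_env, train_soc, test_env)` on one training/test split returns one SOC estimate per test site. Because the weights are non-negative, every estimate is a convex combination of the sampled SOC values and stays within their range, which is one reason a similarity-based estimator can behave more stably than a fitted regression when only 10 training samples are available.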
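The three scenarios amount to repeated random sub-sampling of the 64 WoSIS samples. The sketch below generates 20 independent training/test splits for each training size (10, 20, 30). The function name `make_scenarios` and the fixed `seed` are illustrative assumptions; the abstract does not say how the random draws were seeded or whether repeated groups were required to differ.

```python
import numpy as np


def make_scenarios(n_total=64, train_sizes=(10, 20, 30), n_repeats=20, seed=0):
    """Repeated random training/test splits, one list of groups per training size."""
    rng = np.random.default_rng(seed)  # fixed seed so the groups are reproducible
    scenarios = {}
    for n_train in train_sizes:
        groups = []
        for _ in range(n_repeats):
            order = rng.permutation(n_total)      # random ordering of sample indices
            train_idx = np.sort(order[:n_train])  # first n_train -> training set
            test_idx = np.sort(order[n_train:])   # remaining samples -> test set
            groups.append((train_idx, test_idx))
        scenarios[n_train] = groups
    return scenarios
```

Each of the 60 resulting groups would then be used to fit every model on `train_idx` and evaluate it on `test_idx`, so that ESM, MLR, and ANN are compared on exactly the same splits.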
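MAE and RMSE follow their standard definitions, $\mathrm{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$ and $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$. The abstract does not state the ANOVA design; the sketch below assumes a one-way ANOVA in which each model contributes one group of observations (for example, its 20 per-group MAE values in a scenario). The helper names are hypothetical, and only the F statistic and its degrees of freedom are computed here.

```python
import numpy as np


def mae(obs, pred):
    """Mean absolute error between observed and predicted SOC."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(obs - pred)))


def rmse(obs, pred):
    """Root mean square error between observed and predicted SOC."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA (assumed design: one group per model)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    k, n = len(groups), all_vals.size
    # Between-group and within-group sums of squares.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return float(f_stat), df_between, df_within
```

In practice, `one_way_anova_f(esm_maes, mlr_maes, ann_maes)` would be called once per scenario, and the P value obtained from the F distribution, for example with `scipy.stats.f.sf(f_stat, df_between, df_within)`, or directly with `scipy.stats.f_oneway`.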