Abstract: Soil total nitrogen (STN) plays an important role in soil fertility and the N cycle. Detailed information about the spatial distribution of STN is vital for effective management of soil fertility and for a better understanding of N cycling. To date, however, few studies have digitally mapped the spatial variation of STN in rubber (Hevea brasiliensis) plantations at the regional scale on Hainan Island, China. In this study, a relatively new method, random forest (RF), was proposed to predict and map the spatial pattern of STN for the rubber plantation. A total of 2511 topsoil (0-20 cm) samples were collected, and their STN contents were measured. The samples were then randomly divided into a calibration dataset (1757 samples) and a validation dataset (754 samples). Fourteen environmental variables were also collected: parent materials, mean precipitation, mean temperature, mean normalized difference vegetation index, elevation, slope, aspect, horizontal curvature, profile curvature, relief, convergence index, relative position index, stream power index, and topographic wetness index. Stepwise linear regression (SLR), generalized additive mixed model (GAMM), classification and regression tree (CART), and RF were used to predict and map the spatial distribution of STN. In addition, GAMM and CART were employed to uncover relationships between STN and the environmental variables and to identify the main factors influencing STN variation. The RF model was developed to predict the spatial variability of STN on the basis of parent materials, mean precipitation, mean temperature, and mean normalized difference vegetation index. The performance of RF was compared with that of SLR, GAMM, and CART, using mean error (ME), mean absolute error (MAE), root mean squared error (RMSE), and the correlation coefficient between measured and predicted STN as comparison criteria.
Results showed that RF performed much better than SLR, GAMM, and CART in predicting and mapping the spatial distribution of STN for the rubber plantation at the regional scale. The RF model had a much higher correlation coefficient and lower prediction errors (ME, MAE, and RMSE) than SLR, GAMM, and CART. The correlation coefficient, ME, MAE, and RMSE were 0.82, -0.003 g/kg, 0.088 g/kg, and 0.131 g/kg for RF; 0.69, 0.003 g/kg, 0.121 g/kg, and 0.162 g/kg for CART; 0.70, -0.004 g/kg, 0.120 g/kg, and 0.160 g/kg for GAMM; and 0.68, -0.008 g/kg, 0.121 g/kg, and 0.163 g/kg for SLR. Moreover, the RF model yielded a more realistic spatial distribution of STN than the SLR, GAMM, and CART models. Finally, the CART and GAMM results showed that the relationships between STN and the selected environmental variables (parent materials, mean precipitation, mean temperature, and mean normalized difference vegetation index) were hierarchical and non-linear in the study area. Analysis of variable importance indicated that parent materials and mean precipitation were the most important factors influencing the spatial distribution of STN for the rubber plantation at the regional scale. Overall, the good performance of the RF model can be ascribed to its ability to handle non-linear and hierarchical relationships between STN and environmental variables. These results suggest that RF is a promising approach for predicting the spatial distribution of STN in rubber plantations at the regional scale, and that it can be applied to predict other soil properties in regions with complex soil-environmental relationships.
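
The four comparison criteria named above (ME, MAE, RMSE, and the correlation coefficient) can be illustrated with a minimal sketch. It is not the authors' code. The function name `validation_metrics` is hypothetical, the error sign convention (predicted minus measured) is an assumption, and the correlation coefficient is assumed to be Pearson's r, which the abstract does not specify.

```python
import math

def validation_metrics(measured, predicted):
    """Return ME, MAE, RMSE and Pearson r for paired values (e.g. STN in g/kg)."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    # Assumption: error is defined as predicted minus measured.
    errors = [p - m for m, p in zip(measured, predicted)]
    me = sum(errors) / n                              # mean error (bias)
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean squared error

    # Pearson correlation between measured and predicted values.
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / math.sqrt(var_m * var_p)

    return {"ME": me, "MAE": mae, "RMSE": rmse, "r": r}
```

Applied to the validation dataset, such a function would be called once per model (RF, CART, GAMM, SLR) with the same measured values, so the four sets of criteria are directly comparable.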