Abstract: Rapid and accurate detection of heavy metal content in farmland soils is crucial for land quality assessment and food security. In this study, 203 soil samples were collected from farmland polluted by the Xiaolong Tungsten Mine, located in the Xiancha River watershed in Jiangxi Province, southern China. Soil cadmium (Cd) content was measured using inductively coupled plasma mass spectrometry (ICP-MS). High-resolution WorldView-3 multispectral imagery was used to extract the spectral reflectance and its transformations, including the first-order differential reflectance (FDR) and the reciprocal logarithm spectra. A correlation analysis was performed to select the sensitive bands suitable for predicting soil Cd content. Unlike previous studies that used only spectral information for modelling, this study also considered key environmental factors that influence the spatial distribution of Cd content, including a terrain factor (DEM), soil attribute factors (soil organic carbon and pH), and anthropogenic factors (distance to the mine and to residential areas). Partial Least Squares Regression (PLSR), Support Vector Machine (SVM), Back Propagation Neural Network (BPNN), and Random Forest (RF) were used to construct prediction models of soil Cd content, and the best inversion model was selected according to the accuracy metrics. The results showed that transforming the original WorldView-3 spectral data with the first-order differential improved the correlation between the spectral data and soil Cd content. However, prediction accuracy remained low when the inversion model used only the spectral characteristic parameters. By contrast, the environmental covariates alone yielded the best accuracy (R2=0.782, RMSE=0.384, MAE=0.294) using RF modelling.
Surprisingly, integrating the environmental covariates with the spectral information transformed by the reciprocal logarithm did not improve predictive performance as expected; instead, it slightly reduced the accuracy of the optimal RF model (R2=0.693, RMSE=0.448, MAE=0.336). According to the variable importance ranking, the relative importance of the five key environmental variables was higher than 74%, which was significantly higher than that of the multispectral bands. Moreover, in terms of spatial prediction, the model driven by the integration of environmental variables and spectral bands produced a spatial distribution trend of soil Cd content similar to that of the model driven by environmental variables alone. Both models showed that the Cd content of farmland soil in the study area was highly spatially heterogeneous and increased from the northwest to the southeast. In addition, soil Cd content increased with decreasing distance from the mining area and was enriched in densely populated areas. Despite these similarities, the spatial prediction map based on environmental variables alone exhibited a pronounced striping effect in the southeastern part of the study area. In contrast, the soil Cd map generated by integrating spectral information and environmental variables showed better spatial continuity. These findings indicate that the key environmental covariates are important variables for predicting the spatial distribution of heavy metals in farmland soil, whereas the capability of retrieving soil heavy metals from multispectral imagery alone is limited. In addition, random forest was an effective method for predicting the spatial distribution of heavy metals in soil.
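
For readers unfamiliar with the spectral transformations and accuracy metrics named above, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that the first-order differential is approximated by finite differences between adjacent band-centre wavelengths, that the reciprocal logarithm is log10(1/R), and that R2 is the coefficient of determination; none of these definitions is stated in the abstract, and the function names are hypothetical.

```python
import math


def first_order_differential(reflectance, wavelengths):
    """Finite-difference approximation of dR/d(lambda) between adjacent bands.

    Assumption: bands are ordered by wavelength and the derivative is taken
    between neighbouring band centres.
    """
    return [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(reflectance) - 1)
    ]


def reciprocal_logarithm(reflectance):
    """log10(1/R) for each band; assumes reflectance values in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]


def accuracy_metrics(observed, predicted):
    """Return R2 (coefficient of determination), RMSE and MAE."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in residuals) / n,
    }
```

Under these assumptions, `accuracy_metrics` can be applied to measured and predicted Cd values from any of the four model types to compare them on the same three metrics.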