Abstract:
Accurate and rapid measurement of heavy metal concentrations in soil profiles is a key step in assessing soil heavy metal pollution and planning subsequent remediation. This study investigates the potential of Visible and Near-infrared Reflectance Spectroscopy (VNIR) for predicting heavy metal concentrations in intact soil profiles. A total of 19 complete soil profile samples, each about (100±5) cm deep, were collected from cropland surrounding two typical mining areas in Jiangxi Province, China. The Cu concentration and the reflectance spectra between 350 and 2500 nm were measured on the intact soil profile samples. Partial Least Squares Regression (PLSR) and three machine learning algorithms (Cubist Regression Tree, Cubist; Gaussian Process Regression, GPR; Support Vector Machine, SVM) were compared for predicting Cu concentration in the intact soil profiles. In addition, the study systematically examined how 11 spectral preprocessing operations affect the accuracy of the Cu prediction models: seven single treatments (Raw Spectral Reflectance, Raw; Absorbance Conversion, Abs; First-order Derivative, FD; Multiplicative Scatter Correction, MSC; Standard Normal Variate Transformation, SNV; Gap-Segment Derivatives, GS; Savitzky-Golay Smoothing, SG) and four combinations (MSC+FD; Abs+FD; SNV+GS; SNV+SG). The coefficient of determination (R2), Root Mean Square Error (RMSE), and Ratio of Performance to Deviation (RPD) were used to evaluate model performance. The results showed that the machine learning models predicted Cu concentration in the intact soil profiles substantially more accurately than the PLSR model.
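The single and combined pretreatments listed above are standard spectral transformations. The following is a minimal NumPy sketch of four of them (Abs, FD, SNV, MSC) and two of the combinations (MSC+FD, Abs+FD). It assumes reflectance spectra stored as an (n_samples, n_bands) array at 1 nm spacing; the function names and the synthetic demo spectra are hypothetical and are not taken from the study. GS and SG are omitted because their window and gap settings are not stated here.

```python
import numpy as np

# Assumption: spectra is an (n_samples, n_bands) array of reflectance in (0, 1],
# sampled every 1 nm from 350 to 2500 nm. Function names are illustrative only.

def absorbance(spectra):
    """Abs: convert reflectance R to apparent absorbance log10(1/R)."""
    return np.log10(1.0 / spectra)

def first_derivative(spectra, step_nm=1.0):
    """FD: first-order derivative along the wavelength axis (central differences)."""
    return np.gradient(spectra, step_nm, axis=1)

def snv(spectra):
    """SNV: centre and scale each spectrum by its own mean and standard deviation."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """MSC: regress each spectrum on a reference (default: the mean spectrum)
    and remove the fitted offset a and slope b, giving (x - a) / b."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # fits x ~ a + b * ref
        corrected[i] = (x - a) / b
    return corrected

# Combined pretreatments are function compositions, applied left to right.
def msc_fd(spectra):
    """MSC+FD: scatter correction followed by the first derivative."""
    return first_derivative(msc(spectra))

def abs_fd(spectra):
    """Abs+FD: absorbance conversion followed by the first derivative."""
    return first_derivative(absorbance(spectra))

# Hypothetical synthetic spectra: one smooth base curve distorted by a
# per-sample multiplicative gain and additive offset (a crude scatter model).
rng = np.random.default_rng(0)
wavelengths = np.arange(350, 2501)                # 2151 bands, 1 nm apart
base = 0.3 + 0.1 * np.sin(wavelengths / 200.0)
gain = rng.uniform(0.8, 1.2, size=(5, 1))
offset = rng.uniform(-0.02, 0.02, size=(5, 1))
spectra = base * gain + offset

fd_spectra = first_derivative(spectra)
snv_spectra = snv(spectra)
msc_spectra = msc(spectra)
```

Because the synthetic spectra differ only by a gain and an offset, MSC maps every row back onto the mean spectrum, which is the effect scatter correction is meant to have; real soil spectra will not be corrected this cleanly.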
Among the machine learning models, SVM performed best, and the highest prediction accuracy was achieved by SVM combined with FD preprocessing. In cross-validation, FD-SVM yielded an R2, RMSE, and RPD of 0.80, 14.83 mg/kg, and 1.87, respectively, whereas in independent validation it yielded 0.95, 7.94 mg/kg, and 4.34, respectively. A comparison of the preprocessing methods showed that FD and SNV+GS were the two most robust: FD performed best with the GPR and SVM models, while SNV+GS performed best with the PLSR and Cubist models. The VNIR models for Cu concentration in the intact soil profile samples were reliable in comparison with prediction models established on soil samples after air drying, grinding, and sieving. Consequently, machine learning algorithms can be expected to provide the optimal predictive models for Cu concentration in intact soil profiles. The findings can also serve as a reference for accurate and rapid monitoring of heavy metal concentrations in polluted soils.
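The evaluation statistics reported above follow common definitions in spectroscopic modelling. The sketch below assumes R2 = 1 - SS_res/SS_tot, RMSE as the square root of the mean squared residual, and RPD as the sample standard deviation (ddof = 1) of the observed values divided by RMSE. Some studies instead use the squared correlation coefficient for R2 or a population standard deviation for RPD, and the exact choices made in this study are not stated here. The function names and toy values are hypothetical.

```python
import numpy as np

def r2(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_obs, y_pred):
    """Root mean square error, in the units of y (e.g. mg/kg)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def rpd(y_obs, y_pred):
    """Ratio of performance to deviation.
    Assumption: sample SD (ddof=1) of the observed values divided by RMSE."""
    return float(np.std(np.asarray(y_obs, float), ddof=1) / rmse(y_obs, y_pred))

# Toy example with hypothetical Cu values (mg/kg), not data from the study.
y_obs = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print(f"R2={r2(y_obs, y_pred):.3f}, RMSE={rmse(y_obs, y_pred):.2f} mg/kg, "
      f"RPD={rpd(y_obs, y_pred):.2f}")
# prints: R2=0.948, RMSE=2.55 mg/kg, RPD=5.06
```

Under the RPD definition assumed here, the independent-validation values for FD-SVM (RMSE = 7.94 mg/kg, RPD = 4.34) imply a standard deviation of roughly 7.94 × 4.34 ≈ 34.5 mg/kg for the observed Cu concentrations in the validation set.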