Abstract:
Soil hyperspectral reflectance can be used to effectively estimate the status of heavy metal pollution in soil. This study aims to better characterize the soil heavy metal information reflected in spectra. The advantages and applicability of different spectral preprocessing, feature selection, and modeling methods were compared quantitatively, and optimal inversion models were constructed for the rapid hyperspectral estimation of soil quality in lead-zinc mining areas with multi-heavy metal pollution. A total of 100 soil samples were collected at a depth of 0-30 cm from the lead-zinc mining area in Laiyuan County, Baoding City, Hebei Province, China, and analyzed for Cu, Cr, Ni, Zn, Cd, and Pb contents. The heavy metal pollution status of the study area was assessed using the Nemerow pollution index and the potential ecological risk index. The raw hyperspectral reflectance of the soil samples was measured under laboratory conditions with an SVC HR-1024 spectrometer following a standard procedure. First, Savitzky-Golay (SG) smoothing was applied to the soil spectral data, and the SG-smoothed spectra were designated as the original spectra (OR). Second, three mathematical transformations were applied to OR: multiplicative scatter correction (MSC), standard normal variate (SNV), and reciprocal logarithm (LR). Each transformed spectrum was then subjected to first-order derivative (FD) and second-order derivative (SD) transformations. Third, feature bands were selected from the differently preprocessed spectra using Pearson correlation coefficient (PCC) analysis and the Boruta algorithm. Finally, heavy metal contents were modeled using partial least squares regression (PLSR) and several machine learning models: random forest (RF), extreme gradient boosting (XGBoost), support vector machine regression (SVMR), Gaussian process regression (GPR), and backpropagation neural network (BPNN).
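The preprocessing chain described above (SG smoothing to obtain OR, then MSC, SNV, or LR, then first- and second-order derivatives) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`sg_smooth`, `snv`, `msc`, `reciprocal_log`, `derivatives`) are hypothetical, the SG window length, polynomial order, and edge padding are illustrative choices, the 10 nm wavelength grid is synthetic, and derivatives are approximated with `np.gradient` rather than whatever differencing scheme the study used.

```python
import numpy as np


def sg_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (rows = samples).

    Window length and polynomial order are illustrative assumptions.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Local least-squares polynomial fit: row 0 of the pseudo-inverse gives
    # the weights that return the fitted value at the window centre.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(vander)[0]
    # Edge padding is an assumed boundary treatment.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.correlate(row, weights, mode="valid") for row in padded])


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected


def reciprocal_log(spectra):
    """Reciprocal logarithm transform log10(1/R); reflectance must be > 0."""
    return np.log10(1.0 / spectra)


def derivatives(spectra, wavelengths):
    """First- (FD) and second-order (SD) derivatives with respect to wavelength."""
    fd = np.gradient(spectra, wavelengths, axis=1)
    sd = np.gradient(fd, wavelengths, axis=1)
    return fd, sd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wavelengths = np.arange(400.0, 2400.0, 10.0)  # hypothetical 10 nm grid
    base = 0.2 + 0.1 * np.sin(wavelengths / 300.0)
    raw = base * rng.uniform(0.8, 1.2, size=(5, 1))
    raw = raw + rng.normal(0.0, 0.005, size=(5, wavelengths.size))
    original = sg_smooth(raw)  # the "OR" spectra
    for name, transform in [("MSC", msc), ("SNV", snv), ("LR", reciprocal_log)]:
        fd, sd = derivatives(transform(original), wavelengths)
        print(name, fd.shape, sd.shape)
```

Each transform returns an array with the same shape as its input, so OR, the three transformed spectra, and their FD and SD variants can all be fed to the same feature selection step.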
Model accuracy was evaluated by comparing predicted and measured values in terms of the coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to deviation (RPD), and the optimal models were selected on this basis. The results indicated that: 1) Soil Cr and Ni in the study area were uncontaminated, whereas Cu, Zn, Cd, and Pb showed varying degrees of pollution, with Zn and Cd the most severely affected. Relative to the local soil heavy metal background values, the mean Nemerow pollution index was 29.71, indicating heavy pollution, and the mean potential ecological risk index was 1330.3, indicating high ecological risk. Overall, the study area exhibited compound pollution by multiple heavy metals with high ecological risk. 2) Spectral derivatives yielded favorable results but were susceptible to noise, whereas multiplicative scatter correction, standard normal variate, and reciprocal logarithm transformations helped suppress spectral noise, thereby enhancing overall preprocessing performance. 3) In feature selection, PCC retained a larger number of bands, and the resulting heavy metal inversion models reached a maximum R² of 0.81 and a minimum R² of 0.44. By contrast, the Boruta algorithm extracted fewer feature bands, with model R² ranging from 0.51 to 0.76. The comparison showed that the Boruta algorithm yielded more stable model predictions while minimizing the size of the optimal feature band combination. 4) BPNN and XGBoost better represented the nonlinear relationship between heavy metal content and spectra. BPNN achieved the optimal inversion for Cr (R² = 0.81, RPD = 2.37), Ni (R² = 0.76, RPD = 2.08), and Zn (R² = 0.69, RPD = 1.85); XGBoost for Pb (R² = 0.76, RPD = 2.08) and Cd (R² = 0.68, RPD = 1.81); and SVMR for Cu (R² = 0.58, RPD = 1.58). Spectral preprocessing, feature selection, and modeling methods all had a significant impact on the inversion of soil heavy metal contents, and optimizing them can effectively improve inversion accuracy. This research can provide a reference for efficient, accurate, and large-scale remote sensing monitoring of soil heavy metal pollution, particularly for soil remediation and treatment in lead-zinc mining areas.
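The accuracy metrics reported above (R², RMSE, and RPD) can be computed from measured and predicted validation values as in the sketch below. It is a hedged illustration, not the study's code: R² is taken here as 1 - SS_res/SS_tot and RPD as the sample standard deviation of the measured values divided by RMSE. These are common conventions in soil spectroscopy, but they are assumptions about how the study defined the metrics, and the `evaluate` helper and its example numbers are hypothetical.

```python
import numpy as np


def evaluate(measured, predicted):
    """Return (R2, RMSE, RPD) for a set of validation predictions."""
    y = np.asarray(measured, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    residuals = y - y_hat
    ss_res = np.sum(residuals ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    # RPD: spread of the measured values relative to the prediction error.
    # Using the sample standard deviation (ddof=1) is an assumed convention.
    rpd = np.std(y, ddof=1) / rmse
    return r2, rmse, rpd


r2, rmse, rpd = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(f"R2={r2:.2f}, RMSE={rmse:.3f}, RPD={rpd:.2f}")
```

Because RPD divides the spread of the measured contents by the prediction error, a higher RPD means the model resolves a larger share of the natural variation in heavy metal content.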