Comparison of hyperspectral inversion methods for soil heavy metal contents in lead-zinc tailings reservoir areas of Hebei Province

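    • Summary: To make better use of the soil heavy metal information reflected in spectra, and to enable rapid hyperspectral estimation of soil heavy metal contents in lead-zinc mining areas affected by multi-metal compound pollution, this study took a lead-zinc mining area in Hebei Province as an example. First, the pollution status of Cu, Cr, Ni, Zn, Cd, and Pb in the study-area soils was assessed. Then, using laboratory hyperspectral data, spectral transformations, feature variables, and inversion algorithms were combined into different inversion strategies. By comparing the heavy metal inversion accuracy achieved under each strategy, the strengths, weaknesses, and applicability of different spectral preprocessing, feature selection, and modeling algorithms were analyzed quantitatively, and optimal inversion models were constructed. The results show that: 1) Cr and Ni in the study-area soils were relatively clean, whereas Cu, Zn, Cd, and Pb showed varying degrees of pollution. Referenced to local soil background values, the regional mean Nemerow integrated pollution index was 29.71, indicating heavy pollution, and the mean potential ecological risk index was 1330.32, indicating a high ecological risk. 2) Spectral preprocessing can enhance the expression of soil heavy metal information. Spectral differentiation performed well but was susceptible to noise, whereas multiplicative scatter correction, standard normal variate, and reciprocal logarithm transformations can denoise the spectra and improve the processing effect. 3) Among the feature selection methods, the correlation coefficient method selected many feature bands, and the inversion R² varied widely across heavy metals; the Boruta method selected few feature bands, and the inversion R² varied little across heavy metals. 4) BPNN and XGBoost described the nonlinear relationship between heavy metal content and spectra well and outperformed the other algorithms: BPNN achieved the optimal inversion for Cr, Ni, and Zn, XGBoost for Pb and Cd, and SVMR for Cu. Spectral preprocessing, feature selection, and modeling algorithms all strongly affect the inversion of soil heavy metal contents, and choosing appropriate processing and modeling algorithms can effectively improve inversion accuracy. This study provides a reference for efficient, accurate, large-scale remote sensing monitoring of soil heavy metal pollution in lead-zinc mining areas.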
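The two pollution indices reported above can be sketched as follows, assuming their usual definitions: the single-factor index P_i = C_i / B_i against the local background value B_i, the Nemerow integrated index P_N = sqrt((mean(P)^2 + max(P)^2) / 2), and Hakanson's potential ecological risk E_r^i = T_r^i * C_i / B_i, with RI as the sum over metals. The toxic response coefficients and grading thresholds are commonly cited literature values used here as assumptions (the abstract does not list the ones the study used), and all function names and sample values are hypothetical.

```python
import math

# ASSUMPTION: commonly cited Hakanson-style toxic response coefficients;
# the abstract does not state which coefficients the study used.
TOXIC_RESPONSE = {"Cu": 5, "Cr": 2, "Ni": 5, "Zn": 1, "Cd": 30, "Pb": 5}


def single_factor_indices(conc, background):
    """P_i = C_i / B_i: measured content over the local background value."""
    return {m: conc[m] / background[m] for m in conc}


def nemerow_index(conc, background):
    """Nemerow integrated index for one sample: sqrt((mean(P)^2 + max(P)^2) / 2)."""
    p = list(single_factor_indices(conc, background).values())
    p_mean = sum(p) / len(p)
    p_max = max(p)
    return math.sqrt((p_mean ** 2 + p_max ** 2) / 2)


def nemerow_grade(pn):
    """Grade P_N with commonly used thresholds (an assumption, not from the paper)."""
    if pn <= 0.7:
        return "clean"
    if pn <= 1.0:
        return "warning"
    if pn <= 2.0:
        return "light"
    if pn <= 3.0:
        return "moderate"
    return "heavy"


def potential_ecological_risk(conc, background, toxic_response=TOXIC_RESPONSE):
    """Per-metal E_r = T_r * C / B and the sample's RI = sum of E_r."""
    er = {m: toxic_response[m] * conc[m] / background[m] for m in conc}
    return er, sum(er.values())


# Hypothetical single-sample example (illustrative values, not study data).
sample = {"Cu": 60.0, "Zn": 400.0}
background = {"Cu": 20.0, "Zn": 80.0}
pn = nemerow_index(sample, background)
er, ri = potential_ecological_risk(sample, background)
print(round(pn, 3), nemerow_grade(pn), er, ri)  # → 4.528 heavy {'Cu': 15.0, 'Zn': 5.0} 20.0
```

Computing P_N and RI for every sample and averaging over all samples would give regional means of the same form as the reported 29.71 and 1330.32; whether the study averaged per-sample values in exactly this way is an assumption.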

       

      Abstract: Soil hyperspectral reflectance can be used to estimate the heavy metal pollution status of soil effectively. This study aims to make better use of the soil heavy metal information reflected in spectra by quantitatively comparing the advantages and adaptability of different spectral preprocessing, feature selection, and modeling methods, and to construct optimal inversion models for the rapid hyperspectral estimation of soil heavy metal contents in lead-zinc mining areas with multi-metal compound pollution. A total of 100 soil samples were collected at a depth of 0-30 cm from a lead-zinc mining area in Laiyuan County, Baoding City, Hebei Province, China, and their Cu, Cr, Ni, Zn, Cd, and Pb contents were analyzed. The pollution status of heavy metals in the study area was assessed using the Nemerow pollution index and the potential ecological risk index. The raw hyperspectral reflectance of the soil samples was measured under laboratory conditions with an SVC HR-1024 spectrometer following a standard procedure. First, Savitzky-Golay (SG) smoothing was applied to the soil spectra, and the smoothed spectra were designated as the original spectra (OR). Second, three mathematical transformations were applied to OR: multiplicative scatter correction (MSC), standard normal variate (SNV), and reciprocal logarithm (LR). Each transformed spectrum was then further processed with the first-order derivative (FD) and the second-order derivative (SD). Third, feature bands were selected from the differently preprocessed spectra using the Pearson correlation coefficient (PCC) method and the Boruta algorithm. Finally, heavy metal contents were modeled using partial least squares regression (PLSR) and several machine learning models: random forest (RF), extreme gradient boosting (XGBoost), support vector machine regression (SVMR), Gaussian process regression (GPR), and backpropagation neural network (BPNN).
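The preprocessing chain described above (SG smoothing to obtain OR, then MSC, SNV, or LR, then FD or SD) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the study's SG window and polynomial order are not given, so a 5-point quadratic window is assumed; LR is taken as log10(1/R); derivatives use central differences with a hypothetical uniform 1 nm band spacing (`step_nm`); and MSC uses the mean spectrum of the sample set as its reference. Function names and the reflectance values are hypothetical.

```python
import math


def savgol5(spectrum):
    """5-point quadratic Savitzky-Golay smoothing; the two points at each end are kept as-is."""
    coeffs = (-3, 12, 17, 12, -3)  # standard 5-point quadratic SG weights, divisor 35
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(c * spectrum[i + k - 2] for k, c in enumerate(coeffs)) / 35.0
    return out


def snv(spectrum):
    """Standard normal variate: centre one spectrum and divide by its own sample standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the mean spectrum (x = a + b*ref), return (x - a) / b."""
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        a = s_mean - b * ref_mean
        corrected.append([(x - a) / b for x in s])
    return corrected


def reciprocal_log(spectrum):
    """Reciprocal logarithm log10(1/R); reflectance values must be positive."""
    return [math.log10(1.0 / r) for r in spectrum]


def derivative(spectrum, step_nm=1.0):
    """Central-difference derivative; output is two points shorter than the input."""
    return [(spectrum[i + 1] - spectrum[i - 1]) / (2 * step_nm)
            for i in range(1, len(spectrum) - 1)]


# Hypothetical reflectance values for one sample.
raw = [0.21, 0.22, 0.24, 0.25, 0.27, 0.28, 0.30]
or_spec = savgol5(raw)                      # OR
lr_fd = derivative(reciprocal_log(or_spec))  # LR followed by FD
lr_sd = derivative(lr_fd)                   # LR followed by SD (derivative applied twice)
```

Applying `derivative` twice is only one way to obtain the SD spectra; an SG derivative filter is an equally common choice, and which one the study used is not stated.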
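Correlation-based feature-band selection, as applied to each preprocessed spectrum above, can be sketched as keeping the bands whose absolute Pearson correlation with the measured metal content reaches a cut-off. The `threshold` value and the function names are hypothetical; the abstract does not state the selection criterion (a fixed |r| cut-off or a significance test are both common). The Boruta algorithm, a random-forest wrapper that compares each band's importance against shuffled "shadow" copies, is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_bands_pcc(spectra, target, threshold=0.3):
    """Return indices of bands whose |r| with the target content is >= threshold.

    spectra: one list of (preprocessed) band values per sample.
    target: measured content of one metal per sample.
    threshold: HYPOTHETICAL |r| cut-off, not taken from the study.
    """
    n_bands = len(spectra[0])
    selected = []
    for j in range(n_bands):
        band = [s[j] for s in spectra]
        if abs(pearson_r(band, target)) >= threshold:
            selected.append(j)
    return selected
```

Because the selection is driven by each metal's own correlation pattern, it would be run separately for every metal and every preprocessed spectrum, which is consistent with the different band counts and R² spreads reported for each metal.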
Model accuracy was evaluated by comparing the coefficient of determination (R²), root mean square error (RMSE), and relative percent deviation (RPD) between predicted and measured values in the validation set, and the optimal model for each metal was then selected. The results indicated that: 1) Soil Cr and Ni in the study area were uncontaminated, whereas Cu, Zn, Cd, and Pb showed varying degrees of pollution, with Zn and Cd the most severely affected. Referenced to local soil heavy metal background values, the mean Nemerow pollution index was 29.71, indicating heavy pollution, and the mean potential ecological risk index was 1330.3, indicating a high ecological risk. Overall, the area showed compound multi-metal pollution with high ecological risk. 2) Spectral differentiation yielded favorable results but was susceptible to noise, whereas multiplicative scatter correction, standard normal variate, and reciprocal logarithm transformations could denoise the spectra and thereby improve overall processing performance. 3) In feature selection, the correlation coefficient method extracted a larger number of bands, and the corresponding heavy metal inversion models reached a maximum R² of 0.81 and a minimum R² of 0.44. By contrast, the Boruta algorithm extracted fewer feature bands, with a maximum R² of 0.76 and a minimum R² of 0.51. The comparison showed that the Boruta algorithm gave more stable prediction performance across metals while minimizing the number of feature bands in the optimal combination. 4) BPNN and XGBoost better represented the nonlinear relationship between heavy metal content and spectra. BPNN achieved the optimal inversion for Cr (R²=0.81, RPD=2.37), Ni (R²=0.76, RPD=2.08), and Zn (R²=0.69, RPD=1.85), and XGBoost achieved the optimal inversion for Pb (R²=0.76, RPD=2.08) and Cd (R²=0.68, RPD=1.81), while SVMR achieved the optimal inversion for Cu (R²=0.58, RPD=1.58). Spectral preprocessing, feature selection, and modeling algorithms all had a significant impact on the inversion of soil heavy metal contents, and choosing appropriate preprocessing and modeling methods can effectively improve inversion accuracy. This research provides a reference for efficient, accurate, and large-scale remote sensing monitoring of soil heavy metal pollution, particularly for soil remediation and treatment in lead-zinc mining areas.
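The three accuracy metrics used to rank the models can be sketched as follows. This assumes R² = 1 - SS_res/SS_tot and RPD = (sample standard deviation of the measured values) / RMSE, the usual definition in soil spectroscopy; the abstract does not give the study's exact formulas (some studies instead report R² as the squared Pearson correlation), and the example values are hypothetical.

```python
import math


def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def rpd(measured, predicted):
    """RPD: sample standard deviation of the measured values divided by RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    return sd / rmse(measured, predicted)


# Hypothetical validation values for one metal.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(r2_score(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```

With these definitions, evaluated on the same n validation samples, RPD = sqrt(n/(n-1)) / sqrt(1 - R²), so R² and RPD largely carry the same information, and a higher R² always implies a higher RPD on a given set.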

       
