Prediction of Cu concentrations in intact soil profiles based on VNIR and machine learning algorithms

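    Summary: Rapid measurement of heavy metal concentrations in soil profiles is key to assessing soil heavy metal pollution and selecting appropriate remediation techniques. To explore the potential of Visible and Near-Infrared Reflectance Spectroscopy (VNIR) for predicting heavy metal concentrations in intact soil profiles, this study examined farmland soils surrounding two typical industrial and mining sites in Jiangxi Province. A total of 19 intact soil profiles, each about 100 cm deep, were collected, and the VNIR spectra and Cu concentrations of the profile samples were measured. Partial Least Squares Regression (PLSR), the Cubist Regression Tree (Cubist, a decision tree with linear regression models at its leaves), Gaussian Process Regression (GPR), and Support Vector Machine Regression (SVM) were used to study how different spectral preprocessing methods affect the accuracy of soil Cu prediction. The three machine learning algorithms (Cubist, GPR, and SVM) generally predicted more accurately than PLSR, and the SVM model with First-Order Derivative (FD) preprocessing performed best (R² = 0.95, root mean square error (RMSE) = 7.94 mg/kg, ratio of performance to deviation (RPD) = 4.34). These results indicate that VNIR combined with machine learning can effectively predict Cu concentrations in intact soil profiles, and they provide a reference for research on the rapid monitoring of Cu and other heavy metals.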
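The three evaluation metrics reported above (R², RMSE, RPD) can be computed as in the following minimal sketch. It assumes the common definitions R² = 1 − SS_res/SS_tot, RMSE = sqrt(mean squared residual), and RPD = SD(observed)/RMSE with the sample standard deviation; the abstract does not state which exact formulas the authors used, and the function names are hypothetical. Under the RPD = SD/RMSE assumption, the reported independent-validation values (RMSE = 7.94 mg/kg, RPD = 4.34) would imply a standard deviation of about 4.34 × 7.94 ≈ 34.5 mg/kg for the measured Cu values in the validation set.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r2(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Ratio of performance to deviation, assumed here as sample SD / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Toy values for illustration only (not data from the study), in mg/kg.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(f"R2={r2(observed, predicted):.3f}, "
      f"RMSE={rmse(observed, predicted):.2f}, RPD={rpd(observed, predicted):.2f}")
# prints R2=0.948, RMSE=2.55, RPD=5.06
```

Some studies compute RPD with the population standard deviation or with the standard error of prediction instead of RMSE, so values from different papers are only comparable when the definitions match.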

       

      Abstract: Accurate and rapid measurement of heavy metal concentrations in soil profiles is a key step in assessing soil heavy metal pollution and selecting subsequent remediation. This study investigates the potential of Visible and Near-infrared Reflectance Spectroscopy (VNIR) for predicting heavy metal concentrations in intact soil profiles. A total of 19 complete soil profiles, each (100±5) cm deep, were collected from cropland surrounding two typical mining areas in Jiangxi Province, China. The Cu concentration and the reflectance spectrum between 350 and 2 500 nm were measured for the intact soil profile samples. Partial Least Squares Regression (PLSR) and three machine learning algorithms (Cubist Regression Tree, Cubist; Gaussian Process Regression, GPR; Support Vector Machine, SVM) were compared in terms of their performance in predicting Cu concentration in the intact soil profiles. In addition, the influence of 11 spectral preprocessing schemes on the accuracy of the Cu prediction models was systematically investigated: seven single treatments (Raw Spectral Reflectance, Raw; Absorbance Conversion, Abs; First-order Derivative, FD; Multiplicative Scatter Correction, MSC; Standard Normal Variate Transformation, SNV; Gap-Segment Derivative, GS; Savitzky-Golay Smoothing, SG) and four combinations (MSC+FD, Abs+FD, SNV+GS, SNV+SG). Model performance was evaluated with the coefficient of determination (R²), Root Mean Square Error (RMSE), and Ratio of Performance to Deviation (RPD). The machine learning models generally predicted Cu concentration in the intact soil profiles more accurately than the PLSR model. Among them, SVM performed better than the other two machine learning models, and the SVM model with FD preprocessing (FD-SVM) achieved the highest prediction accuracy: R², RMSE, and RPD were 0.80, 14.83 mg/kg, and 1.87 in cross-validation, and 0.95, 7.94 mg/kg, and 4.34 in independent validation. A comparison of the preprocessing schemes showed that FD and SNV+GS were the two most robust: FD gave the best performance in the GPR and SVM models, while SNV+GS gave the best performance in the PLSR and Cubist models. Compared with prediction models built on soil samples that had been air-dried, ground, and sieved, the VNIR models for the intact soil profile samples were also reliable. Machine learning algorithms can therefore be expected to yield optimal models for predicting Cu concentration in intact soil profiles. These findings provide a reference for the accurate and rapid monitoring of heavy metal concentrations in polluted soils.
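The single preprocessing treatments listed above can be illustrated with a small standard-library sketch. It is not the authors' implementation: the function names are hypothetical, FD is shown as a plain finite difference between adjacent bands (the abstract does not say how FD was computed), SG uses the textbook 5-point quadratic Savitzky-Golay coefficients (−3, 12, 17, 12, −3)/35 (the window size used in the study is not given), MSC uses the mean spectrum of the set as its reference, and the gap-segment derivative (GS) is omitted.

```python
import math
import statistics


def absorbance(reflectance):
    """Abs: convert reflectance R (assumed in (0, 1]) to apparent absorbance log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


def first_derivative(spectrum, step=1.0):
    """FD: finite difference between adjacent bands divided by the band spacing."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


def snv(spectrum):
    """SNV: center one spectrum on its own mean and scale by its own standard deviation."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def msc(spectra):
    """MSC: regress each spectrum on the mean spectrum, then remove that offset and slope."""
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def savgol_5pt(spectrum):
    """SG: 5-point quadratic Savitzky-Golay smoothing; two bands are lost at each edge."""
    coeffs = (-3, 12, 17, 12, -3)
    return [sum(c * spectrum[i + k] for k, c in enumerate(coeffs)) / 35.0
            for i in range(len(spectrum) - 4)]


# Toy reflectance spectrum (illustrative values, not study data).
reflectance = [0.20, 0.22, 0.25, 0.27, 0.30, 0.31]
abs_fd = first_derivative(absorbance(reflectance))  # Abs+FD combination
snv_sg = savgol_5pt(snv(reflectance))               # SNV+SG combination
```

SNV and the derivative work on one spectrum at a time, while MSC needs the whole sample set to build its reference spectrum. For that reason, in a cross-validation setting the MSC reference would normally be estimated from the calibration samples only.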

       
