Prediction of the spatial distribution of soil organic matter based on a two-point machine learning method

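    • Abstract: Accurate prediction of the spatial distribution of soil organic matter (SOM) is important for precision agriculture, cultivated land quality improvement, ecological and environmental protection, and carbon sequestration and emission reduction. This study examined whether a two-point machine learning (TPML) method can improve the prediction of SOM spatial distribution. With Hailun City, Heilongjiang Province, as the study area, climate, topography and landform, socio-economic factors, and spatial location were used as auxiliary variables. TPML makes full use of spatial location information and attribute similarity to handle both the heterogeneity of the SOM spatial distribution and the heterogeneity of its relationship with the auxiliary variables, and thereby to improve prediction accuracy. Random forest (RF), RF-based regression kriging, inverse distance weighting, and ordinary kriging (OK) served as comparison methods. Mean absolute error (MAE), root mean square error (RMSE), the correlation coefficient between predicted and true values (r), and the coefficient of determination (R2) were used as evaluation metrics in multiple comparison experiments under different sample sizes. The results showed that: 1) SOM content in the study area ranged from 1.775 to 7.188 g/kg, with a mean of 3.179 g/kg; it was spatially uneven, higher in the east and lower in the west. 2) Under all sample sizes, TPML had the highest prediction accuracy among the models, with the smallest MAE (0.088-0.097 g/kg) and RMSE (0.116-0.139 g/kg) and the highest r (0.992-0.996) and R2 (0.971-0.985). 3) The standard deviation of the prediction error (theoretical error) showed a spatial pattern similar to that of the actual error, indicating that TPML provides reasonable uncertainty estimates for its predictions. In summary, TPML improves prediction accuracy by exploiting spatial autocorrelation and attribute similarity simultaneously, and it is suitable for predicting resource and environmental variables that exhibit some spatial autocorrelation and have auxiliary data available.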
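Of the comparison methods named above, inverse distance weighting is the simplest to write down, so a minimal sketch may help fix ideas about what a purely distance-based baseline does. It is not the study's implementation: the function name `idw_predict`, the `power` and `k` parameters, and the sample values are hypothetical and chosen only for illustration.

```python
import math


def idw_predict(samples, target, power=2.0, k=None):
    """Inverse distance weighting estimate at `target`.

    samples: list of (x, y, value) tuples (hypothetical sample points).
    target:  (x, y) location to predict.
    power:   distance exponent (assumed default of 2).
    k:       if given, use only the k nearest samples.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    tx, ty = target
    # Distance from every sample to the target, nearest first.
    nearest = sorted(
        ((math.hypot(x - tx, y - ty), value) for x, y, value in samples),
        key=lambda pair: pair[0],
    )
    if k is not None:
        nearest = nearest[:k]
    # A sample located exactly at the target returns its own value.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / (dist ** power) for dist, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Illustrative values only, not data from the study.
points = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
midpoint_estimate = idw_predict(points, (1.0, 0.0))
```

Because the weights depend on distance alone, this baseline ignores the auxiliary variables entirely, which is the kind of information TPML is described as using in addition to spatial location.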

       

      Abstract: Accurate prediction of the spatial distribution of Soil Organic Matter (SOM) is of great importance for precision agriculture, farmland quality improvement, ecological environment protection, and soil carbon sequestration. However, prediction accuracy depends strongly on the heterogeneity of the SOM spatial distribution and of its relationship with auxiliary variables. Taking Hailun City, Heilongjiang Province (126°14′-127°45′ E, 46°58′-47°52′ N), in northeast China as the study area, this study aims to predict the SOM spatial distribution accurately and rapidly using a Two-Point Machine Learning (TPML) method, with climate, topography, socio-economic factors, and spatial location as the auxiliary variables. Spatial location and attribute similarity were used together to deal with the heterogeneity of the SOM spatial distribution and the heterogeneity of its relationship with the auxiliary variables. The performance of TPML was then compared with that of Random Forest (RF), RF regression kriging, inverse distance weighting, and Ordinary Kriging (OK) models. The models were evaluated with samples of different sizes using the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), the correlation coefficient between the predicted and true values (r), and the coefficient of determination (R2). The results reveal that: 1) The SOM content in the study area ranged from 1.775 to 7.188 g/kg, with an average value of 3.179 g/kg. The SOM was unevenly distributed in space, high in the east and low in the west. The SOM content was positively correlated with the normalized difference vegetation index (NDVI), digital elevation, and mean annual precipitation, and negatively correlated with the gross domestic product, mean annual air temperature, and topographic wetness index; it was also significantly related to land use, landform, vegetation, and soil type.
2) TPML achieved the highest prediction accuracy under all sample sizes, with the lowest MAE (0.088-0.097 g/kg) and RMSE (0.116-0.139 g/kg) and the highest r (0.992-0.996) and R2 (0.971-0.985). Compared with the most frequently used OK, the MAE and RMSE of the TPML model were reduced by more than 0.7 g/kg, and the r and R2 were increased by more than 0.2 and 0.9, respectively. 3) The standard deviation of the prediction errors (theoretical errors) and the actual errors showed a similar spatial pattern, indicating that TPML provided reasonable uncertainty estimates for the prediction. In conclusion, TPML can exploit spatial autocorrelation and attribute similarity at the same time to achieve higher spatial prediction accuracy. The method is suitable for predicting resource and environmental variables that have a certain degree of spatial autocorrelation and available auxiliary data.
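The four evaluation metrics used above can be computed from paired observed and predicted values as in the following minimal sketch. The abstract does not give the formulas, so two points are assumptions: r is taken to be the Pearson correlation coefficient, and R2 is taken to be 1 - SSE/SST rather than the square of r. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return MAE, RMSE, Pearson r, and R2 for paired values.

    Assumption: R2 = 1 - SSE/SST; the source does not state its definition.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sums of squared deviations and cross-deviations about the means.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cross = sum((o - mean_obs) * (p - mean_pred)
                for o, p in zip(observed, predicted))
    if sst == 0.0 or ss_pred == 0.0:
        raise ValueError("r and R2 are undefined for constant input")
    r = cross / math.sqrt(sst * ss_pred)
    r2 = 1.0 - sse / sst
    return {"mae": mae, "rmse": rmse, "r": r, "r2": r2}


# Illustrative values only, not data from the study.
scores = evaluate([2.0, 3.0, 4.0, 5.0], [2.5, 3.0, 3.5, 5.0])
```

MAE and RMSE carry the units of the predicted variable (g/kg here), while r and R2 are dimensionless, which is why the abstract reports improvements in the two groups separately.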

       
