Digital mapping of soil salinization in arid area wetland based on variable optimized selection and machine learning

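    • Abstract: Soil salinization is one of the main causes of soil degradation and ecosystem deterioration, and it poses a major threat to sustainable development in arid regions. To monitor the spatial variability of soil salinization as accurately as possible, this study collected samples from 78 representative sites in the Ebinur Lake wetland, Xinjiang, using 54 samples as the training set and 24 as an independent validation set. Three types of indices (red-edge spectral indices, vegetation indices and topographic indices) were extracted from Sentinel-2 Multi-Spectral Instrument (MSI) and Digital Elevation Model (DEM) data. After the Extreme Gradient Boosting (XGBoost) algorithm was used to screen effective feature variables, Random Forest (RF), Extreme Learning Machine (ELM) and Partial Least Squares Regression (PLSR) models were built to predict soil Electrical Conductivity (EC), and the best model was used to map the distribution of salinization in the Ebinur Lake wetland. The results show that the selected red-edge spectral indices can basically predict the spatial variation of EC. Models combining red-edge spectral indices with vegetation indices generally outperformed those combining red-edge spectral indices with topographic indices, and models combining all three types of indices achieved fairly good prediction accuracy, with the RF model performing best (validation set R2=0.83, RMSE=4.81 dS/m, RPD=3.11). Across the study area, soil salinization was especially severe in the central and eastern parts. Therefore, environmental factors screened by XGBoost, combined with machine learning algorithms, can be used to monitor soil salinization in arid regions.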

       

      Abstract: As a global problem, soil salinization poses a serious threat to the limited soil resources and ecosystem health of arid and semi-arid areas, and it is one of the most important causes of land desertification and land degradation. Soil salinity is an effective indicator of soil salinization and varies in both time and space. Dynamic monitoring gives a fuller picture of the status of soil salinization and provides more quantitative information for soil restoration and land reclamation. Compared with traditional laboratory analysis, satellite remote sensing has major advantages in observing the ground at large spatial scales and high temporal resolution. The new-generation spaceborne Multi-Spectral Instrument (MSI) on Sentinel-2A offers novel spectral capabilities (namely, three red-edge bands and two near-infrared bands), which open up broad prospects for the quantitative evaluation of soil properties. At present, only a few studies have combined red-edge spectral indices, vegetation indices and topographic indices in soil salinization mapping, and choosing the best modeling technique for soil mapping in a specific landscape remains a great challenge, although many algorithms have been successfully applied to the prediction of soil properties. Therefore, in this study, we used Sentinel-2A red-edge bands, vegetation indices and variables derived from a digital elevation model (DEM) to analyze soil salinity with machine learning methods in the Ebinur Lake wetland in northwestern Xinjiang, China. 24 red-edge spectral indices, 11 vegetation indices and 8 topographic indices were selected by the XGBoost algorithm to participate in the modeling, and three machine learning models, Random Forest (RF), Extreme Learning Machine (ELM) and Partial Least Squares Regression (PLSR), were applied to data from 78 sampling sites to estimate soil Electrical Conductivity (EC).
The coefficient of determination (R2), root mean square error (RMSE) and ratio of performance to deviation (RPD) were used to evaluate the prediction accuracy of these models. The results showed that the optimal red-edge spectral indices combined with RF could basically predict EC, with a validation set R2, RMSE and RPD of 0.63, 7.14 dS/m and 2.09, respectively. Models combining the red-edge spectral indices with the vegetation indices were more accurate than those combining the red-edge spectral indices with the topographic indices, and RF outperformed ELM and PLSR, with R2=0.83 and RMSE=4.84 dS/m on the training set and R2=0.76, RMSE=5.36 dS/m and RPD=2.79 on the validation set. The combination of red-edge spectral indices, vegetation indices and topographic indices with RF gave the best prediction accuracy, with a validation set R2, RMSE and RPD of 0.83, 4.81 dS/m and 3.11, respectively. In addition, as more input feature variables were added, the prediction performance of every model improved to varying degrees. Soil salinization mapping based on the optimal variable combination (red-edge spectral indices + topographic indices + vegetation indices) and the best prediction model (RF) showed that soil salinization was particularly severe in the central and eastern parts of the study area.
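The three accuracy measures named above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not give the formulas, so the sketch assumes the commonly used definitions (R2 as one minus the ratio of residual to total sum of squares, RPD as the sample standard deviation of the observations divided by RMSE). The function name `evaluate` and the EC values are hypothetical.

```python
import math
import statistics


def evaluate(observed, predicted):
    """Return (R2, RMSE, RPD) for paired observed and predicted EC values.

    Assumed definitions (not stated in the abstract):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = sample standard deviation of observations / RMSE
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized samples with at least 2 values")
    n = len(observed)
    mean_obs = statistics.fmean(observed)
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    rpd = statistics.stdev(observed) / rmse
    return r2, rmse, rpd


# Made-up EC values in dS/m, for illustration only (not data from the study).
observed = [2.0, 10.0, 25.0, 40.0, 8.0]
predicted = [4.0, 9.0, 22.0, 36.0, 11.0]
r2, rmse, rpd = evaluate(observed, predicted)
print(f"R2={r2:.2f}, RMSE={rmse:.2f} dS/m, RPD={rpd:.2f}")
```

Under this assumed RPD definition, the reported validation values (RPD=3.11 with RMSE=4.81 dS/m) would correspond to a standard deviation of roughly 15 dS/m among the validation observations, which is consistent with the other reported RPD and RMSE pairs.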

       
