Estimating above-ground biomass in grassland using model-cross validation coupling UAV multispectral imaging

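    • Abstract: To meet the need for grassland resource monitoring created by the degradation of temperate grasslands, this study proposes a method for estimating above-ground biomass (AGB) based on unmanned aerial vehicle (UAV) multispectral remote sensing and machine learning. The test area was a typical steppe in Xilinhot, Inner Mongolia. Multispectral images in the red, green, near-infrared, and other bands were acquired by UAV and combined with 110 valid sets of ground measurements. The normalized difference vegetation index (NDVI), the optimized soil-adjusted vegetation index (OSAVI), the Red single-band reflectance, and a color space feature parameter (green pixel identification, GPI) were extracted as input features. Five machine learning models, namely multiple linear regression (MLR), partial least squares regression (PLSR), back propagation neural network (BPNN), extreme gradient boosting (XGBoost), and random forest (RF), were compared for accuracy, generalization ability, and computational efficiency under three cross-validation methods (K-fold, Monte Carlo, and leave-one-out). The RF model combined with Monte Carlo cross-validation performed best, with a validation-set coefficient of determination of 0.76, a root mean squared error (RMSE) of 18.61 g/m², and a mean absolute error of 18.37 g/m². Its RMSE was 7.6% lower than that of the best linear model (MLR-Monte Carlo CV), and its training time was only 0.63 s. Through high-frequency random sampling, this combination effectively suppressed model bias, so that RF remained stable across the full AGB range (0-300 g/m²) and its fitted scatter points lay closer to the 1:1 line. The RF-Monte Carlo CV combination built on UAV multispectral remote sensing data can therefore effectively estimate the AGB of typical temperate grassland, and can provide technical support for the dynamic monitoring, precision management, and degradation assessment of grassland resources in ecologically fragile areas.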
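The two vegetation indices named above can be sketched per pixel as follows. This is a minimal illustration, not the authors' processing chain: the abstract does not give the exact formulas used, so the sketch assumes the common definitions NDVI = (NIR - Red) / (NIR + Red) and OSAVI = (NIR - Red) / (NIR + Red + 0.16). Some implementations additionally scale OSAVI by 1.16. The function names and the zero-denominator handling are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    denom = nir + red
    if denom == 0:
        return 0.0  # assumption: treat a pixel with no signal as index 0
    return (nir - red) / denom


def osavi(nir: float, red: float, soil_factor: float = 0.16) -> float:
    """Optimized soil-adjusted vegetation index.

    Assumes the unscaled form with a soil adjustment factor of 0.16.
    """
    return (nir - red) / (nir + red + soil_factor)


# Example reflectances (made-up values for illustration only).
example_nir, example_red = 0.5, 0.1
example_ndvi = ndvi(example_nir, example_red)
example_osavi = osavi(example_nir, example_red)
```

The soil adjustment term in the denominator makes OSAVI less sensitive than NDVI to bare-soil background, which is a plausible reason for pairing the two indices over sparse grassland.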

       

      Abstract: Grasslands play a critical role in maintaining biodiversity, regulating climate, and supplying essential resources. However, typical grasslands have suffered severe degradation due to overgrazing, climate change, and human activities, so efficient and accurate surveys of grassland resources are urgently needed. This study combined multispectral remote sensing data acquired by unmanned aerial vehicles (UAVs) with machine learning, using optimal spectral parameters to estimate the above-ground biomass (AGB) of temperate grassland. A field test was conducted in the Baiyin Xile Pasture, Xilinhot, Inner Mongolia, China. UAVs captured high-resolution multispectral images, while field-sampled AGB data served as the ground truth for model training and validation. Two vegetation indices (the normalized difference vegetation index, NDVI, and the optimized soil-adjusted vegetation index, OSAVI), the red single-band reflectance, and a color space feature parameter (green pixel identification, GPI) were extracted as the input features of the machine learning models. Five models were compared for AGB estimation: multiple linear regression (MLR), partial least squares regression (PLSR), back propagation neural network (BPNN), extreme gradient boosting (XGBoost), and random forest (RF). Three cross-validation methods were applied to each model: stratified k-fold cross-validation (Stratified k-Fold CV), Monte Carlo cross-validation (Monte Carlo CV), and leave-one-out cross-validation (LOO CV). The models were compared systematically for accuracy, generalization, and computational efficiency, quantified by the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), and training time.
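The three cross-validation schemes and three error metrics named above can be sketched in plain Python as follows. This is an illustrative outline under stated assumptions, not the study's code: the k-fold variant here is unstratified, and the validation fraction (0.3), the number of Monte Carlo repeats, the seed, and all function names are hypothetical choices that the abstract does not specify.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, validation indices)


def k_fold_splits(n: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Plain (unstratified) k-fold: every sample is validated exactly once."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    for fold in range(k):
        val = order[fold::k]  # every k-th shuffled index forms one fold
        held_out = set(val)
        yield [i for i in order if i not in held_out], val


def monte_carlo_splits(n: int, repeats: int, val_fraction: float = 0.3,
                       seed: int = 0) -> Iterator[Split]:
    """Repeated random train/validation splits, drawn independently."""
    rng = random.Random(seed)
    n_val = max(1, round(n * val_fraction))
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        yield order[n_val:], order[:n_val]


def leave_one_out_splits(n: int) -> Iterator[Split]:
    """Each sample is held out once, so n models must be trained."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In use, a model would be fitted on the training indices of each split and scored on the validation indices, with the metrics averaged over splits. Leave-one-out yields as many splits as there are samples, which is why its cost grows quickly with dataset size.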
The optimal combination of model and cross-validation method for estimating AGB in temperate grassland was then identified. The RF-Monte Carlo CV combination delivered the best overall performance, with a validation-set R² of 0.76, RMSE of 18.61 g/m², and MAE of 18.37 g/m². This was a 7.6% reduction in RMSE compared with the best linear model (MLR-Monte Carlo CV: RMSE=20.14 g/m², MAE=18.26 g/m²) and a 3.8% improvement in R² over PLSR-Monte Carlo CV (R²=0.74, RMSE=20.71 g/m²). The linear models (MLR and PLSR) performed stably across validation methods, with R² of 0.72-0.73 and RMSE of 19.24-21.84 g/m², but failed to capture the complex relationships between spectra and biomass. Among the nonlinear models, XGBoost-Monte Carlo CV achieved the lowest training-set RMSE (17.84 g/m²) but showed overfitting on the validation set (RMSE=18.32 g/m², MAE=18.68 g/m²). BPNN-Monte Carlo CV showed promising training performance (RMSE=18.58 g/m², R²=0.83) but generalized poorly (validation RMSE=19.60 g/m², MAE=18.61 g/m²) and required 23.27 s/iteration. RF was also the most computationally efficient model, requiring only 0.63 s/iteration, which was 97% and 85% faster than BPNN and XGBoost-LOO CV (4.37 s/iteration), respectively. Its ensemble learning mechanism maintained prediction stability across the full AGB range (0-300 g/m²), with validation RMSE deviations below 1.2 g/m² across all quantiles. For the nonlinear models, Monte Carlo CV reduced the validation RMSE by 6.1% compared with stratified k-fold CV, indicating that it mitigates grouping bias. The unbiased LOO CV, by contrast, was computationally expensive (RF-LOO CV required 15.83 s/iteration) and would be infeasible for large datasets.
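The relative figures quoted above follow from the reported RMSE and timing values. A short worked check, assuming each percentage is computed as (baseline - value) / baseline, is sketched below. The helper name is hypothetical, and the abstract does not state the formula explicitly.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Reduction of `value` relative to `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


# Validation RMSE: MLR-Monte Carlo CV (20.14 g/m2) vs RF-Monte Carlo CV (18.61 g/m2).
rmse_drop = relative_reduction(20.14, 18.61)        # about 7.6%

# Time per iteration: BPNN (23.27 s) and XGBoost-LOO CV (4.37 s) vs RF (0.63 s).
time_vs_bpnn = relative_reduction(23.27, 0.63)      # about 97.3%
time_vs_xgb_loo = relative_reduction(4.37, 0.63)    # about 85.6%, reported as 85%
```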
A robust framework was established that integrates UAV multispectral data with RF-Monte Carlo CV for AGB estimation in temperate grasslands. It combines high accuracy (R²=0.71), rapid computation, and adaptability to complex vegetation structures. The RF-Monte Carlo CV model built on UAV multispectral remote sensing data can effectively estimate the AGB of typical temperate grasslands. The findings can provide technical support for the rapid assessment of grassland resources.

       
