Abstract:
Grasslands play a critical role in biodiversity, climate regulation, and the supply of essential ecosystem resources. However, typical grasslands have suffered severe degradation due to overgrazing, climate change, and human activities, so efficient and accurate surveys of grassland resources are urgently needed. This study aimed to combine multispectral remote sensing data acquired by unmanned aerial vehicles (UAVs) with machine learning, using optimal spectral parameters to accurately estimate above-ground biomass (AGB) in temperate grasslands. A field test was conducted in the Baiyin Xile Pasture, the Hoohhot League, Inner Mongolia, China. UAVs were used to capture high-resolution multispectral images, while field-sampled AGB data were collected to serve as the ground truth for model training and validation. Two optimal vegetation indices (the red single-band spectrum and color space feature parameters) were extracted and selected as the input features for the machine learning models. Five models were used to estimate the AGB: multiple linear regression (MLR), partial least squares regression (PLSR), back propagation neural network (BPNN), extreme gradient boosting (XGBoost), and random forest (RF). Three cross-validation methods were employed to compare the models: stratified k-fold cross-validation (Stratified k-Fold CV), Monte Carlo cross-validation (Monte Carlo CV), and leave-one-out cross-validation (LOO CV). A systematic comparison was made of the accuracy, generalization, and computational efficiency of the models. Model performance was quantified by the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), and training time. The optimal combination of model and validation method for estimating the AGB in temperate grasslands was then identified. The results demonstrated that the RF-Monte Carlo CV combination showed superior performance across all metrics, with a validation-set R² of 0.76, RMSE of 18.61 g/m², and MAE of 18.37 g/m². This was a 7.6% reduction in RMSE compared with the optimal linear model (MLR-Monte Carlo CV: RMSE=20.14 g/m², MAE=18.26 g/m²), and a 3.8% improvement in R² over PLSR-Monte Carlo CV (R²=0.74, RMSE=20.71 g/m²). The linear models (MLR and PLSR) showed stable performance across validation methods, with R² ranging from 0.72 to 0.73 and RMSE from 19.24 to 21.84 g/m², but failed to capture the complex spectral-biomass relationships. Among the nonlinear models, XGBoost-Monte Carlo CV achieved the lowest training-set RMSE (17.84 g/m²) but showed overfitting on the validation set (RMSE=18.32 g/m², MAE=18.68 g/m²). BPNN-Monte Carlo CV showed promising training performance (RMSE=18.58 g/m², R²=0.83) but failed to generalize (validation RMSE=19.60 g/m², MAE=18.61 g/m²) and required 23.27 s/iteration. Furthermore, RF outperformed all other models in computational efficiency, requiring only 0.63 s/iteration, which was 97% and 85% faster than BPNN and XGBoost-LOO CV (4.37 s/iteration), respectively. Its ensemble learning mechanism maintained prediction stability across the full AGB range (0-300 g/m²), with validation RMSE deviations < 1.2 g/m² across all quantiles. Monte Carlo CV reduced the validation RMSE by 6.1% compared with stratified k-fold CV for the nonlinear models, indicating its effectiveness in mitigating grouping bias, whereas the unbiased LOO CV was computationally infeasible for large datasets (RF-LOO CV required 15.83 s/iteration). A robust framework was thus established that integrates UAV multispectral data with RF-Monte Carlo CV for AGB estimation in temperate grasslands, offering high accuracy (R²=0.71), rapid computation, and adaptability to complex vegetation structures. The RF-Monte Carlo CV model applied to UAV multispectral remote sensing data can effectively estimate the AGB of typical temperate grasslands, and the findings can provide technical support for the rapid assessment of grassland resources.
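
The evaluation protocol summarized above can be illustrated with a minimal Python sketch, which is not the code used in the study. It shows how Monte Carlo cross-validation repeatedly draws random train/validation splits and averages RMSE, MAE, and R² over them. Several parts are assumptions made only for illustration: the function names are hypothetical, the data are synthetic rather than the field samples, and a one-feature least-squares line stands in for the RF model so that the example needs only the standard library.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Assumed stand-in model (NOT random forest): one-feature least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda v: intercept + slope * v


def monte_carlo_cv(x, y, n_splits=50, test_fraction=0.3, seed=0):
    """Average (RMSE, MAE, R2) over repeated random train/validation splits."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(2, round(n * test_fraction))
    scores = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition on every repetition
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [model(x[i]) for i in test_idx]
        scores.append((rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred)))
    return tuple(mean(col) for col in zip(*scores))


def relative_reduction(baseline, value):
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline


def make_synthetic_plots(n=120, seed=42):
    """Synthetic (feature, AGB) pairs; purely illustrative, not study data."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]
    y = [300.0 * v + rng.gauss(0.0, 20.0) for v in x]
    return x, y


if __name__ == "__main__":
    x, y = make_synthetic_plots()
    avg_rmse, avg_mae, avg_r2 = monte_carlo_cv(x, y)
    print(f"RMSE={avg_rmse:.2f}  MAE={avg_mae:.2f}  R2={avg_r2:.2f}")
    # RMSE reduction of RF vs. MLR under Monte Carlo CV, from the reported values
    print(f"{relative_reduction(20.14, 18.61):.1f}% lower RMSE")
```

The `relative_reduction` helper applied to the reported RMSE values (20.14 and 18.61 g/m²) reproduces the stated 7.6% reduction. In practice, a library implementation such as scikit-learn's `ShuffleSplit` together with `RandomForestRegressor` could replace the hand-written split loop and the stand-in model.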