Predicting nitrogen content in winter wheat plants using multi-level sensitive feature filtering and UAV imagery

郭燕; 王来刚; 贺佳; 井宇航; 宋晓宇; 张彦; 刘婷

doi:10.11975/j.issn.1002-6819.202312158

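Abstract: Nitrogen is an essential macronutrient for the growth and development of winter wheat. The rich spectral and texture information in ultra-high-resolution unmanned aerial vehicle (UAV) imagery offers an important technical route to accurate prediction of plant nitrogen content in winter wheat, but an excess of variables leads to information redundancy and model complexity. To address this problem, this study proposes a multi-level method for screening features sensitive to plant nitrogen content, "correlation analysis + collinearity analysis + LASSO (least absolute shrinkage and selection operator) feature screening", which applies L1 regularization to the coefficient vector to make the features sparse by shrinking the coefficients of some features to 0. Based on 65 spectral and texture features extracted from UAV imagery acquired at the key growth stages of winter wheat (jointing, booting, flowering, and filling), four machine learning algorithms, namely the back propagation (BP) neural network, Adaboost, random forest (RF), and linear regression (LR), were used to build prediction models of plant nitrogen content. The results showed that correlation analysis retained 51 variables that passed the significance test at the 0.01 level; following collinearity analysis, 17 sensitive feature variables were selected when the LASSO regularization parameter λ was 0.08. With these selected variables, the prediction models built by all four algorithms were significant at the 0.01 level, and the accuracies of the BP, Adaboost, and RF models were highly consistent, each with an R² of 0.81 and with RMSEs of 0.36%, 0.38%, and 0.37%, respectively. The proposed multi-level feature screening method therefore yields models that are both concise and robust, and can provide technical support for precise nitrogen monitoring and intelligent management in smart agriculture.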
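For reference, the L1-regularized (LASSO) estimator named in the abstract is commonly written in the form below. This is the standard textbook formulation, not necessarily the exact scaling used in the study, and the notation is assumed here for illustration: n is the number of samples, p the number of candidate features, y_i the measured plant nitrogen content, x_{ij} the j-th feature of sample i, and β_0, β_j the intercept and coefficients.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{ \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Because the penalty is the sum of absolute values, increasing λ drives some coefficients exactly to zero, and the features with non-zero coefficients are the ones retained. The numerical meaning of a given λ (such as 0.08) depends on how the squared-error term and the features are scaled.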

Abstract: Nitrogen is one of the most essential elements for the growth and development of winter wheat. Unmanned aerial vehicle (UAV) imagery, with its ultra-high-resolution spectral and texture information, offers a route to accurate prediction of the nitrogen content in winter wheat plants. However, too many variables lead to redundant information and a complex model structure. It is therefore necessary to screen the sensitive variables so that a simpler model keeps high accuracy while improving computational efficiency. In this study, a multi-level method for filtering sensitive features was proposed, namely "correlation analysis + collinearity analysis + LASSO (least absolute shrinkage and selection operator) feature screening", which introduces L1 regularization of the coefficient vector to achieve sparsity of the features by reducing the coefficients of some features to zero. A total of 65 feature variables were obtained from the spectral and texture information extracted from UAV images of winter wheat during the critical growth stages (jointing, booting, flowering, and filling). The 65 feature variables were first correlated with the plant nitrogen content, and 51 feature variables that passed the significance test at the 0.01 level were retained. After collinearity analysis, LASSO was introduced to filter the feature variables in order to eliminate the collinearity among them. Back-propagation (BP) neural network, Adaboost, random forest (RF), and linear regression (LR) were used to predict the nitrogen content of winter wheat plants. When the LASSO regularization parameter λ was 0.08, 17 sensitive feature variables were selected, and the prediction models for plant nitrogen content were then established on these variables. The BP, Adaboost, RF, and LR models were all significant at the 0.01 level.
The accuracies of BP, Adaboost, and RF were highly consistent, each with an R² of 0.81 and with root mean square errors (RMSEs) of 0.36%, 0.38%, and 0.37%, respectively. R² and RMSE were also evaluated on the training, cross-validation, and test datasets for models built without LASSO feature screening. On the training dataset, R² was higher with the 51 variables than with the 17 variables. On the cross-validation dataset, the R² values varied markedly: those of the RF and LR models decreased, that of Adaboost remained stable, and that of LR increased, indicating that more variables did not necessarily give better performance. Although LASSO feature screening reduced the number of model variables by 2/3, its impact on the prediction accuracy of plant nitrogen content was limited. In summary, considering training time and computing requirements, the 17 sensitive feature variables retained by LASSO screening can be used with machine learning to predict plant nitrogen content, with the RF and Adaboost models achieving the higher accuracy. When computing power is limited, it is therefore feasible to build prediction models of crop growth parameters with fewer variables. More importantly, selecting the sensitive feature variables that best characterize crop growth parameters makes the model concise, convenient, efficient, robust, and highly usable. The multi-level screening of sensitive features can provide practical support for precise nitrogen monitoring and management in smart agriculture.
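The three screening levels described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and SciPy, runs on synthetic data rather than the UAV features, uses variance inflation factors as one possible collinearity diagnostic, and all function names (`correlation_filter`, `variance_inflation_factors`, `lasso_filter`, `run_demo`) are hypothetical. Note also that scikit-learn's `alpha` matches λ only under its own scaling of the loss, so `alpha=0.08` is not guaranteed to reproduce the study's setting.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def correlation_filter(X, y, p_threshold=0.01):
    """Level 1: keep features whose Pearson correlation with y has p < p_threshold."""
    keep = []
    for j in range(X.shape[1]):
        _, p_value = pearsonr(X[:, j], y)
        if p_value < p_threshold:
            keep.append(j)
    return np.array(keep, dtype=int)


def variance_inflation_factors(X):
    """Level 2 (diagnostic): VIF of each feature, read off the inverse correlation matrix."""
    corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    return np.diag(np.linalg.pinv(corr))


def lasso_filter(X, y, alpha=0.08):
    """Level 3: keep features with non-zero LASSO coefficients on standardized inputs."""
    X_std = StandardScaler().fit_transform(X)
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_std, y)
    return np.flatnonzero(model.coef_)


def run_demo(seed=0):
    # Synthetic stand-in for the 65 spectral/texture features (assumption, not real data):
    # a few latent factors make the features strongly collinear.
    rng = np.random.default_rng(seed)
    n_samples, n_features = 240, 65
    latent = rng.normal(size=(n_samples, 6))
    X = latent @ rng.normal(size=(6, n_features))
    X += 0.5 * rng.normal(size=(n_samples, n_features))
    y = 2.0 + 0.8 * latent[:, 0] - 0.6 * latent[:, 1] + 0.2 * rng.normal(size=n_samples)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

    # Screening uses the training split only, to avoid leaking test information.
    kept = correlation_filter(X_tr, y_tr)
    vif = variance_inflation_factors(X_tr[:, kept])
    selected = kept[lasso_filter(X_tr[:, kept], y_tr)]

    # Fit one of the four model types (random forest) on the screened features.
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X_tr[:, selected], y_tr)
    pred = rf.predict(X_te[:, selected])
    return {
        "kept": kept,
        "vif": vif,
        "selected": selected,
        "r2": r2_score(y_te, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
    }


if __name__ == "__main__":
    out = run_demo()
    print(f"after correlation filter: {len(out['kept'])} features")
    print(f"max VIF among them: {out['vif'].max():.1f}")
    print(f"after LASSO: {len(out['selected'])} features")
    print(f"RF test R2 = {out['r2']:.2f}, RMSE = {out['rmse']:.2f}")
```

The other three model types could be swapped in at the last step (for example scikit-learn's `AdaBoostRegressor`, `MLPRegressor`, and `LinearRegression`), and the value of `alpha` would normally be tuned, for instance by cross-validation, rather than fixed in advance.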
