Abstract:
Nitrogen is one of the most essential elements for the growth and development of winter wheat. Ultra-high-resolution unmanned aerial vehicle (UAV) imagery provides spectral and texture information that can be used to predict the nitrogen content of winter wheat plants accurately. However, using too many variables leads to parameter redundancy and a complex model structure. It is therefore necessary to screen for the variables to which the model is most sensitive, so that a simpler model retains high accuracy while gaining computational efficiency. In this study, a multi-level method for filtering sensitive features was proposed, namely "correlation analysis + collinearity analysis + LASSO (least absolute shrinkage and selection operator) feature screening". LASSO applies an L1 regularization constraint to the coefficient vector, which shrinks the coefficients of some features to zero and thus yields a sparse feature set. A total of 65 feature variables were derived from the spectral and texture information extracted from UAV images of winter wheat at the critical growth stages (jointing, booting, flowering, and filling). The 65 feature variables were first correlated with plant nitrogen content, and the 51 that were significant at the 0.01 level were retained. After collinearity analysis, LASSO was used to filter these feature variables and eliminate the collinearity among them. Back-propagation (BP) neural network, AdaBoost, random forest (RF), and linear regression (LR) models were used to predict the nitrogen content of winter wheat plants. With the LASSO regularization parameter λ set to 0.08, 17 sensitive feature variables were selected, and prediction models of plant nitrogen content were then built from them. All four models (BP, AdaBoost, RF, and LR) were significant at the 0.01 level. BP, AdaBoost, and RF achieved comparable accuracy, each with an R² of 0.81 and with root mean square errors (RMSEs) of 0.36%, 0.38%, and 0.37%, respectively. R² and RMSE were also evaluated on the training, cross-validation, and test datasets for models built without LASSO feature screening. On the training dataset, R² was higher with 51 variables than with 17. On the cross-validation dataset, the R² values varied considerably among models: the R² values of the RF and LR models decreased, that of AdaBoost remained stable, and that of LR increased, indicating that more variables did not yield better performance. Although LASSO feature screening reduced the number of model variables by two-thirds, its impact on the prediction accuracy of plant nitrogen content was limited. In summary, considering training time and computing power requirements, the 17 sensitive feature variables retained by LASSO feature screening can be used with machine learning to predict plant nitrogen content. The RF and AdaBoost models achieved the higher prediction accuracy. Using fewer variables is therefore feasible when prediction models of crop growth parameters must be built with limited computing power. More importantly, selecting the sensitive feature variables that best characterize crop growth parameters yields a model that is concise, convenient, efficient, robust, and highly usable. The multi-level screening of sensitive features can provide practical support for precision nitrogen monitoring and management in smart agriculture.
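
As an illustration of how the L1 penalty in the LASSO screening step can drive some coefficients exactly to zero, a minimal Python sketch is given below. It is not the implementation used in this study: the data are synthetic, the function names `soft_threshold` and `lasso_coordinate_descent` are hypothetical, and the objective scaling (a factor of 1/(2n) on the squared error) is an assumption. The value 0.08 passed as `lam` is therefore only illustrative and may not be directly comparable to the λ reported above.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero; return 0 when |value| <= threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X are standardized and y is centered
    (an assumption of this sketch, not a detail taken from the study).
    """
    n_samples, n_features = X.shape
    coef = np.zeros(n_features)
    col_scale = (X ** 2).sum(axis=0) / n_samples  # 1.0 for standardized columns
    residual = y - X @ coef
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Covariance of feature j with the residual that excludes feature j.
            rho = X[:, j] @ (residual + X[:, j] * coef[j]) / n_samples
            new_coef = soft_threshold(rho, lam) / col_scale[j]
            # Keep the residual consistent with the updated coefficient.
            residual += X[:, j] * (coef[j] - new_coef)
            coef[j] = new_coef
    return coef


# Synthetic stand-in for the feature table: only columns 0 and 1 carry signal.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)
y = y - y.mean()

coef = lasso_coordinate_descent(X, y, lam=0.08)
selected = np.flatnonzero(coef)  # indices of features with non-zero coefficients
print("features kept by the L1 penalty:", selected)
```

In this sketch, the features whose coefficients remain non-zero play the role of the "sensitive feature variables"; a larger `lam` removes more features, and a smaller one keeps more.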