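Fu Lanlan, Huang Hao, Wang Heng, Huang Shengcao, Chen Du. Classification of maize growth stages using the Swin Transformer model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(14): 191-200. DOI: 10.11975/j.issn.1002-6819.2022.14.022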

    Classification of maize growth stages using the Swin Transformer model

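    • Abstract: Rapid and accurate identification of the different stages of maize growth is important for efficient, precise management of the maize planting cycle. Classifying maize growth stages in the field is easily affected by complex backgrounds, outdoor illumination, and similar factors. To address this, the study used an unmanned aerial vehicle (UAV) to acquire images of maize at four growth stages (seedling, jointing, small-trumpet, and big-trumpet) and applied a Swin Transformer (Swin-T) model with transfer learning to identify these stages rapidly. First, taking the orientation of the maize ridges into account, the training set was rotated 8 times to augment the dataset. Next, to examine how each model performs on non-sharp data, the test set was transformed 6 times with Gaussian blur. Finally, AlexNet, VGG16, and GoogLeNet were used as baselines to evaluate the performance of Swin-T. The results showed that the overall accuracy of Swin-T on the original test set was 98.7%, which was 6.9, 2.7, and 2.0 percentage points higher than that of AlexNet, VGG16, and GoogLeNet, respectively. Among the misclassifications, the big-trumpet and small-trumpet stages were the most likely to be confused because their canopy features are similar. On the non-sharp datasets, the overall accuracy degradation indices of AlexNet, VGG16, and GoogLeNet were 12.4%, 10.4%, and 15.0%, respectively, whereas that of Swin-T was 8.31%, and Swin-T also performed best in degradation balance, average degradation index, and accuracy at maximum degradation. The results indicate that, in terms of classification accuracy and robustness to blurred input, the Swin-T model can meet the practical need to classify maize growth stages in production and can provide technical support for intelligent monitoring of maize growth stages.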
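Both abstracts state only that the training set was rotated 8 times to offset the fixed ridge orientation; the angles and interpolation method are not given. The following is a minimal sketch of one such augmentation, assuming eight evenly spaced angles (multiples of 45°) about the image centre, nearest-neighbour sampling, and zero fill where the rotated frame leaves the source image. Images are plain nested lists of grey values so the example runs without third-party libraries. `rotate_nearest` and `augment_by_rotation` are hypothetical names, and whether the unrotated original counts as one of the eight views is also an assumption.

```python
import math
from typing import List

Image = List[List[float]]  # rows of grey values (hypothetical representation)


def rotate_nearest(img: Image, angle_deg: float, fill: float = 0.0) -> Image:
    """Rotate a 2-D image about its centre using nearest-neighbour sampling.

    With the y axis pointing down, a positive angle turns the image clockwise
    on screen. The output keeps the input size, so for angles that are not
    multiples of 90 degrees the corners are cropped and set to `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: rotate the output coordinate back by -theta
            # to find which source pixel lands here.
            dx, dy = x - cx, y - cy
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def augment_by_rotation(img: Image, n_rotations: int = 8) -> List[Image]:
    """Return n_rotations views at evenly spaced angles 0, 360/n, 2*360/n, ..."""
    step = 360.0 / n_rotations
    return [rotate_nearest(img, k * step) for k in range(n_rotations)]
```

On a 3×3 toy grid, the 90° view is the grid turned a quarter turn and the first of the eight views is the unchanged original. A production pipeline might instead enlarge the canvas or centre-crop so that the 45° views lose no field content.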

       

      Abstract: Rapid and accurate identification of maize growth stages is important for precise management of the maize planting cycle. However, complex backgrounds and outdoor lighting make it difficult to classify and identify maize growth stages in the field. In this study, an unmanned aerial vehicle (UAV) was used to capture images of maize at four growth stages: seedling, jointing, small-trumpet, and big-trumpet. A Swin Transformer (Swin-T) model with transfer learning was then used to identify these growth stages rapidly, with the aim of achieving a high recognition rate over larger monitored areas than before. Because the drone acquires images from a single viewing angle, the generalization ability of the model needed to be improved. Within the same growth period the ridges grow in a consistent orientation, while the canopy differs in shape and color across the four growth stages, so a deep learning model may learn ridge orientation as a classification feature. In addition, the data used to train a model are usually collected under stable, high-quality conditions. Besides sharpening images algorithmically, the model's resistance to interference can be strengthened to reduce the performance degradation caused by image blur during recognition. First, taking the orientation of the maize ridges into account, the training set was rotated 8 times to expand the dataset, simulating different drone shooting angles and improving the model's generalization. Second, Gaussian blur was applied to transform the test set 6 times. To explore how each model performs on these non-sharp datasets, the aHash, pHash, dHash, and histogram algorithms were used to evaluate the similarity between the original and Gaussian-blurred images.
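The abstract names Gaussian blur for producing six non-sharp test sets and aHash, pHash, dHash, and histogram comparisons for measuring how far each blurred image departs from its original, but gives no parameters. The sketch below illustrates two of those measures (aHash and dHash) under stated assumptions: the six blur levels are hypothetical sigma values, images are nested lists of grey values, downsampling is plain nearest-neighbour (a simplification of the smoother resizing usually used for image hashing), and all function names are illustrative. pHash, which adds a discrete cosine transform, and histogram comparison are omitted.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # rows of grey values (hypothetical representation)


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def gaussian_blur(img: Image, sigma: float) -> Image:
    """Separable Gaussian blur; border pixels are replicated outward."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass.
    tmp = [[sum(k[j] * img[y][min(max(x + j - r, 0), w - 1)] for j in range(len(k)))
            for x in range(w)] for y in range(h)]
    # Vertical pass over the horizontally blurred rows.
    return [[sum(k[j] * tmp[min(max(y + j - r, 0), h - 1)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def blurred_test_copies(img: Image,
                        sigmas: Sequence[float] = (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)) -> List[Image]:
    """Six progressively blurrier copies; the sigma values are hypothetical."""
    return [gaussian_blur(img, s) for s in sigmas]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour downsampling (a simplification for this sketch)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)] for y in range(out_h)]


def average_hash(img: Image, size: int = 8) -> List[int]:
    """aHash: bit is 1 where a downsampled pixel is above the mean."""
    pixels = [p for row in resize_nearest(img, size, size) for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def difference_hash(img: Image, size: int = 8) -> List[int]:
    """dHash: bit is 1 where a pixel is brighter than its right-hand neighbour."""
    small = resize_nearest(img, size, size + 1)
    return [1 if row[x] > row[x + 1] else 0 for row in small for x in range(size)]


def hash_similarity(a: List[int], b: List[int]) -> float:
    """1 minus the normalised Hamming distance between two equal-length hashes."""
    return 1.0 - sum(x != y for x, y in zip(a, b)) / len(a)
```

A similarity of 1.0 means the blurred copy hashes identically to the original, and lower values indicate that the blur has changed the image's coarse structure. In practice the sigma values would be chosen to span the blur actually seen in UAV imagery.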
In addition, a comprehensive accuracy-based index was constructed to evaluate how much each model's performance degrades at different levels of image sharpness. Finally, the performance of the Swin-T model was compared with that of the classical convolutional neural networks (CNNs) AlexNet, VGG16, and GoogLeNet. The experiments showed that the overall accuracy of the Swin-T model on the original test set was 98.7%, which was 6.9, 2.7, and 2.0 percentage points higher than that of AlexNet, VGG16, and GoogLeNet, respectively. Among the misclassifications, the complex field background affected the recognition accuracy of the seedling and jointing stages. Because weeds are phenotypically similar to maize, some images whose true category was the seedling stage were misidentified as the jointing stage. Conversely, some jointing-stage images contained large areas of weeds in which the maize canopy features were not prominent, and these were misidentified as the seedling stage. On the non-sharp datasets, the overall degradation indices of AlexNet, VGG16, and GoogLeNet were 12.38%, 10.38%, and 15.03%, respectively, whereas that of the Swin-T model was only 8.31%. Swin-T also performed best in degradation balance, average degradation index, and accuracy at maximum degradation, indicating that it degrades the least when image quality declines. In terms of classification accuracy and robustness to blurred input, the Swin-T model is expected to meet the demanding requirements of classifying the different growth stages of maize in actual production. The findings can also provide technical support for the intelligent identification and monitoring of maize growth stages.
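The abstract reports an overall degradation index, degradation balance, average degradation index, and accuracy at maximum degradation, but does not define them. As a hedged illustration only, the sketch below computes stand-ins for these ideas from per-level test accuracies: the relative accuracy drop at each blur level, the mean of those drops, their population standard deviation as a measure of how evenly a model degrades, and the accuracy at the strongest blur level. These definitions and the function name `degradation_summary` are assumptions, not the paper's formulas.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def degradation_summary(clean_acc: float, blurred_accs: Sequence[float]) -> Dict[str, float]:
    """Summarise accuracy loss across blur levels (hypothetical definitions).

    clean_acc    -- accuracy on the original (sharp) test set, in (0, 1]
    blurred_accs -- accuracies on the blurred test sets, ordered from the
                    mildest blur to the strongest
    """
    if clean_acc <= 0:
        raise ValueError("clean accuracy must be positive")
    if not blurred_accs:
        raise ValueError("need at least one blurred test set")
    # Relative accuracy drop at each blur level.
    drops = [(clean_acc - a) / clean_acc for a in blurred_accs]
    return {
        "mean_degradation": mean(drops),           # average relative drop
        "degradation_spread": pstdev(drops),       # lower means more even degradation
        "worst_level_accuracy": blurred_accs[-1],  # accuracy under the strongest blur
    }
```

Under these stand-in definitions, a more robust model shows a smaller mean drop, a smaller spread, and a higher worst-level accuracy. Separately, because the accuracy gaps reported above are in percentage points, they imply accuracies on the original test set of about 98.7 - 6.9 = 91.8% for AlexNet, 98.7 - 2.7 = 96.0% for VGG16, and 98.7 - 2.0 = 96.7% for GoogLeNet.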

       
