    Xu Yanlei, Kong Shuolin, Chen Qingyuan, Gao Zhiyuan, Li Chenxiao. Model for identifying strong generalization apple leaf disease using Transformer[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(16): 198-206. DOI: 10.11975/j.issn.1002-6819.2022.16.022


    Model for identifying strong generalization apple leaf disease using Transformer

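    • Abstract: Generalization ability is key to applying a disease identification model across multiple scenarios. Targeting apple leaf disease data from different environments, this study proposes CaTNet, a strongly generalizing apple leaf disease identification model that can extract multiple types of features. The model uses a dual-branch structure. First, a convolutional neural network (CNN) branch is designed to extract local features from apple leaf images. Second, a vision Transformer branch with squeeze and expansion functions is built to extract global features from the images. Finally, the two kinds of features are fused, so that the Transformer branch can learn local features and the CNN branch can learn global features. Compared with a range of CNN and Transformer models, CaTNet generalizes better: trained only on laboratory leaf images, it reaches 80% identification accuracy on natural-environment data, a substantial improvement over the 72.14% accuracy of the CNN EfficientNetV2 and the 52.72% accuracy of the Transformer network PVT. The model thus effectively improves identification accuracy on data from different environments, addressing the high training cost and weak generalization of deep learning models.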
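The abstract does not specify how the two branch outputs are aligned before fusion. The following is a minimal plain-Python sketch. It assumes channel-first feature maps, a Transformer branch that emits one token per spatial position in row-major order, and fusion by channel-wise concatenation (the concat operation named in the English abstract). The helpers `shape`, `tokens_to_map`, and `concat_fuse` are hypothetical, and the toy tensors do not reflect CaTNet's real feature sizes.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating dual-branch fusion; not the authors' code.
# Channel-first feature map: fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def shape(fmap: FeatureMap) -> Tuple[int, int, int]:
    """Return (channels, height, width) of a channel-first feature map."""
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))


def tokens_to_map(tokens: List[List[float]], height: int, width: int) -> FeatureMap:
    """Reshape H*W tokens of dimension D (assumed row-major) into a D x H x W map."""
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    dim = len(tokens[0])
    return [
        [[tokens[r * width + c][d] for c in range(width)] for r in range(height)]
        for d in range(dim)
    ]


def concat_fuse(cnn_map: FeatureMap, vit_map: FeatureMap) -> FeatureMap:
    """Fuse the two branch outputs by concatenating along the channel axis."""
    if shape(cnn_map)[1:] != shape(vit_map)[1:]:
        raise ValueError("branches must share the same spatial size")
    return cnn_map + vit_map  # list concatenation = channel-wise concat


# Toy example: the CNN branch yields 2 channels on a 2x2 grid; the
# Transformer branch yields 4 tokens (one per grid cell) of dimension 3.
cnn_out = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
vit_tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
fused = concat_fuse(cnn_out, tokens_to_map(vit_tokens, 2, 2))
print(shape(fused))  # (5, 2, 2)
```

Concatenation keeps both feature sets intact and leaves it to later layers to mix them. This is why the sketch requires the two branches to agree on spatial size but not on channel count.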

       

      Abstract: Apple diseases have posed a serious risk to orchard income in recent years. Accurate and rapid identification of apple diseases greatly benefits their prevention and control. Most identification models have been trained on laboratory data, mainly because of the limited conditions for deliberately infecting apples in real orchards. However, most of these models cannot fully meet the requirements of disease detection in large-scale production. In this study, a deep learning model, CaTNet, was proposed to extract both global and local information from images of apple leaf diseases. Disease images were collected from apple orchards in Jilin Province, China. A total of 16,464 images, covering both laboratory and natural-environment conditions, were obtained from several publicly available datasets and from field collection. First, a model structure combining a Transformer and a convolutional neural network (CNN) was constructed, in which the two branches extract global and local information, respectively, from the original images. Learning a wider variety of features improved the generalization ability of the model, while the global features improved its resistance to interference. Second, the Transformer block in the Transformer branch was simplified. A channel compression and expansion module was also designed in this branch to reduce the training cost of CaTNet by lowering the channel dimension of the input features. In addition, multiple multilayer perceptrons (MLPs) were replaced with grouped convolutional layers to further increase the computational speed of the model. Third, a lightweight CNN branch was built with an inverted residual structure. This structure combines a pointwise convolution that expands the channels with a 3×3 convolution that extracts information.
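Counting parameters makes these efficiency changes more concrete. The changes in question are the channel compression and expansion module, the grouped convolutions that replace the MLPs, and the inverted residual CNN branch. The sketch below is hypothetical and rests on several assumptions, none of which are given in the abstract:

- a channel width of 64
- an MLP expansion ratio of 4
- 4 groups for the grouped 1×1 convolutions
- compression of the attention input from 64 to 32 channels
- a MobileNetV2-style inverted residual block whose 3×3 convolution is depthwise

The sketch only shows how each change scales the parameter count. All function names are illustrative.

```python
# Hypothetical parameter-count sketch; channel widths, ratios and group
# counts are illustrative assumptions, not CaTNet's published settings.


def conv_params(c_in: int, c_out: int, k: int = 1, groups: int = 1, bias: bool = True) -> int:
    """Parameters of a k x k convolution split into `groups` groups.

    With k=1 and groups=1 this equals a fully connected (linear) layer.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k + (c_out if bias else 0)


def mlp_params(c: int, ratio: int, groups: int = 1) -> int:
    """Transformer feed-forward block c -> ratio*c -> c built from 1x1 layers."""
    hidden = ratio * c
    return conv_params(c, hidden, groups=groups) + conv_params(hidden, c, groups=groups)


def attention_proj_params(c: int) -> int:
    """Q, K, V and output projections of one self-attention layer of width c."""
    return 4 * conv_params(c, c)


def compress_expand_params(c: int, c_small: int) -> int:
    """1x1 layers that squeeze c channels to c_small and expand them back."""
    return conv_params(c, c_small) + conv_params(c_small, c)


def inverted_residual_params(c: int, expand: int) -> int:
    """Assumed MobileNetV2-style block: 1x1 expand, 3x3 depthwise, 1x1 project (no bias)."""
    hidden = expand * c
    return (conv_params(c, hidden, bias=False)
            + conv_params(hidden, hidden, k=3, groups=hidden, bias=False)
            + conv_params(hidden, c, bias=False))


dense_mlp = mlp_params(64, ratio=4)
grouped_mlp = mlp_params(64, ratio=4, groups=4)
full_attn = attention_proj_params(64)
squeezed_attn = attention_proj_params(32) + compress_expand_params(64, 32)

print("MLP:", dense_mlp, "-> grouped:", grouped_mlp)
print("Attention projections:", full_attn, "-> with 64->32 squeeze:", squeezed_attn)
print("Inverted residual block (64 ch, expand 4):", inverted_residual_params(64, 4))
```

Under these assumed settings, grouping the MLP's 1×1 layers into 4 groups cuts its parameters by roughly a factor of 4 (33,088 vs. 8,512). Halving the attention width shrinks the projections to about a quarter, although the compress and expand layers add some parameters back (8,416 in total vs. 16,640).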
The CNN branch extracts local image features, which makes the model more sensitive to fine-grained features. Finally, a concatenation (concat) operation fuses the feature outputs of the two branches. After fusion, the CNN branch extracts local features from the global features, whereas the Transformer branch extracts global features from the local ones. Cycling through these multiple feature types further improved the generalization of the model. The effect of different down-sampling methods on the two-branch network was also compared. Pooling, a 3×3 convolution kernel, and a 1×1 convolution kernel gave accuracies of 79.35%, 74.06%, and 67.95%, respectively. The two-branch CaTNet model ran at 0.1082 s/frame, faster than deep learning models such as EfficientNetV2-s (0.3832 s/frame) and PVT-t (0.1778 s/frame). The two-branch structure therefore accommodates more computation while still achieving a higher computational speed. These findings provide a design approach for building deep learning models with high generalization capability, particularly models that reach high accuracy when trained only on easily accessible data.
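A little arithmetic can check the down-sampling comparison and the timing figures. The kernel sizes 3×3 and 1×1 and the per-frame times come from the abstract. The stride of 2, the 2×2 pooling window, the padding values, and the 56×56 input map are assumptions made for illustration. The times are converted to frames per second and to a ratio relative to CaTNet.

```python
# Sketch only: the layer settings and the 56x56 input size are assumptions,
# not values reported for CaTNet.


def out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one axis for a pooling or convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stride-2 settings for the three down-sampling options compared.
DOWNSAMPLERS = {
    "2x2 pooling": {"kernel": 2, "stride": 2, "padding": 0},
    "3x3 conv": {"kernel": 3, "stride": 2, "padding": 1},
    "1x1 conv": {"kernel": 1, "stride": 2, "padding": 0},
}

# Per-frame inference times reported in the abstract (seconds per frame).
SECONDS_PER_FRAME = {"CaTNet": 0.1082, "EfficientNetV2-s": 0.3832, "PVT-t": 0.1778}


def frames_per_second(model: str) -> float:
    """Convert seconds per frame into frames per second."""
    return 1.0 / SECONDS_PER_FRAME[model]


def time_ratio(model: str, reference: str = "CaTNet") -> float:
    """How many times longer `model` takes per frame than `reference`."""
    return SECONDS_PER_FRAME[model] / SECONDS_PER_FRAME[reference]


for name, cfg in DOWNSAMPLERS.items():
    side = out_size(56, **cfg)
    print(f"{name}: 56x56 -> {side}x{side}")

for model in SECONDS_PER_FRAME:
    print(f"{model}: {frames_per_second(model):.1f} frames/s, "
          f"{time_ratio(model):.2f}x CaTNet's time per frame")
```

Under these assumed settings, all three options produce the same 28×28 output. They therefore differ in how each output value is computed rather than in output resolution. For instance, a stride-2 1×1 convolution reads only one of every four input positions. The reported times correspond to about 9.2, 2.6, and 5.6 frames/s for CaTNet, EfficientNetV2-s, and PVT-t.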

       
