Identifying apple leaf diseases by integrating transformer and prototype self-supervised learning

李大湘; 张雯凯; 刘颖

doi:10.11975/j.issn.1002-6819.202405187

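Summary: To alleviate the problem of "large intra-class variation and small inter-class differences" in the identification of apple leaf diseases (ALD), this study designed a model that fuses a Transformer with prototype self-supervision (fusion transformer and prototype self-supervised, FTPSS) to further improve ALD recognition accuracy. First, ResNet50 was used as the backbone network to extract multi-level feature maps from ALD images. Then, a simplified self-attention (SSA) mechanism was constructed and combined with spatial attention guided deformable convolution (SAG-DC) to design a simplified self-attention and deformable convolution transformer (SSADC-TF) encoder, which interactively fuses the multi-level feature maps extracted by the backbone network and thereby strengthens the model's perception of irregular lesion regions in ALD images. Finally, a prototype self-supervised (PSS) learning module was constructed, in which two self-supervised loss functions, "orthogonality" and "clustering", constrain model training to alleviate the semantic gap in ALD image recognition. Comparative experiments on a standard image set and a real-world image set showed that, after layer-by-layer feature fusion by SSADC-TF and PSS learning, the FTPSS model reached recognition accuracies of 98.61% and 98.73%, respectively, which are 5.15 and 4.49 percentage points higher than the baseline model, and can meet the application requirements of ALD identification in smart agriculture.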
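Because the gains are stated in percentage points (absolute differences in accuracy, not relative improvements), the baseline accuracies they imply can be recovered by subtraction. The short sketch below does only that arithmetic on the figures reported above; the dictionary keys and the helper name `implied_baseline` are illustrative, and the results (about 93.46% and 94.24%) are derived values, not figures quoted from the study.

```python
# Reported FTPSS accuracy (%) and gain over the baseline (percentage points).
reported = {
    "standard image set": (98.61, 5.15),
    "real-world image set": (98.73, 4.49),
}


def implied_baseline(accuracy, gain_pp):
    """Baseline accuracy (%) implied by an absolute gain in percentage points."""
    return round(accuracy - gain_pp, 2)


# Derived values only: the subtraction assumes the gain is an absolute difference.
baselines = {
    name: implied_baseline(accuracy, gain_pp)
    for name, (accuracy, gain_pp) in reported.items()
}

for name, value in baselines.items():
    print(f"{name}: implied baseline accuracy {value:.2f}%")
```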

Abstract: The identification of apple leaf diseases (ALD) is characterized by "significant intra-class variation and subtle inter-class differences". In this study, a model that fuses a Transformer with prototype self-supervised learning (FTPSS) was presented to improve the precision of ALD recognition and thereby support disease management in orchards. ResNet50 was used as the backbone network of the FTPSS model to extract multi-level feature maps from ALD images, capturing the fine details needed for accurate disease identification. An encoder was then designed by integrating a simplified self-attention (SSA) mechanism with spatial attention guided deformable convolution (SAG-DC). This simplified self-attention and deformable convolution transformer (SSADC-TF) encoder was used for the interaction and fusion of the multi-level feature maps extracted by the backbone, which enhanced the sensitivity of the model to irregular lesion areas in ALD images and helped it to distinguish among different disease manifestations. A prototype self-supervised (PSS) learning module was introduced to constrain the training of the model with two self-supervised loss functions, "Orthogonality" and "Clustering". The "Orthogonality" loss encouraged the feature representations of different ALD classes to be orthogonal to each other, promoting a clear separation among classes. Meanwhile, the "Clustering" loss tightened intra-class compactness, so that variations within the same class did not compromise the robustness of the model. Extensive experiments were conducted on both standard and real-world image datasets to evaluate the effectiveness of the FTPSS model.
The FTPSS model achieved a recognition accuracy of 98.61% on the standard image set, an improvement of 5.15 percentage points over the baseline model. Similarly, it obtained an accuracy of 98.73% on the real-world image set, an improvement of 4.49 percentage points over the baseline. These results demonstrated the robust performance of the FTPSS model in identifying ALD, even in the presence of significant intra-class variation and subtle inter-class differences. The performance was attributed to the integration of the Transformer with prototype self-supervised learning on top of the feature extraction of ResNet50. The SSADC-TF encoder enhanced feature interaction and fusion and captured the complex details in ALD images, yielding an improvement of 2.40 percentage points. Furthermore, the PSS learning module mitigated the semantic gap and helped the model to generalize to new, unseen ALD cases, increasing the accuracy of ALD image recognition by 2.69 percentage points. In conclusion, the FTPSS model represents an advance in ALD recognition with the potential to improve disease management in orchards. Precise and timely information from automatic ALD detection can help to preserve the health and productivity of orchards. This finding can contribute to precision agriculture using deep learning techniques.
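The abstract names the two prototype losses but does not give their formulas. The sketch below shows one common way such losses can be written, purely as an illustration: it assumes one prototype vector per class, cosine similarity as the measure, an orthogonality term equal to the mean squared similarity between distinct prototypes, and a clustering term equal to the mean of one minus the similarity between each feature and its own class prototype. The function names and these exact forms are assumptions, not the definitions used in the FTPSS model.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def orthogonality_loss(prototypes):
    """Mean squared cosine similarity between distinct class prototypes.

    Assumed form: zero when all prototypes are mutually orthogonal.
    Expects at least two prototypes, shape (num_classes, dim).
    """
    p = l2_normalize(prototypes)
    gram = p @ p.T                      # pairwise cosine similarities
    k = gram.shape[0]
    off_diag = gram[~np.eye(k, dtype=bool)]
    return float(np.mean(off_diag ** 2))


def clustering_loss(features, labels, prototypes):
    """Mean (1 - cosine similarity) between each feature and its class prototype.

    Assumed form: zero when every feature points in the same direction
    as the prototype of its own class.
    """
    f = l2_normalize(features)
    p = l2_normalize(prototypes)
    cos = np.sum(f * p[labels], axis=1)  # similarity to the matching prototype
    return float(np.mean(1.0 - cos))


# Toy usage: three orthogonal prototypes in 4-D and two perfectly aligned features.
prototypes = np.eye(3, 4)
features = np.array([[2.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]])
labels = np.array([0, 1])
total = orthogonality_loss(prototypes) + clustering_loss(features, labels, prototypes)
print(f"total prototype loss: {total:.6f}")
```

In a training setup, terms of this kind would typically be added to the classification loss with weighting coefficients; the abstract does not state how the two losses are weighted or combined.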
