Abstract:
Apple leaf disease (ALD) identification is characterized by significant intra-class variation and subtle inter-class differences. This study presents a model that fuses a transformer with prototype self-supervised learning (FTPSS), aiming to raise the accuracy of ALD recognition and thereby support disease management in orchards. ResNet50 serves as the backbone network of the FTPSS model and extracts multi-level feature maps from ALD images, capturing the fine details needed for accurate disease identification. The encoder combines a simplified self-attention (SSA) mechanism with spatial attention guided deformable convolution (SAG-DC). The resulting simplified self-attention and deformable convolution transformer (SSADC-TF) enables interaction and fusion among the multi-level feature maps, increases the sensitivity of the model to irregular lesion areas in ALD images, and helps distinguish different disease manifestations. A prototype self-supervised (PSS) learning module was introduced to further improve the performance of the model. The module uses two self-supervised loss functions, "Orthogonality" and "Clustering". The "Orthogonality" loss encourages the feature representations of different ALD classes to be orthogonal to each other, promoting clear separation among classes and improving the discriminative ability of the model. The "Clustering" loss tightens intra-class compactness, so that variation within a class does not undermine the robustness of the model. Extensive experiments on both standard and real-world image datasets demonstrated the effectiveness of the FTPSS model.
The FTPSS model achieved a recognition accuracy of 98.61% on the standard image set, an improvement of 5.15 percentage points over the baseline model. On the real-world image set, it achieved an accuracy of 98.73%, an improvement of 4.49 percentage points over the baseline. These results indicate that the FTPSS model identifies ALD robustly, even in the presence of significant intra-class variation and subtle inter-class differences. The performance of the FTPSS model is attributed to the integration of the transformer with prototype self-supervised learning. Building on the feature extraction of ResNet50, SSADC-TF enhanced feature interaction and fusion and captured the complex details in ALD images, yielding an improvement of 2.40 percentage points. Furthermore, the PSS learning module mitigated the semantic gap and helped the model generalize to new, unseen ALD cases, increasing the accuracy of ALD image recognition by 2.69 percentage points. In conclusion, the FTPSS model represents a significant advance in ALD recognition, with the potential to improve disease management strategies in orchards. The precise and timely information it provides can be applied to automated ALD detection, helping to preserve the health and productivity of orchards. These findings contribute to the field of precision agriculture using advanced deep learning techniques.
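The abstract does not give formulas for the two PSS losses, so the following is only a minimal, framework-free sketch of how an "Orthogonality" loss and a "Clustering" loss over class prototypes might look. The function names, the use of cosine similarity, and the squared-similarity penalty are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonality_loss(prototypes):
    """Assumed form: mean squared cosine similarity over all pairs of
    distinct class prototypes. It is 0 when every pair is orthogonal."""
    protos = [l2_normalize(p) for p in prototypes]
    total, pairs = 0.0, 0
    for i in range(len(protos)):
        for j in range(i + 1, len(protos)):
            total += dot(protos[i], protos[j]) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


def clustering_loss(features, labels, prototypes):
    """Assumed form: mean (1 - cosine similarity) between each feature
    and the prototype of its own class. It is 0 when every feature
    points in the same direction as its class prototype."""
    if not features:
        return 0.0
    protos = [l2_normalize(p) for p in prototypes]
    total = 0.0
    for feature, label in zip(features, labels):
        total += 1.0 - dot(l2_normalize(feature), protos[label])
    return total / len(features)
```

In this sketch, minimizing `orthogonality_loss` pushes prototypes of different classes apart, while minimizing `clustering_loss` pulls each feature toward its own class prototype. How the two terms are weighted against the classification loss is not stated in the abstract.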