Abstract:
Crop diseases seriously affect grain yield and quality. Identifying crop diseases promptly and accurately can effectively prevent their spread and reduce losses. Existing crop disease identification methods face two main challenges: 1) noise introduced during image acquisition easily reduces a model's identification accuracy, and 2) the complex backgrounds of disease images captured in natural environments, together with the subtle distinctions between disease areas and those backgrounds, seriously degrade a model's accuracy and generalizability. To address these issues, this paper proposes AFSF-DCT, a crop disease identification method for complex backgrounds that combines adaptive BayesShrink denoising with frequency-spatial domain feature fusion and comprises two parts: disease image denoising and disease identification.

Traditional denoising algorithms tend to lose most image details while removing noise, making them poorly suited to denoising images with complex backgrounds. To solve this problem, we designed an adaptive BayesShrink denoising algorithm (Ad-BayesShrink) based on adaptive global and local thresholds. Ad-BayesShrink minimizes noise interference while retaining more detailed information, reducing the difficulty of extracting disease features. It uses the Daubechies-8 discrete wavelet transform (Db8 DWT) to capture the texture and color details of different disease regions in the frequency domain: large-scale low-frequency sub-bands are denoised with a global threshold, while high-frequency sub-bands are processed with local thresholds to retain image details and edge information. The denoised low-frequency and high-frequency components are then reconstructed by the inverse discrete wavelet transform.

For identification, AFSF-DCT includes a crop disease identification model (FSF-DCT) that uses frequency-spatial feature fusion and dynamic cross-self-attention to identify crop leaf diseases against complex backgrounds. FSF-DCT takes MobileNetV3 as its backbone and comprises frequency-spatial feature mapping and fusion stages. A frequency-spatial domain feature mapping branch (DWT-Bneck), based on the DWT and the inverted residual structure (Bneck), captures multi-scale features from both the frequency and spatial domains. In the frequency-domain branch, a frequency decomposition module (DWFD) uses the 2D DWT and Bottleneck modules to capture the details and texture features of disease images, compensating for the limited ability of spatial-domain information to express global features. The spatial-domain branch, built on the Bneck structure and CBAM (Bneck-CBAM), strengthens FSF-DCT's feature representation in both the channel and spatial dimensions, enabling it to capture long-range dependencies along spatial directions and more precise positional information of disease features, and thus realizing comprehensive spatial feature mapping. To further improve FSF-DCT's nonlinearity and its ability to learn disease features, the Dynamic Shift Max activation function replaces ReLU6 in Bneck-CBAM.
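To make the two-branch design concrete, the following PyTorch sketch pairs a one-level 2D DWT frequency branch with a strided convolutional spatial branch and fuses them with a 1x1 convolution. It is a minimal illustration of the frequency-spatial idea under stated assumptions, not the paper's implementation: a Haar wavelet stands in for Db8 to keep the sketch dependency-free, and the class name, channel widths, and fusion rule are hypothetical.

```python
import torch
import torch.nn as nn

def dwt2d(x):
    """One-level 2D DWT via 2x2 Haar filters (the paper uses Db8; Haar keeps
    this sketch dependency-free). Expects even spatial dimensions."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2                       # low-frequency approximation
    highs = torch.cat([(a - b + c - d) / 2,
                       (a + b - c - d) / 2,
                       (a - b - c + d) / 2], 1)    # three high-frequency detail orientations
    return ll, highs

class FreqSpatialBlock(nn.Module):
    """Hypothetical two-branch block: a DWT frequency branch alongside a
    strided convolutional spatial branch, fused by 1x1 convolution."""
    def __init__(self, ch):
        super().__init__()
        self.freq = nn.Conv2d(4 * ch, ch, 1)       # mixes LL + 3 detail bands
        self.spat = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1),
                                  nn.BatchNorm2d(ch), nn.Hardswish())
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        ll, highs = dwt2d(x)                            # frequency-domain features
        f = self.freq(torch.cat([ll, highs], 1))
        s = self.spat(x)                                # spatial-domain features
        return self.fuse(torch.cat([f, s], 1))          # aligned to one scale, then fused

# A 16-channel 64x64 input yields a fused 16-channel 32x32 feature map:
# FreqSpatialBlock(16)(torch.randn(1, 16, 64, 64)).shape  # (1, 16, 32, 32)
```

Both branches reduce the spatial resolution by a factor of two, so their outputs align to a common scale before fusion, mirroring the role of the alignment convolutions described next.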
Finally, a dynamic cross-self-attention feature fusion network (MDCS-DF) fuses the multi-scale frequency-spatial domain features and sharpens FSF-DCT's focus on disease features. MDCS-DF uses a series of horizontal and vertical convolutions to align the frequency-spatial features to a uniform scale, and its dynamic cross-self-attention assigns varying weights to disease features according to their attributes, efficiently fusing multi-scale frequency-spatial features while strengthening FSF-DCT's attention to disease regions and reducing the influence of complex backgrounds on identification.

Denoising experiments showed that Ad-BayesShrink outperformed VisuShrink and SUREShrink, with a higher PSNR (32.26 dB), a higher SSIM (0.98), and a lower MSE (125.80). These results demonstrate that Ad-BayesShrink effectively removes image noise while preserving as much detail as possible, such as texture and edges. Ad-BayesShrink still achieved a PSNR of 31.39 dB under low-light conditions, indicating that it can cope with the effect of low light on disease images in practical applications. Experiments on a self-built dataset showed that FSF-DCT achieved an identification accuracy of 99.20% and a precision of 98.89%, outperforming most classical and state-of-the-art models. These results highlight FSF-DCT's ability to accurately identify crop leaf diseases against the complex backgrounds of natural environments. Moreover, FSF-DCT's inference time was only 6 ms, making it suitable for deployment on resource-constrained devices. Generalizability experiments further showed that FSF-DCT achieved the highest identification accuracies on the PlantVillage (99.90%) and AI Challenger 2018 (90.75%) datasets compared with three mainstream models (MobileNetV3, Swin Transformer, and Vision Transformer). Its precision (99.88% and 90.77%), recall (99.85% and 90.79%), and F1-score (99.81% and 90.88%) were also the best, showing a clear advantage over similarly sized models. FSF-DCT improved accuracy by more than 3.28 percentage points over the original MobileNetV3 and, compared with Swin Transformer and Vision Transformer, improved identification accuracy by at least 4 percentage points with fewer FLOPs and parameters. Results on the open-source datasets confirm FSF-DCT's strong generalizability in identifying multiple crop diseases with complex feature distributions. AFSF-DCT can therefore be expected to reduce image noise and to identify crop leaf diseases in complex backgrounds quickly and accurately.
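As a rough illustration of the wavelet-thresholding scheme that Ad-BayesShrink builds on, and of the PSNR/SSIM metrics reported above, the sketch below applies standard BayesShrink soft thresholding to the high-frequency Db8 sub-bands and scores the result. Ad-BayesShrink's adaptive global and local threshold rules are not reproduced here; the function names, the fixed three-level decomposition, and leaving the low-frequency band untouched are simplifying assumptions.

```python
import numpy as np
import pywt
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def bayes_threshold(subband, sigma_noise):
    """Classic BayesShrink threshold sigma_n^2 / sigma_x for one sub-band."""
    sigma_y2 = np.mean(subband ** 2)                             # variance of noisy coefficients
    sigma_x = np.sqrt(max(sigma_y2 - sigma_noise ** 2, 1e-12))   # signal std estimate
    return sigma_noise ** 2 / sigma_x

def denoise_db8(img, levels=3):
    """Soft-threshold the high-frequency sub-bands of a three-level Db8 DWT."""
    coeffs = pywt.wavedec2(img, "db8", level=levels)
    # Robust noise estimate from the finest diagonal sub-band (median rule).
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    out = [coeffs[0]]   # low-frequency band kept as-is here (Ad-BayesShrink would
                        # instead apply its adaptive global threshold to this band)
    for details in coeffs[1:]:
        out.append(tuple(pywt.threshold(d, bayes_threshold(d, sigma), mode="soft")
                         for d in details))
    # Inverse DWT reconstruction, cropped back to the input size.
    return pywt.waverec2(out, "db8")[: img.shape[0], : img.shape[1]]

# Example scoring against a clean reference, for float images in [0, 255]:
# denoised = denoise_db8(noisy)
# psnr = peak_signal_noise_ratio(clean, denoised, data_range=255)
# ssim = structural_similarity(clean, denoised, data_range=255)
```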