Abstract
Deep learning has been widely applied to the recognition and segmentation of plant leaf disease. However, traditional plant leaf disease recognition models usually lack transparency, since these end-to-end deep classifiers are susceptible to shadow, occlusion, and varying light intensity. To address this drawback, in this paper we propose a new deconvolution-guided recognition and segmentation model of plant leaf disease, called Deconvolution-Guided VGGNet (DGVGGNet). An encoder-decoder architecture with symmetric convolutional-deconvolutional layers is adopted in DGVGGNet so that plant leaf disease recognition and segmentation can be carried out simultaneously. Our model consists of three main phases: recognition, inversion, and deconvolution. In the recognition phase, we first fed VGGNet with a large number of plant leaf disease images, and the Categorical Cross-Entropy Loss was used to train the recognition model. VGGNet was made up of the convolutional layers of VGG16 and 2 fully connected layers, and the weights of the convolutional layers were pre-trained on ImageNet. Ten kinds of tomato leaf disease images from the PlantVillage dataset were used in this paper; specifically, 30% of the images were used for training and the remaining 70% for testing. Besides, on-the-fly data augmentation was exploited during the training stage, such as flipping the images and corrupting the original images with brightness, saturation, salt-noise, and Gaussian-noise perturbations. In the inversion phase, two fully connected layers were used to invert the category prediction vector. Simultaneously, two skip connections reinforced the decoder by adding the vectors of VGGNet's fully connected layers. The resulting feature vector was then reshaped into feature maps and fed into the deconvolution module.
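The recognition phase is trained with the Categorical Cross-Entropy Loss over the class predictions. As a minimal sketch of that loss (plain NumPy; the function name and shapes are our illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    probs:  (N, C) softmax outputs of the classifier.
    labels: (N,) integer class ids (e.g., one of the 10 tomato disease classes).
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    # Pick the predicted probability of the true class for each sample.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs)))
```

In a framework such as PyTorch or Keras this corresponds to the built-in categorical cross-entropy criterion; the sketch only makes the per-sample computation explicit.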
In the deconvolution phase, the feature maps were fed into the deconvolution module to obtain the segmentation result of the diseased area, where each pixel was trained with the Binary Cross-Entropy Loss. The deconvolution module consists of upsampling and convolution operations. Meanwhile, five skip connections were used to fuse multi-scale features, which refines the segmentation results. Besides, only a few samples of the training set were given pixel-level plant disease labels to supervise the output of the deconvolution module. At the end of the deconvolution module, a reconstruction layer was used to smooth the segmentation edges. To explore the influence of the number of pixel-level labels used in the model, 9 and 45 pixel-level labels were used to supervise the segmentation results, respectively. To simulate natural conditions, different kinds of interference were added to the test data, such as translation, fruit occlusion, soil occlusion, leaf occlusion, and brightness reduction at different percentages. Experimentally, we evaluated our recognition module by comparing the performance of VGGNet, DGVGGNet-9, and DGVGGNet-45 on the different interference datasets. We also evaluated our deconvolution module with 4 evaluation metrics, i.e., PA, MPA, MIoU, and FWIoU, against 3 popular semantic segmentation models, i.e., FCN-8s, U-Net, and SegNet. The results show that DGVGGNet-45 achieves the highest recognition accuracy as well as the highest PA, MIoU, and FWIoU among the four segmentation evaluation metrics, which are 94.66%, 75.36%, and 90.46%, respectively. Compared with VGGNet, the deconvolution module of DGVGGNet-45 guides the recognition module to pay more attention to the actual diseased area, which is effective in improving the segmentation accuracy. The recognition results demonstrate that DGVGGNet has strong robustness under tough conditions.
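The four segmentation metrics (PA, MPA, MIoU, FWIoU) all derive from a pixel-level confusion matrix. A minimal NumPy sketch of those standard definitions (the function name and the small binary example are ours; the paper's evaluation code is not shown):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes=2):
    """PA, MPA, MIoU, and FWIoU from a pixel-level confusion matrix.

    pred, gt: integer arrays of per-pixel class ids, same shape
              (e.g., 0 = healthy background, 1 = diseased area).
    """
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for g, p in zip(gt.ravel(), pred.ravel()):
        cm[g, p] += 1                       # rows: ground truth, cols: prediction

    diag = np.diag(cm).astype(float)        # correctly classified pixels per class
    gt_totals = cm.sum(axis=1).astype(float)
    pred_totals = cm.sum(axis=0).astype(float)
    union = gt_totals + pred_totals - diag

    pa = diag.sum() / cm.sum()                         # pixel accuracy
    mpa = float(np.mean(diag / np.maximum(gt_totals, 1)))  # mean per-class accuracy
    iou = diag / np.maximum(union, 1)
    miou = float(np.mean(iou))                         # mean intersection-over-union
    fwiou = float(((gt_totals / cm.sum()) * iou).sum())    # frequency-weighted IoU
    return float(pa), mpa, miou, fwiou
```

For a perfect prediction all four metrics equal 1.0; FWIoU weights each class's IoU by how often that class appears in the ground truth, so it is dominated by the majority class.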
Furthermore, DGVGGNet takes only 12 ms to identify a single image on the GPU, which meets real-time requirements.