Xie Zhouyi, Feng Yazhi, Hu Yanrong, Liu Hongjiu. Generating image description of rice pests and diseases using a ResNet18 feature encoder[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(12): 197-206. DOI: 10.11975/j.issn.1002-6819.2022.12.023

    Generating image description of rice pests and diseases using a ResNet18 feature encoder

    • Abstract: Pests and diseases pose a serious threat to agricultural production and crop yields. Image description of agricultural pests and diseases can greatly contribute to the intelligent monitoring and diagnosis of crop health. However, current object detection models can identify the class and location of a pest or disease but cannot generate descriptions related to the content of the image. Their large numbers of parameters also pose a great challenge for edge computing platforms in practical working scenarios. In this study, an image description model with an encoder-decoder structure was designed to bridge the gap between visual features and text semantics. Firstly, images of 10 common rice pests and diseases were collected by a web crawler, which acquired the training image data in a short time. Secondly, the original data were augmented with luminance adjustment, horizontal and vertical flipping, and Gaussian noise to produce more samples. After data augmentation, each image was manually annotated with five English sentences describing the characteristics of the pests and diseases, and the annotations were stored in JSON format. The data were then divided into training, validation, and test sets of 1 793, 222, and 223 images, respectively. Finally, an encoder-decoder structure was built around the local perception and parameter sharing of Convolutional Neural Networks (CNN). The shallow layers of ResNet18 were used as the encoder to automatically extract image features, while a Long Short-Term Memory network (LSTM) incorporating an attention mechanism was used as the decoder to generate the image descriptions. Compared with the CNN, the LSTM performs better on sequence tasks, handling the long-term dependency problem of Recurrent Neural Networks (RNN) and alleviating gradient vanishing and explosion during long-sequence training. Three models were then trained on the datasets: the traditional CNN-LSTM model, the attention-based Att_CNN-LSTM model, and the AdtAtt_CNN-LSTM model with visual sentinels. The experimental results showed that the Att_CNN-LSTM model with the ResNet18 feature encoder performed best under the same metrics: the Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), and Metric for Evaluation of Translation with Explicit ORdering (METEOR) scores reached 0.752, 0.657, and 0.404, respectively. The model size was compressed to nearly one third of the original without loss of feature extraction capability. The model essentially converged after 6 000 iterations, significantly faster than the model without the attention mechanism, with a Top-5 accuracy of 98.48% and a loss of 0.813. All evaluation metrics of the improved models were significantly better. The most outstanding gain was in the Consensus-based Image Description Evaluation (CIDEr) metric, which reached 1.623 on the rice pest and disease dataset, nearly 3 times the score of the CNN-LSTM model. Overall, the descriptions generated by the Att_CNN-LSTM model covered the image information in detail, whereas the CNN-LSTM model without attention could only describe the color features of the diseases. The improved model more accurately identified the disease category and supplementary features, such as the location of the diseases. The 223 test images, together with 268 images from actual scenes, were verified, and part of the detection results were visualized. Consequently, the improved model can discriminate between and describe related diseases, and it achieves high accuracy when trained on small-scale datasets. The findings can also provide a strong reference for the automatic description of similar crop pests and diseases.
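
    The abstract names three augmentation operations but not their parameters. The following is a minimal sketch, assuming PyTorch/torchvision; the brightness range, noise standard deviation, flip probabilities, and 224x224 input size are illustrative assumptions rather than values from the paper.

        import torch
        from torchvision import transforms

        def add_gaussian_noise(img, std=0.05):
            # img is a float tensor in [0, 1]; clamping keeps pixel values valid
            return torch.clamp(img + torch.randn_like(img) * std, 0.0, 1.0)

        # Luminance adjustment, horizontal/vertical flipping, and Gaussian
        # noise, matching the three operations listed in the abstract
        augment = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ColorJitter(brightness=0.3),   # luminance adjustment
            transforms.RandomHorizontalFlip(p=0.5),   # horizontal flip
            transforms.RandomVerticalFlip(p=0.5),     # vertical flip
            transforms.ToTensor(),
            transforms.Lambda(add_gaussian_noise),    # additive Gaussian noise
        ])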
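
    To make the encoder-decoder design concrete, here is a minimal PyTorch sketch of a shallow ResNet18 encoder feeding an attention-equipped LSTM decoder. The cut point (after layer2), the additive attention form, and all layer dimensions are assumptions for illustration; the abstract does not specify them.

        import torch
        import torch.nn as nn
        import torchvision.models as models

        class ShallowResNet18Encoder(nn.Module):
            """Truncated ResNet18: conv1 through layer2 (128-channel feature map)."""
            def __init__(self):
                super().__init__()
                resnet = models.resnet18(weights=None)
                self.features = nn.Sequential(*list(resnet.children())[:6])

            def forward(self, images):                        # (B, 3, H, W)
                fmap = self.features(images)                  # (B, 128, H/8, W/8)
                b, c = fmap.shape[:2]
                return fmap.view(b, c, -1).permute(0, 2, 1)   # (B, L, 128) regions

        class AttentionLSTMDecoder(nn.Module):
            """One decoding step: attend over image regions, then update the LSTM."""
            def __init__(self, vocab_size, feat_dim=128, embed_dim=256, hidden_dim=512):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed_dim)
                self.att_feat = nn.Linear(feat_dim, hidden_dim)
                self.att_hid = nn.Linear(hidden_dim, hidden_dim)
                self.att_score = nn.Linear(hidden_dim, 1)
                self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
                self.out = nn.Linear(hidden_dim, vocab_size)

            def forward(self, regions, tokens, state):
                h, c = state
                # Additive attention: score each region against the hidden state
                scores = self.att_score(torch.tanh(
                    self.att_feat(regions) + self.att_hid(h).unsqueeze(1)))
                alpha = torch.softmax(scores, dim=1)          # (B, L, 1) weights
                context = (alpha * regions).sum(dim=1)        # attended image feature
                h, c = self.lstm(torch.cat([self.embed(tokens), context], 1), (h, c))
                return self.out(h), (h, c), alpha             # next-word logits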
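
    The reported BLEU, ROUGE-L, METEOR, and CIDEr values score each generated sentence against the five reference captions per image. As an illustration of the scoring step only, a sentence-level BLEU computation with NLTK is sketched below; the sentences are invented placeholders, not captions from the dataset.

        from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

        # Human reference captions for one image, tokenized into words
        references = [
            "brown spindle shaped lesions appear on the rice leaf".split(),
            "the rice leaf shows brown spots with gray centers".split(),
        ]
        candidate = "brown lesions appear on the rice leaf".split()

        # Smoothing avoids zero scores when a higher-order n-gram never matches
        smooth = SmoothingFunction().method1
        print(sentence_bleu(references, candidate, smoothing_function=smooth))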