Abstract:
Pest recognition is a key foundation of pest management. Previous research exploited image classification to achieve automatic pest recognition, but because it is difficult to obtain sufficient images of newly emerged pest classes, developing a pest classifier from only a few labeled images is an interesting and challenging problem. Some existing works employed the matching network framework to solve this problem, using meta-learning to avoid retraining deep networks. In these works, however, the feature extraction ability of the backbone networks was limited, and the meta-learning algorithms either did not provide a good weight initialization strategy or risked network collapse. To close this gap, this study proposes a few-shot pest classifier built on a spatial-attention-enhanced ResNeSt-101 and a transfer-based meta-learning algorithm. First, ResNeSt-101 was enhanced with a spatial attention block to better extract image features. Candidate locations for the block were before the max pooling layer in the first stage of ResNeSt-101 and/or at the end of stages 2-4; numerical simulations identified the first stage as the optimal location. Subsequently, the network weights were initialized by transfer learning and then optimized by meta-learning. To avoid network collapse, the normalized temperature-scaled cross-entropy (NT-Xent) loss was chosen for the meta-learning algorithm instead of the triplet loss. Finally, pest classification was performed by computing similarities between the deep features of query and support images. The proposed method was evaluated on two elaborately constructed pest image datasets, AD0 and MIP50, using N-way K-shot accuracy and the per-image processing time (TPIP). These datasets were constructed as follows: images in the public pest datasets IP102 and D0 were first cleaned by eliminating images with class ambiguities caused by categorization based on English pest names, and images of the egg, larva, and pupa stages were removed while those of adults were retained. Considering the limited human resources and time costs, 50 classes were then selected from the cleaned IP102 dataset to construct the MIP50 pest image dataset. Finally, pest images were retrieved from the Internet by their Latin names, yielding the AD0 pest image dataset. The resulting MIP50 contains 16,424 adult pest images from 50 categories of IP102, and AD0 contains 17,112 adult pest images from all 40 categories of D0. Extensive experiments showed that when the test set contained only a few unseen pest categories, the proposed method achieved a 5-way 10-shot accuracy of 96.37% on AD0 and 76.91% on MIP50. When the test set contained both unseen and seen pest classes, the 5-way 10-shot accuracy was 93.73% on AD0 and 90.60% on MIP50. The TPIP of the proposed method was approximately 0.44 ms, which satisfies the real-time pest recognition requirement in most scenarios. In addition, a series of comparative and ablation experiments confirmed the effectiveness of the proposed method for few-shot pest classification. These results indicate that few-shot pest classification using a spatial-attention-enhanced ResNeSt-101 network and transfer-based meta-learning is effective and thus promising for practical applications. Although the proposed scheme is promising, several issues remain to be investigated in future work.
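The similarity-based classification step described above can be illustrated with a minimal sketch. Assuming (for illustration only; the paper's exact metric may differ) that each of the N classes is summarized by the mean of its K support features and that queries are assigned by cosine similarity to these class prototypes:

```python
import numpy as np

def classify_queries(support, support_labels, queries):
    """Nearest-prototype few-shot classification by cosine similarity.

    support:        (N*K, d) deep features of the support images
    support_labels: (N*K,)   integer class labels (the N "ways")
    queries:        (Q, d)   deep features of the query images
    Returns the predicted class label for each query.
    """
    # Build one prototype per class: the mean of its K support features.
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in classes])
    # Cosine similarity between every query and every prototype.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = q @ p.T                     # shape (Q, N)
    # Assign each query to its most similar prototype.
    return classes[np.argmax(sims, axis=1)]
```

Note that only the small support set and the frozen feature extractor are needed at test time, which is what keeps the per-image cost low in an N-way K-shot setting.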
For example, increasing the number of ways would probably lower the classification accuracy; the metric used in this work could be replaced with one that better characterizes the complex relationships between samples in the support set and those in the query set; and the proposed scheme should be applied to practical pest recognition in the field.
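The normalized temperature-scaled cross-entropy (NT-Xent) loss adopted above in place of the triplet loss can be sketched as follows. This is a generic formulation over paired embeddings, not the paper's exact training code; the batch layout (views i and i+N forming a positive pair) is an assumption for illustration:

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """Normalized temperature-scaled cross-entropy (NT-Xent) loss.

    z: (2N, d) embeddings, where rows i and i+N are the two views of
    the same sample (a positive pair); all other rows are negatives.
    """
    # L2-normalize so the dot product is the cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # Exclude self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    n = z.shape[0] // 2
    # Index of each row's positive partner.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy over each similarity row with the positive as target.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Unlike the triplet loss, every other sample in the batch acts as a negative simultaneously, which avoids the degenerate (collapsed) solutions that hard-mined triplets can produce.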