Abstract:
Pest recognition is a key foundation of pest management. Previous research exploited image classification to achieve automatic pest recognition, but because it is difficult to obtain sufficient images of newly emerged pest classes, developing a pest classifier from only a few labeled images is an interesting and challenging problem. Some existing works employed the matching network framework to solve this problem, using meta-learning to avoid retraining deep networks. In these works, however, the feature extraction ability of the backbone networks was limited, and the meta-learning algorithms either did not provide a good weight initialization strategy or risked network collapse. To close this gap, this study proposes a few-shot pest classifier built on a spatial-attention-enhanced ResNeSt-101 and a transfer-based meta-learning algorithm. First, ResNeSt-101 was enhanced with a spatial attention block to better extract image features. Candidate locations for the block were before the max pooling layer in the first stage of ResNeSt-101 and/or at the end of stages 2-4; numerical simulations identified the first stage as the optimal location. Subsequently, the network weights were initialized by transfer learning and then optimized by meta-learning. To avoid network collapse, the normalized temperature-scaled cross-entropy (NT-Xent) loss was chosen for the meta-learning algorithm instead of the triplet loss. Finally, pest classification was performed by computing similarities between the deep features of query and support images. The proposed method was evaluated on two elaborately constructed pest image datasets, AD0 and MIP50, using N-way K-shot accuracy and the per-image processing time (TPIP). These datasets were constructed as follows: images in the public pest datasets IP102 and D0 were first cleaned by eliminating images with class ambiguities caused by categorization based on English pest names, and images of the egg, larva, and pupa stages were removed while those of adults were retained. Considering the limited human resources and time costs, 50 classes were then selected from the cleaned IP102 dataset to construct the MIP50 pest image dataset. Finally, pest images were retrieved from the Internet by their Latin names, yielding the AD0 pest image dataset. The resulting MIP50 contains 16,424 adult pest images from 50 categories of IP102, and AD0 contains 17,112 adult pest images from all 40 categories of D0. Extensive experiments showed that when the test set contained only a few unseen pest categories, the proposed method achieved a 5-way 10-shot accuracy of 96.37% on AD0 and 76.91% on MIP50. When the test set contained both unseen and seen pest classes, the 5-way 10-shot accuracy was 93.73% on AD0 and 90.60% on MIP50. The TPIP of the proposed method was approximately 0.44 ms, which satisfies the real-time pest recognition requirement in most scenarios. In addition, a series of comparative and ablation experiments confirmed the effectiveness of the proposed method for few-shot pest classification. These results indicate that few-shot pest classification using a spatial-attention-enhanced ResNeSt-101 network and transfer-based meta-learning is effective and thus promising for practical applications. Although the proposed scheme is promising, several issues remain to be investigated in future work.
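The similarity-based classification step described above can be illustrated with a minimal sketch. Assuming (for illustration only; the paper's exact metric may differ) that each of the N classes is summarized by the mean of its K support features and that queries are assigned by cosine similarity to these class prototypes:

```python
import numpy as np

def classify_queries(support, support_labels, queries):
    """Nearest-prototype few-shot classification by cosine similarity.

    support:        (N*K, d) deep features of the support images
    support_labels: (N*K,)   integer class labels (the N "ways")
    queries:        (Q, d)   deep features of the query images
    Returns the predicted class label for each query.
    """
    # Build one prototype per class: the mean of its K support features.
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in classes])
    # Cosine similarity between every query and every prototype.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = q @ p.T                     # shape (Q, N)
    # Assign each query to its most similar prototype.
    return classes[np.argmax(sims, axis=1)]
```

Note that only the small support set and the frozen feature extractor are needed at test time, which is what keeps the per-image cost low in an N-way K-shot setting.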
For example, increasing the number of ways would probably lower the classification accuracy; the metric used in this work could be replaced with one that better characterizes the complex relationships between samples in the support set and those in the query set; and the proposed scheme should be applied to practical pest recognition in the field.
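The normalized temperature-scaled cross-entropy (NT-Xent) loss adopted above in place of the triplet loss can be sketched as follows. This is a generic formulation over paired embeddings, not the paper's exact training code; the batch layout (views i and i+N forming a positive pair) is an assumption for illustration:

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """Normalized temperature-scaled cross-entropy (NT-Xent) loss.

    z: (2N, d) embeddings, where rows i and i+N are the two views of
    the same sample (a positive pair); all other rows are negatives.
    """
    # L2-normalize so the dot product is the cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # Exclude self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    n = z.shape[0] // 2
    # Index of each row's positive partner.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy over each similarity row with the positive as target.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Unlike the triplet loss, every other sample in the batch acts as a negative simultaneously, which avoids the degenerate (collapsed) solutions that hard-mined triplets can produce.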