Pest detection model based on multi-scale dataset

Chen Dongmei; Lin Jia; Wang Hailiang; Wu Kaihua; Lu Yuheng; Zhou Xiaofei; Zhang Jingcheng

doi:10.11975/j.issn.1002-6819.202311113

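Summary: Transformer-based DETR-style models have gradually come to the fore in complex agricultural detection tasks with dense, multi-scale objects, but existing conventional quantitative analyses can hardly probe the effectiveness of the optimization strategies and mechanisms of detection models in depth. To meet the need for detecting and counting dense, small insects, this study built a multi-scale pest dataset collected in real scenes and containing three classes of small pests (planthoppers, aphids, and wheat mites), named MTC-PAWPD (multi-scale tiny crowded planthopper-aphid-wheat mite pest detection). It also proposed a data partitioning scheme based on the adhesion (overlap) rate and relative size, which divides the dataset into four scenes: dense-distribution, large-scale, medium-and-small-scale, and tiny-scale. By analyzing the performance of the DINO model in these specific scenes, a detailed ablation scheme was designed, supported by visualization of the features and of the queries of its key components, to verify how the performance gains in pest detection relate to each module, especially the query-related modules. Experiments showed that the model achieved an mAP@50 of 70.0% on the MTC-PAWPD test set, 2.3 percentage points higher than other mainstream models. In the four scenes, mAP@50 reached 42.5%, 79.4%, 75.7%, and 62.4%, respectively, showing strong detection performance on multi-scale and dense tasks. Ablation and visualization experiments on the model's own modules showed that the performance gains are more strongly correlated with the optimized modules in the Transformer part. The experiments indicate that the DINO-based pest detection model has strong generalization ability and practical value for complex, multi-scale pest detection against complex field backgrounds.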
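The summary names the two partitioning criteria (adhesion/overlap rate and relative size) but not their exact definitions or thresholds. A minimal sketch of how such a rule could assign an image to one of the four scenes is given below, assuming that the overlap rate is the fraction of boxes that intersect at least one other box, that relative size is the mean square root of box area over image area, and that density is checked before scale. These definitions, the priority order, and the thresholds `DENSE_OVERLAP_RATE`, `LARGE_REL_SIZE`, and `TINY_REL_SIZE` are hypothetical illustrations, not the values used in the study.

```python
from itertools import combinations
from math import sqrt

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Hypothetical thresholds, chosen for illustration only (NOT from the paper).
DENSE_OVERLAP_RATE = 0.3
LARGE_REL_SIZE = 0.10
TINY_REL_SIZE = 0.02


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if two boxes share a region of positive area (touching edges do not count)."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def overlap_rate(boxes: list[Box]) -> float:
    """Assumed definition: fraction of boxes that overlap at least one other box."""
    if not boxes:
        return 0.0
    stuck: set[int] = set()
    for i, j in combinations(range(len(boxes)), 2):
        if boxes_intersect(boxes[i], boxes[j]):
            stuck.update((i, j))
    return len(stuck) / len(boxes)


def relative_size(boxes: list[Box], img_w: float, img_h: float) -> float:
    """Assumed definition: mean of sqrt(box area / image area) over all boxes."""
    if not boxes:
        return 0.0
    img_area = img_w * img_h
    return sum(sqrt((x2 - x1) * (y2 - y1) / img_area) for x1, y1, x2, y2 in boxes) / len(boxes)


def assign_scene(boxes: list[Box], img_w: float, img_h: float) -> str:
    """Map one annotated image to 'dense', 'large', 'normal', or 'tiny' (hypothetical rule)."""
    if overlap_rate(boxes) >= DENSE_OVERLAP_RATE:
        return "dense"
    rel = relative_size(boxes, img_w, img_h)
    if rel >= LARGE_REL_SIZE:
        return "large"
    if rel < TINY_REL_SIZE:
        return "tiny"
    return "normal"  # medium-and-small scale
```

Checking adhesion before scale means a crowded image of tiny insects lands in the dense-distribution scene rather than the tiny-scale one; the actual study may order or combine the criteria differently.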

Abstract: Pests pose a severe threat to crop yield in agricultural production, and frequent pest outbreaks have limited the development of agriculture in China. Artificial intelligence (AI) monitoring is therefore essential for modern pest control. Transformer-based DETR models have shown strong potential for multi-scale detection of densely distributed pests captured in natural scenes, but conventional quantitative analysis cannot verify the core mechanisms of these detection models. In this study, a DINO-based model was proposed to detect and count dense, tiny insects. The model contains four modules: a multi-scale deformable attention module (MDAM) maps multi-scale features; a contrastive denoising training strategy (CDTS) alleviates duplicate detections of a single pest object; and mixed query selection and look forward twice (MQS-LFT) provide more effective initialization and iterative refinement of the decoder queries. A total of 4 019 images containing planthoppers, aphids, and wheat mites were collected in real scenes to build a pest dataset named MTC-PAWPD (multi-scale tiny crowded planthopper-aphid-wheat mite pest detection), which was split into training, validation, and test sets at a ratio of 5:2:3. Based on the overlap rate and relative scale of the objects, the dataset was further partitioned into four scenes, namely dense-distribution, large-scale, normal-scale (medium and small), and tiny-scale, and the model's performance was compared across these scenes. Ablation experiments, supported by visualization of the features and query anchors, were carried out to validate the correlation between performance and the query-related modules. The experimental results demonstrated that the DINO-based model performed better at recognizing pest objects.
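Contrastive denoising feeds the decoder noised copies of the ground-truth boxes: positive queries carry small noise and should reconstruct the box, while negative queries carry larger noise and should be classified as "no object", which teaches the model to reject near-duplicate predictions. The sketch below assumes normalized (cx, cy, w, h) boxes and moves each corner by a random sign times a magnitude below λ × half-size for positives and between λ and 2λ times the half-size for negatives, which is our reading of the corner-noise scheme in the public DINO reference code rather than the authors' own implementation. `noise_scale=0.4` is an illustrative value, the function names are hypothetical, and label noise and the attention mask that separates denoising groups are omitted.

```python
import random

Box = list[float]  # normalized (cx, cy, w, h) or (x1, y1, x2, y2)


def cxcywh_to_xyxy(b: Box) -> Box:
    cx, cy, w, h = b
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def xyxy_to_cxcywh(b: Box) -> Box:
    x1, y1, x2, y2 = b
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def noise_box(box: Box, noise_scale: float, negative: bool, rng: random.Random) -> Box:
    """Shift each corner of a GT box by sign * mag * half-size * noise_scale.

    mag is drawn from [0, 1) for positive queries and [1, 2) for negative ones,
    so every negative corner is displaced at least as far as any positive one
    (before clamping to the image).
    """
    _, _, w, h = box
    half = [w / 2, h / 2, w / 2, h / 2]
    noisy = []
    for corner, d in zip(cxcywh_to_xyxy(box), half):
        mag = rng.random() + (1.0 if negative else 0.0)
        sign = rng.choice((-1.0, 1.0))
        # Clamp to the normalized image extent [0, 1].
        noisy.append(min(max(corner + sign * mag * d * noise_scale, 0.0), 1.0))
    return xyxy_to_cxcywh(noisy)


def make_cdn_queries(gt_boxes: list[Box], noise_scale: float = 0.4, seed: int = 0):
    """Return one positive and one negative noised copy of every GT box (one CDN group)."""
    rng = random.Random(seed)
    positives = [noise_box(b, noise_scale, False, rng) for b in gt_boxes]
    negatives = [noise_box(b, noise_scale, True, rng) for b in gt_boxes]
    return positives, negatives
```

With noise_scale = 0.4 a corner moves by at most 0.8 × half-size, so the noised width stays at least 0.2 w and boxes never invert; larger scales would need an explicit guard.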
On the MTC-PAWPD benchmark, the model achieved the highest mean average precision at 50% intersection over union (mAP@50), 70.0%, an improvement of 2.3 percentage points over mainstream object detection models such as Faster R-CNN, YOLOv5x, ATSS, YOLOX, and Deformable DETR. In addition, it required only 1/10 of the training of the YOLO-based model to converge. The mAP@50 reached 42.5%, 79.4%, 75.7%, and 62.4% in the four scenes, respectively, showing strong detection performance, particularly in the scale-related scenes. All four improved modules of DINO enhanced pest detection performance in real scenes, and the performance gains were more strongly correlated with the Transformer modules. The four modules improved mAP@50 by 0.2, 1.0, 0.9, and 0.8 percentage points, respectively. Specifically, the mixed query selection strategy used encoder features to generate the queries, which placed the query points in the first decoder layer closer to the pest objects in the image, so that more features were captured and the expressive power of the model was enhanced. Look forward twice moved the query points in the later decoder layers closer to the objects. The contrastive denoising training strategy introduced high-quality negative samples, which particularly suppressed overlapping predictions on a single pest object. In summary, the dense queries and components of DINO can be expected to facilitate pest detection and counting in multi-scale, complex, and densely distributed natural scenes. The DINO model can meet the requirements of feature extraction from multi-scale natural pest images and thus detect the objects accurately. The Transformer model also offers strong generalization and practical value for multi-scale pest detection against complex field backgrounds.
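Mixed query selection, as described in the DINO paper, initializes only the positional part of the decoder queries from the encoder: the top-K encoder tokens by classification score supply their predicted boxes as anchors, while the content queries stay learnable and static. A minimal pure-Python sketch follows, assuming each encoder token already has class logits and a predicted (cx, cy, w, h) box; the function name and arguments are hypothetical, and a real model would operate on batched tensors.

```python
def mixed_query_selection(enc_logits, enc_boxes, content_embed, k):
    """Hypothetical sketch of DINO-style mixed query selection.

    enc_logits: per-token class logits from the encoder output.
    enc_boxes: per-token predicted boxes (cx, cy, w, h).
    content_embed: learnable content queries (at least k of them).
    Returns (anchors, contents): anchors come from the top-k encoder tokens,
    contents are the static learnable embeddings, untouched by the encoder.
    """
    # Sigmoid is monotonic, so ranking by the max logit equals ranking by the
    # max class probability (and avoids overflow in exp for extreme logits).
    scores = [max(logits) for logits in enc_logits]
    # Indices of the k highest-scoring tokens; sorted() is stable on ties.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    anchors = [list(enc_boxes[i]) for i in top]  # positional queries
    contents = [list(v) for v in content_embed[:k]]  # static content queries
    return anchors, contents


logits = [[-2.0, -1.0], [3.0, 0.0], [0.5, 2.5], [-3.0, -4.0]]
boxes = [[0.1, 0.1, 0.05, 0.05], [0.5, 0.5, 0.1, 0.1],
         [0.7, 0.2, 0.08, 0.06], [0.9, 0.9, 0.02, 0.02]]
anchors, contents = mixed_query_selection(logits, boxes, [[0.0] * 4, [1.0] * 4, [2.0] * 4], k=2)
print(anchors)  # [[0.5, 0.5, 0.1, 0.1], [0.7, 0.2, 0.08, 0.06]]
```

The DINO paper motivates keeping the content queries static by noting that the selected encoder features are preliminary and may cover several objects or only part of one, so only their box estimates are trusted as starting anchors.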
