Research on a fine-grained recognition model for bark beetle pests incorporating a multimodal detection head

    Research and implementation of a lightweight fine-grained bark beetle detection model with a multimodal detection head

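    • Summary: Bark beetles (Dendroctonus spp.) are difficult to identify to species level because species diversity is high and closely related species are morphologically similar and often sympatric. To address this problem, the study proposes FGRS-Net (Fine-Grained Recognition for Scolytidae Network), a network architecture for fine-grained recognition of bark beetle species. First, to mitigate the recognition bias caused by insufficient samples, a detection head module based on multimodal embedding was designed. Second, to extract cross-scale discriminative features, the attention convolution mixer module ACmix was used to capture fused features. To extract further features while reducing the parameter count, the omni-dimensional dynamic convolution module ODConv was introduced to focus on fine-grained insect features. The model was then made lightweight through pruning and knowledge distillation. To evaluate reliability in practical use, systematic robustness tests were run under interference conditions including low illumination, blur, and occlusion by complex backgrounds, and deployment was verified on edge devices with different computing architectures. Experiments show that FGRS-Net reaches a mean average precision of 89.3% and a recall of 98%, with floating-point operations reduced by 23%; it runs at 289 frames/s on an NVIDIA RTX 5090 GPU and at 11 frames/s and 27 frames/s on the two development boards. FGRS-Net thus combines high accuracy with a lightweight design and is competitive with current mainstream models; the results can serve as a reference for subsequent fine-grained bark beetle recognition.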
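The interference conditions named above (low illumination, blur, occlusion) can be illustrated with a minimal sketch. The source does not describe how the test images were generated, so everything here is an assumption for illustration: the `Image` type, the function names, the gain of 0.3, the 3x3 mean filter, and the square occlusion patch are hypothetical stand-ins, not the authors' protocol.

```python
from typing import Callable, Dict, List

# Grayscale image as rows of pixel intensities in [0, 1].
# Hypothetical stand-in for real test images.
Image = List[List[float]]


def low_light(img: Image, gain: float = 0.3) -> Image:
    """Simulate low illumination by scaling every pixel toward black."""
    return [[p * gain for p in row] for row in img]


def box_blur(img: Image) -> Image:
    """Simulate blur with a 3x3 mean filter.

    Border pixels are averaged over their in-bounds neighbours only.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def occlude(img: Image, top: int, left: int, size: int, fill: float = 0.0) -> Image:
    """Simulate occlusion by painting a square patch with a constant value."""
    out = [row[:] for row in img]  # copy so the input image is left untouched
    for y in range(top, min(top + size, len(out))):
        for x in range(left, min(left + size, len(out[0]))):
            out[y][x] = fill
    return out


# Assumed perturbation set; the occlusion patch position and size are arbitrary.
PERTURBATIONS: Dict[str, Callable[[Image], Image]] = {
    "low_light": low_light,
    "blur": box_blur,
    "occlusion": lambda img: occlude(img, 0, 0, 2),
}


def perturbed_variants(img: Image) -> Dict[str, Image]:
    """Return one perturbed copy of the image per interference condition."""
    return {name: fn(img) for name, fn in PERTURBATIONS.items()}
```

A robustness test would then run the detector on each variant returned by `perturbed_variants` and compare the metrics with those on the clean image.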

       

      Abstract: Bark beetles (Dendroctonus spp.), as wood-boring pests, pose a persistent threat to forest resource security because of their small size, cryptic behavior, and long damage cycles. According to monitoring data from the National Forestry and Grassland Administration, infestations in 2024 were significant in southwestern, northern, and northwestern China, affecting 184,000 hectares, with moderate to severe damage accounting for 15.08% of that area. Different bark beetle species are often sympatric, yet their control strategies differ substantially. For instance, managing Dendroctonus micans requires removal of infested trees combined with γ-hexachlorocyclohexane (lindane) treatment; controlling Dendroctonus valens involves adult eradication combined with aluminum phosphide; and Heterobostrychus hamatipennis relies on methyl bromide or aluminum phosphide fumigation. Moreover, because species resemble one another in macroscopic characteristics such as body length and coloration, identification depends heavily on local microscopic features, including the shape of the disc and the elytral punctures. This is a typical fine-grained recognition problem: species within a highly similar base category (Scolytidae) must be distinguished by subtle discriminative features. Traditional manual morphological identification is inefficient and highly subjective, so a rapid and accurate fine-grained recognition model has become crucial for the scientific control of bark beetles.
To address these challenges, this study designed and implemented FGRS-Net (Fine-Grained Recognition for Scolytidae Network), a network architecture for fine-grained identification of bark beetles. Through innovations at several levels, the model addresses key issues in bark beetle recognition, including scale variation, feature confusion, and computational efficiency.
First, to mitigate inter-class recognition bias caused by insufficient training samples, a novel detection head module based on multimodal embedding was proposed. By integrating morphological feature vectors, local texture descriptors, and spatial contextual information, this module constructs a joint embedding space that improves discrimination between morphologically similar species and reduces the false detections induced by uneven sample distribution. Second, to handle the wide size range and variable postures of bark beetles in their habitat, an Attention Convolution Mixer (ACmix) module was introduced. By running convolutional and self-attention paths in parallel, this module adaptively adjusts multi-scale receptive fields, capturing local details of millimeter-scale pests (such as elytral punctures and antenna morphology) while also identifying overall distribution patterns in aggregated populations, which makes feature discrimination more robust against complex backgrounds. Third, to make feature representation more efficient, an Omni-Dimensional Dynamic Convolution (ODConv) module was integrated. Through attention over four dimensions of the kernel space (spatial size, input channels, output channels, and kernel number), the module dynamically generates and adaptively calibrates convolutional parameters, reducing the parameter count while better capturing key discriminative features such as wing venation and body segment proportions. Finally, for model lightweighting, structured pruning was combined with knowledge distillation: channel importance was constrained via L1 regularization so that redundant channels could be pruned, and a multi-teacher distillation framework transferred hierarchical feature representations from large networks to a lightweight student model.
As a result, the model size was compressed by 40.7% and inference latency was reduced by 35% while accuracy was maintained.
To validate the model's applicability in practical scenarios, a test suite covering multiple interference conditions was constructed, simulating complex field environments including lens fog, low illumination, blur, and foliage occlusion. Deployment was verified on edge devices with different computing architectures. Experimental results show that, on the self-built fine-grained bark beetle dataset, FGRS-Net achieved a mean Average Precision (mAP) of 89.3% and a recall of 98%, with a 23% reduction in floating-point operations (FLOPs) and a detection speed of 289 FPS on an NVIDIA RTX 5090 GPU. On edge devices, the Raspberry Pi platform ran inference at 11 FPS and the RK3576 platform at 27 FPS. The proposed approach supports accurate monitoring of bark beetles in field environments and offers a reference for the design of pest recognition models in smart forestry.
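The lightweighting strategy described in the abstract (L1-constrained channel importance for pruning, then multi-teacher distillation) can be sketched roughly as follows. The abstract gives neither the pruning criterion nor the distillation loss in detail, so this sketch makes assumptions: channels are ranked by the magnitude of a per-channel scaling factor, the teachers' temperature-softened outputs are averaged with equal weight, and the student is penalized with a KL divergence scaled by the squared temperature. The function names, the keep ratio, and the temperature are hypothetical.

```python
import math
from typing import List, Sequence


def channels_to_keep(scales: Sequence[float], keep_ratio: float) -> List[int]:
    """Rank channels by |scale| and return the indices of the largest ones.

    `scales` stands for per-channel factors that L1 regularization has pushed
    toward zero for unimportant channels (an assumed setup).
    """
    n_keep = max(1, round(len(scales) * keep_ratio))
    ranked = sorted(range(len(scales)), key=lambda i: abs(scales[i]), reverse=True)
    return sorted(ranked[:n_keep])


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-softened softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(
    student_logits: Sequence[float],
    teacher_logits_list: Sequence[Sequence[float]],
    temperature: float = 4.0,
) -> float:
    """KL(mean teacher soft targets || student soft predictions) times T^2.

    Equal teacher weighting is an assumption; weighted schemes are also common.
    """
    teachers = [softmax(t, temperature) for t in teacher_logits_list]
    target = [sum(col) / len(teachers) for col in zip(*teachers)]
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(target, student) if p > 0.0)
    return kl * temperature ** 2
```

In training, a loss of this kind would typically be added to the ordinary detection loss, and pruning would be followed by fine-tuning; how FGRS-Net weights and schedules these steps is not stated in the abstract.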

       
