Abstract:
Bark beetles (Dendroctonus spp.) are wood-boring pests that pose a persistent threat to forest resource security because of their small size, cryptic behavior, and long damage cycles. According to monitoring data from the National Forestry and Grassland Administration, infestations were significant in southwestern, northern, and northwestern China in 2024, affecting 184,000 hectares, of which moderate to severe damage accounted for 15.08% (roughly 27,700 ha). Different bark beetle species often occur sympatrically, yet their control strategies differ substantially. For instance, managing Dendroctonus micans requires removal of infested trees combined with γ-hexachlorocyclohexane (Lindane) treatment; controlling Dendroctonus valens combines adult eradication with aluminum phosphide; and Heterobostrychus hamatipennis is controlled by methyl bromide or aluminum phosphide fumigation. Moreover, because species are similar in macroscopic characteristics such as body length and coloration, identification depends heavily on local microscopic features, including the shape of the disc and the elytral punctures. This is a typical fine-grained recognition problem: species within a highly similar base category (Scolytidae) must be distinguished on the basis of subtle discriminative features. Traditional manual morphological identification is inefficient and highly subjective, so developing a rapid and accurate fine-grained recognition model is crucial for the scientific control of bark beetles.
To address these challenges, this study designed and implemented FGRS-Net (Fine-Grained Recognition for Scolytidae Network), a network architecture for fine-grained identification of bark beetles. Through innovations at several levels of the architecture, the model systematically addresses three key issues in bark beetle recognition: scale variation, feature confusion, and computational efficiency.
First, to mitigate inter-class recognition bias caused by insufficient training samples, a novel detection head based on multi-modal embedding was proposed. By integrating morphological feature vectors, local texture descriptors, and spatial context information, this head constructs a joint embedding space that improves discrimination between morphologically similar species and significantly reduces the false detections induced by uneven sample distribution. Second, to handle the wide size range and variable postures of bark beetles in their habitats, an Attention Convolution Mixer (ACmix) module was introduced. By running convolution and self-attention as parallel paths, the module adaptively adjusts multi-scale receptive fields: it captures local details of millimeter-scale pests (such as elytral punctures and antenna morphology) while still recognizing overall distribution patterns in aggregated populations, which makes feature discrimination more robust against complex backgrounds. To further improve the efficiency of feature representation, an Omni-Dimensional Dynamic Convolution (ODConv) module was integrated. ODConv learns four complementary attentions along the spatial, input-channel, output-channel (filter), and kernel-number dimensions of the kernel space, and uses them to dynamically generate and calibrate convolutional parameters; this significantly reduces the number of parameters while strengthening the capture of key discriminative features such as wing venation structure and body segment proportions. For model lightweighting, structured pruning was combined with knowledge distillation. L1 regularization was imposed on channel importance so that redundant feature connections could be pruned, and a multi-teacher distillation framework was designed to transfer hierarchical feature representations from large networks to a lightweight student model.
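The abstract does not say how the three cues are fused into the joint embedding space. The sketch below is one plausible reading, not the authors' detection head: each cue (morphology, texture, context) is assumed to arrive as a fixed-length vector, is L2-normalized separately, concatenated, and projected by a matrix that would be learned in practice (random here), and the result is matched to one prototype vector per species by cosine similarity. The function names, vector sizes, and prototype-matching classifier are all hypothetical.

```python
import numpy as np


def l2_normalize(v, eps=1e-8):
    """Scale a vector, or each row of a matrix, to unit L2 norm."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)


def joint_embed(morph, texture, context, proj):
    """Fuse three per-detection cues into one unit-norm joint embedding.

    Each cue is normalized on its own so that no modality dominates just
    because of its scale; the cues are then concatenated and linearly
    projected (proj stands in for a learned matrix).
    """
    fused = np.concatenate([l2_normalize(morph), l2_normalize(texture), l2_normalize(context)])
    return l2_normalize(proj @ fused)


def classify(embedding, prototypes):
    """Cosine similarity to one prototype per species; returns (best index, all scores)."""
    scores = l2_normalize(prototypes) @ embedding
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(0)
dims = (16, 32, 8)  # hypothetical sizes of the morphology, texture and context vectors
embed_dim, num_classes = 24, 5
proj = rng.normal(size=(embed_dim, sum(dims)))
prototypes = rng.normal(size=(num_classes, embed_dim))

morph, texture, context = (rng.normal(size=d) for d in dims)
emb = joint_embed(morph, texture, context, proj)
label, scores = classify(emb, prototypes)
print(label, np.round(scores, 3))
```

Normalizing each modality before concatenation is a design choice meant to keep a long texture descriptor from swamping a short context vector; learned per-modality weights would be an alternative.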
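ACmix (Pan et al., CVPR 2022) rests on the observation that convolution and self-attention can share the same 1x1 projections: the attention path uses them as queries, keys, and values, while the convolution path rebuilds a k x k convolution from them with a light fully connected layer followed by shifting and summing. The NumPy sketch below illustrates that idea under simplifying assumptions: a single head, global attention over a tiny map instead of ACmix's local windows, no positional encoding, and fixed mixing weights alpha and beta that the real module learns. Function names are hypothetical.

```python
import numpy as np


def shift(x, dy, dx):
    """Return out with out[i, j] = x[i + dy, j + dx], zero-filled outside the map."""
    H, W, _ = x.shape
    p = max(abs(dy), abs(dx))
    padded = np.pad(x, ((p, p), (p, p), (0, 0)))
    return padded[p + dy:p + dy + H, p + dx:p + dx + W]


def attention_path(q, k, v):
    """Single-head softmax self-attention over all positions of a small (H, W, C) map."""
    H, W, C = q.shape
    qf, kf, vf = (t.reshape(H * W, C) for t in (q, k, v))
    logits = qf @ kf.T / np.sqrt(C)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ vf).reshape(H, W, C)


def conv_path(q, k, v, fc, ksize=3):
    """k x k convolution rebuilt from the shared q/k/v features by 'FC + shift + sum'.

    fc maps the 3C stacked features to ksize*ksize groups of C maps; each group
    is shifted to its kernel offset and the groups are summed.
    """
    H, W, C = q.shape
    qkv = np.concatenate([q, k, v], axis=-1)
    groups = (qkv @ fc).reshape(H, W, ksize * ksize, C)
    r = ksize // 2
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return sum(shift(groups[:, :, i], dy, dx) for i, (dy, dx) in enumerate(offsets))


def acmix(x, wq, wk, wv, fc, alpha=0.5, beta=0.5):
    """Shared 1x1 projections feed both paths; alpha and beta would be learned scalars."""
    q, k, v = x @ wq, x @ wk, x @ wv  # 1x1 convolutions on an (H, W, C_in) map
    return alpha * attention_path(q, k, v) + beta * conv_path(q, k, v, fc)


rng = np.random.default_rng(0)
H, W, C_in, C = 6, 6, 4, 8
x = rng.normal(size=(H, W, C_in))
wq, wk, wv = (rng.normal(size=(C_in, C)) for _ in range(3))
fc = rng.normal(size=(3 * C, 9 * C))
y = acmix(x, wq, wk, wv, fc)
print(y.shape)  # (6, 6, 8)
```

Because both paths reuse the same projections, the extra cost of adding attention on top of convolution is mostly the attention product itself, which is why the local-window variant matters on larger feature maps.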
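ODConv (Li et al., ICLR 2022) computes four attentions from a globally pooled descriptor of the input: over the k x k spatial positions of a kernel, over input channels, over output filters, and over the n candidate kernels, and uses them to assemble an input-specific kernel. The sketch below follows that scheme in NumPy with two simplifications, both assumptions: the spatial, input-channel, and filter attentions are shared across the n candidate kernels, and the convolution is a naive stride-1 loop. The `params` dictionary and its keys are hypothetical names for the attention-branch weights.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def odconv_kernel(x, kernels, params):
    """Build the input-specific kernel for one feature map x of shape (H, W, C_in).

    kernels: (n, k, k, C_in, C_out) stack of n learnable candidate kernels.
    params:  weights of the small attention branch (hypothetical names).
    """
    n, k, _, c_in, c_out = kernels.shape
    ctx = np.maximum(x.mean(axis=(0, 1)) @ params["squeeze"], 0.0)  # GAP -> FC -> ReLU
    a_spatial = sigmoid(ctx @ params["spatial"]).reshape(k, k, 1, 1)  # each k x k position
    a_in = sigmoid(ctx @ params["in_ch"]).reshape(1, 1, c_in, 1)  # each input channel
    a_out = sigmoid(ctx @ params["out_ch"]).reshape(1, 1, 1, c_out)  # each output filter
    a_kernel = softmax(ctx @ params["kernel"])  # weights over the n candidate kernels
    mixed = np.tensordot(a_kernel, kernels, axes=1)  # (k, k, C_in, C_out)
    return mixed * a_spatial * a_in * a_out


def conv2d_same(x, w):
    """Naive stride-1 'same' convolution (cross-correlation): x (H, W, C_in), w (k, k, C_in, C_out)."""
    k = w.shape[0]
    r = k // 2
    H, W, _ = x.shape
    padded = np.pad(x, ((r, r), (r, r), (0, 0)))
    out = np.zeros((H, W, w.shape[3]))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W] @ w[dy, dx]
    return out


rng = np.random.default_rng(1)
n, k, c_in, c_out, hidden = 4, 3, 8, 16, 4
kernels = rng.normal(size=(n, k, k, c_in, c_out))
params = {
    "squeeze": rng.normal(size=(c_in, hidden)),
    "spatial": rng.normal(size=(hidden, k * k)),
    "in_ch": rng.normal(size=(hidden, c_in)),
    "out_ch": rng.normal(size=(hidden, c_out)),
    "kernel": rng.normal(size=(hidden, n)),
}
x = rng.normal(size=(10, 10, c_in))
w = odconv_kernel(x, kernels, params)
y = conv2d_same(x, w)
print(w.shape, y.shape)  # (3, 3, 8, 16) (10, 10, 16)
```

With n = 1 the kernel attention is always 1, so the static parameter count stays close to that of an ordinary convolution plus the small attention branch.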
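The abstract states only that channel importance was constrained with L1 regularization before pruning. A common way to do this, assumed here rather than confirmed by the source, is the network-slimming approach: an L1 penalty on the batch-normalization scale factor gamma of every channel, followed by removal of channels whose |gamma| falls below a global threshold. The sketch applies the penalty with a proximal soft-threshold step, picks the threshold as a quantile of all |gamma| values, and shows how pruning one layer's outputs also removes the matching inputs of the next layer. The pruning ratio, shrinkage step, and layer sizes are illustrative.

```python
import numpy as np


def soft_threshold(gamma, step):
    """Proximal step for step * ||gamma||_1: shrinks every scale factor toward zero."""
    return np.sign(gamma) * np.maximum(np.abs(gamma) - step, 0.0)


def channel_masks(gammas_per_layer, prune_ratio):
    """Global threshold on |gamma| across layers; returns one boolean keep-mask per layer."""
    scores = np.concatenate([np.abs(g) for g in gammas_per_layer])
    threshold = np.quantile(scores, prune_ratio)
    masks = [np.abs(g) > threshold for g in gammas_per_layer]
    for g, m in zip(gammas_per_layer, masks):
        if not m.any():  # never remove a layer entirely: keep its strongest channel
            m[np.argmax(np.abs(g))] = True
    return masks


def prune_pair(w_curr, w_next, keep):
    """Drop pruned output channels of w_curr (C_out, C_in) and the matching inputs of w_next (C_next, C_out)."""
    return w_curr[keep], w_next[:, keep]


rng = np.random.default_rng(2)
gammas = [rng.normal(scale=0.5, size=16), rng.normal(scale=0.5, size=32)]  # BN scales of two layers
for _ in range(20):  # stand-in for sparsity training; the task-loss update is omitted
    gammas = [soft_threshold(g, 0.01) for g in gammas]
masks = channel_masks(gammas, prune_ratio=0.4)
w1, w2 = rng.normal(size=(16, 8)), rng.normal(size=(32, 16))
w1_p, w2_p = prune_pair(w1, w2, masks[0])
print([int(m.sum()) for m in masks], w1_p.shape, w2_p.shape)
```

A global threshold lets layers with many weak channels shrink more than others; the guard that keeps at least one channel per layer prevents the network from being cut in two.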
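For the multi-teacher distillation step, the sketch below combines two standard losses, assuming (the abstract does not specify) a weighted average of teacher soft targets at temperature T for logit distillation, plus a per-level mean-squared error between teacher features and student features mapped to the teacher's width by small adapter matrices. The feature targets could come from one designated teacher or from an average of teachers; the loss weights alpha and beta, the temperature, and the adapter design are hypothetical.

```python
import numpy as np


def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits, teacher_logits_list, T=4.0, weights=None):
    """KL(mixed teacher soft targets || student soft targets) at temperature T, scaled by T^2."""
    n = len(teacher_logits_list)
    weights = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    p_t = sum(w * softmax(t, T) for w, t in zip(weights, teacher_logits_list))
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return T * T * float(kl.mean())


def feature_loss(student_feats, teacher_feats, adapters):
    """Sum over feature levels of the MSE between adapted student and teacher features."""
    return sum(float(np.mean((s @ a - t) ** 2)) for s, t, a in zip(student_feats, teacher_feats, adapters))


def distill_loss(task_loss, s_logits, t_logits_list, s_feats, t_feats, adapters, alpha=0.5, beta=1.0):
    """Total student objective: task loss + logit distillation + feature distillation."""
    return task_loss + alpha * kd_loss(s_logits, t_logits_list) + beta * feature_loss(s_feats, t_feats, adapters)


rng = np.random.default_rng(3)
s_logits = rng.normal(size=(8, 5))  # 8 samples, 5 species
teachers = [rng.normal(size=(8, 5)) for _ in range(3)]
s_feats = [rng.normal(size=(64, 16)), rng.normal(size=(16, 32))]  # two levels: (positions, channels)
t_feats = [rng.normal(size=(64, 32)), rng.normal(size=(16, 64))]
adapters = [0.1 * rng.normal(size=(16, 32)), 0.1 * rng.normal(size=(32, 64))]
print(distill_loss(1.0, s_logits, teachers, s_feats, t_feats, adapters))
```

The T^2 factor follows the usual convention of keeping the soft-target gradient magnitude comparable to the hard-label loss as T changes.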
As a result, the model size was compressed by 40.7% and inference latency was reduced by 35% while accuracy was maintained.
To comprehensively validate the model's applicability in practical scenarios, a multi-interference test system was constructed to simulate complex field conditions, including lens fog, low illumination, blur, and foliage occlusion, and deployment was verified on edge devices with different computing architectures. Experimental results show that FGRS-Net achieved a mean Average Precision (mAP) of 89.3% and a recall of 98% on the self-built fine-grained bark beetle dataset, with a 23% reduction in floating-point operations (FLOPs) and a detection speed of 289 FPS. In edge deployment, the model achieved real-time inference at 11 FPS on the Raspberry Pi platform and 27 FPS on the RK3576 platform. The proposed solution provides reliable technical support for accurate field monitoring of bark beetles and offers a useful reference for the design of pest recognition models in smart forestry.
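The interference conditions can be emulated with simple image transforms. The sketch below is one assumed set, not the authors' test system: fog as the atmospheric scattering model with uniform transmission, low illumination as a gamma curve plus gain and Gaussian noise, blur as a box filter, and foliage occlusion as a flat green rectangle. All parameter values are illustrative, and images are assumed to be float RGB arrays in [0, 1].

```python
import numpy as np


def add_fog(img, t=0.6, airlight=0.9):
    """Haze via the atmospheric scattering model I = J*t + A*(1 - t), with uniform transmission t."""
    return img * t + airlight * (1.0 - t)


def low_light(img, gain=0.35, gamma=1.5, noise_std=0.02, rng=None):
    """Darken with a gamma curve and gain, then add Gaussian sensor noise."""
    rng = np.random.default_rng() if rng is None else rng
    dark = gain * np.power(img, gamma)
    return np.clip(dark + rng.normal(0.0, noise_std, img.shape), 0.0, 1.0)


def box_blur(img, k=5):
    """Odd k x k mean filter with edge replication, a simple stand-in for defocus blur."""
    r = k // 2
    H, W = img.shape[:2]
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (k * k)


def leaf_occlusion(img, frac=0.2, color=(0.13, 0.35, 0.12), rng=None):
    """Cover a random rectangle of about `frac` of the image area with a flat leaf-green patch."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = img.shape[:2]
    h, w = max(1, int(H * np.sqrt(frac))), max(1, int(W * np.sqrt(frac)))
    y, x = rng.integers(0, H - h + 1), rng.integers(0, W - w + 1)
    out = img.copy()
    out[y:y + h, x:x + w] = color
    return out


rng = np.random.default_rng(4)
img = rng.random((64, 64, 3))  # stand-in for an RGB photo scaled to [0, 1]
variants = {
    "fog": add_fog(img),
    "low_light": low_light(img, rng=rng),
    "blur": box_blur(img),
    "occlusion": leaf_occlusion(img, rng=rng),
}
for name, v in variants.items():
    print(name, v.shape, round(float(v.mean()), 3))
```

Sweeping each parameter (transmission, gain, kernel size, occluded fraction) rather than fixing one value would give a robustness curve instead of a single score.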
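mAP is the mean over species of each class's average precision (the area under its precision-recall curve), and recall is the fraction of ground-truth beetles that are detected. The abstract does not state the IoU threshold or interpolation scheme, so the sketch assumes all-point interpolation (as used in PASCAL VOC from 2010 onward) and that detections have already been matched to ground truth. The toy inputs are illustrative, not data from the study.

```python
import numpy as np


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP and final recall for one class.

    scores: confidence of each detection; is_tp: whether the detection matched a
    not-yet-matched ground-truth box at the chosen IoU threshold (matching is
    assumed done); num_gt: number of ground-truth boxes of this class (> 0).
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]  # precision envelope (non-increasing)
    idx = np.where(mrec[1:] != mrec[:-1])[0]  # points where recall changes
    ap = float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
    return ap, float(recall[-1])


def mean_ap(per_class):
    """per_class: one (scores, is_tp, num_gt) tuple per species."""
    return float(np.mean([average_precision(*c)[0] for c in per_class]))


ap, rec = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=4)
print(ap, rec)  # 0.625 0.75
print(mean_ap([([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], 4), ([0.9, 0.5], [1, 1], 2)]))  # 0.8125
```

Recall here is measured at the lowest confidence kept, so a high recall such as the reported 98% says little about precision on its own; AP summarizes both across thresholds.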
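Throughput figures are easier to compare with latency budgets when converted to time per frame. Assuming frames are processed one at a time (no batching or pipelining), the reported rates translate as below; the abstract does not say which hardware produced the 289 FPS figure, so it is labeled generically.

```python
def fps_to_ms(fps):
    """Per-frame latency (ms) implied by a throughput, assuming frames are processed one at a time."""
    return 1000.0 / fps


def throughput_gain(latency_cut):
    """Throughput multiplier when per-frame latency falls by the fraction latency_cut."""
    return 1.0 / (1.0 - latency_cut)


reported = {"benchmark (hardware not stated)": 289, "Raspberry Pi": 11, "RK3576": 27}
for platform, fps in reported.items():
    print(f"{platform}: {fps} FPS ~ {fps_to_ms(fps):.1f} ms/frame")
print(f"35% lower latency ~ {throughput_gain(0.35):.2f}x throughput")
```

This gives roughly 3.5 ms per frame at 289 FPS, about 91 ms on the Raspberry Pi and about 37 ms on the RK3576, and a 35% latency reduction corresponds to about 1.54 times the throughput.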