An improved RT-DETR model for pine wood surface defect detection

胡继文; 张国梁; 沈明哲; 李文浩

doi:10.11975/j.issn.1002-6819.202312183

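Abstract: To improve the accuracy of pine wood surface defect detection while maintaining detection speed, this study proposes RIC-DETR, a detection model that improves on RT-DETR. First, images were taken from a public wood surface defect dataset, then annotated and augmented to build a surface defect dataset of 13642 images. Second, network architectures including VGG11, VGG13, ResNet18, and VanillaNet13 were compared, and ResNet18, which combines low computational complexity with relatively high detection accuracy, was chosen as the baseline backbone for feature extraction. Third, the inverted residual mobile block was introduced to replace the basic blocks in ResNet18, enlarging the model's receptive field and improving feature interaction between layers. Finally, the cascaded group attention mechanism from the EfficientViT model was used to further modify the inverted residual mobile block, reducing computational cost and improving the model's expressive capacity. Experimental results show that RIC-DETR achieves a precision, recall, and mean average precision of 95.4%, 96.0%, and 97.2%, respectively, all better than current mainstream YOLO-series models. Compared with the baseline RT-DETR, RIC-DETR keeps high accuracy while reducing the number of parameters, floating-point operations, and memory footprint by 54%, 57%, and 52%, respectively, and reaches a detection speed of 63.5 frames/s. With its low complexity, high accuracy, and fast detection, RIC-DETR can provide technical support for pine wood surface defect detection.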
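
The precision and recall figures quoted in the abstract follow the usual object detection definitions. A minimal sketch, assuming hypothetical counts of true positives, false positives, and false negatives (the counts and the function name are illustrative and are not taken from the study):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from detection counts.

    tp: detections matched to a ground-truth defect
    fp: detections with no matching ground-truth defect
    fn: ground-truth defects that were missed
    """
    # Guard against empty denominators (no detections or no ground truth).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, chosen only for illustration.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.3f}, recall={r:.3f}")
```

Mean average precision then averages, over the defect classes, the area under each class's precision-recall curve; that step is omitted here because it depends on the matching threshold used in the study.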

Abstract: Pine wood is widely used as a raw material in furniture, construction, and interior decoration because of its light weight, pleasant aroma, and visually appealing texture. High-quality pine contributes to the aesthetics, durability, and functionality of the final products. Detecting surface defects is therefore a pivotal step in pine wood processing, since these defects can significantly affect the appearance and structural integrity of finished goods, and in turn purchasing decisions and the overall usage experience. This study aims to improve the accuracy and efficiency of surface defect detection to ensure the quality of pine wood and its derivative products. A new model, dubbed RIC-DETR, was developed from the original RT-DETR to improve both performance and efficiency. In the initial phase, images were acquired from a publicly accessible dataset of wood surface defects. After annotation and data augmentation, a surface defect dataset of 13642 labeled images was assembled. The dataset covers seven types of defects, and the augmentation simulated potential variations in lighting conditions so that the data generalize across different scenarios. Subsequently, several network structures, including VGG, ResNet, and VanillaNet, were compared systematically to find the best balance between accuracy and computational complexity. ResNet18 was ultimately selected as the backbone network for feature extraction because of its efficiency and effectiveness. The basic blocks within ResNet18 were then replaced using the inverted residual mobile block, which expands the receptive field and improves feature interaction between layers. Finally, the cascaded group attention mechanism from the EfficientViT model was applied to the inverted residual mobile block to reduce computational resource consumption while improving the expressive capability of the model.
The RIC-DETR model achieved a mean average precision of 97.2% with 15.2 M parameters, 46.8 G floating-point operations, and a memory footprint of 30.4 MB. Across the seven defect types, the highest recognition accuracy reached 99.3%. Compared with RT-DETR, the RIC-DETR model improved the mean average precision by 0.3 percentage points, while the number of parameters, floating-point operations, and memory usage were reduced by 54%, 57%, and 52%, respectively. Compared with four mainstream models from the YOLO series, RIC-DETR improved the mean average precision by 2.1, 4.6, 1.4, and 0.8 percentage points, respectively, and reached the best frame rate, 63.5 frames per second. The RIC-DETR model is therefore well suited to detecting surface defects of pine wood in terms of detection efficacy, computational speed, and resource utilization. In future work, the neck encoding network and the decoding prediction stage of RIC-DETR can be optimized further. An adaptive scaling strategy, tuned to the specific task and hardware, is recommended for the neck encoding network to better balance speed and accuracy. Attention mechanisms can also be introduced to build a more effective feature decoder for higher accuracy. These findings can provide a solid foundation for detecting surface defects of pine wood in the modern processing industry.
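
The relative reductions and the absolute figures in the abstract can be related with simple arithmetic. The sketch below assumes the percentages are rounded and measured against RT-DETR, and back-computes the baseline values they imply. These are estimates derived here for illustration, not numbers reported in the abstract, and the function name is hypothetical.

```python
def implied_baseline(reduced_value: float, reduction_pct: float) -> float:
    """Baseline value implied by a reduced value and a percent reduction.

    If reduced = baseline * (1 - pct / 100), then
    baseline = reduced / (1 - pct / 100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return reduced_value / (1 - reduction_pct / 100)


# RIC-DETR figures and reductions relative to RT-DETR, as stated in the abstract.
reported = {
    "parameters (M)": (15.2, 54),
    "floating-point operations (G)": (46.8, 57),
    "memory footprint (MB)": (30.4, 52),
}

for name, (value, pct) in reported.items():
    # Rounded percentages make these approximate at best.
    print(f"{name}: implied RT-DETR baseline ~ {implied_baseline(value, pct):.1f}")
```

Because the published percentages are rounded to whole numbers, the implied baselines carry an uncertainty of roughly one percent of the baseline and should be read as a consistency check only.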

Detecting surface defects of pine wood using an improved RT-DETR model