Abstract:
Pine wood is widely used as a raw material in furniture, construction, and interior decoration because of its light weight, pleasant aroma, and visually appealing texture. High-quality pine contributes to the aesthetics, durability, and functionality of the final products. Detecting surface defects is therefore a pivotal step in pine wood processing, since these defects can significantly affect the appearance and structural integrity of finished goods, and in turn purchasing decisions and the overall usage experience. This study aims to improve the accuracy and efficiency of surface defect detection for pine wood and its derivative products. An improved detection model, dubbed RIC-DETR, was developed from the original RT-DETR. The initial images were acquired from a publicly accessible dataset of wood surface defects. After annotation and data augmentation, the assembled dataset contained 13642 labeled images covering seven types of defects. The augmentation simulated potential variations in lighting conditions to improve generalizability under different scenarios. Subsequently, various backbone network structures, such as VGG, ResNet, and VanillaNet, were systematically compared to find an optimal balance between accuracy and computational complexity. ResNet18 was ultimately selected as the feature extraction backbone because of its efficiency and effectiveness. The basic blocks within ResNet18 were then enhanced with the inverted residual mobile block to expand the receptive field and strengthen the interaction between layers. Finally, the cascaded group attention mechanism of the EfficientViT model was applied to reduce computational resource consumption while retaining high expressive capability.
The RIC-DETR model achieved an average precision of 97.2% with 15.2 M parameters, 46.8 G floating-point operations, and a memory footprint of 30.4 MB. It detected all seven types of defects, with the highest recognition accuracy reaching 99.3%. Compared with RT-DETR, RIC-DETR improved the average precision by 0.3 percentage points, while the number of parameters, floating-point operations, and memory usage were reduced by 54%, 57%, and 52%, respectively. Compared with four mainstream models from the YOLO series, RIC-DETR improved the average precision by 2.1, 4.6, 1.4, and 0.8 percentage points, respectively, and achieved the best frame rate of 63.5 frames per second. RIC-DETR is therefore well suited to detecting surface defects of pine wood in terms of detection efficacy, computational speed, and resource utilization. In future work, the neck encoder and the decoder prediction of RIC-DETR can be further optimized. An adaptive scaling strategy tailored to the specific task and hardware could be adopted in the neck encoding network for a better balance between speed and accuracy, and attention mechanisms could be introduced to build more effective feature decoding for higher accuracy. These findings provide a foundation for detecting surface defects of pine wood in the modern processing industry.
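
As a rough sanity check on the reported reductions relative to RT-DETR, the following Python sketch back-computes the baseline sizes implied by the RIC-DETR figures and the stated percentages. The implied values are only estimates derived from rounded percentages and are not figures reported in this study; the helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(reduced_value: float, reduction_pct: float) -> float:
    """Return the baseline implied by a reduced value and a percentage reduction."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return reduced_value / (1.0 - reduction_pct / 100.0)


# (RIC-DETR value, reported reduction versus RT-DETR in percent), from the abstract
reported = {
    "parameters (M)": (15.2, 54),
    "floating-point operations (G)": (46.8, 57),
    "memory footprint (MB)": (30.4, 52),
}

# Estimated RT-DETR baselines; approximate because the percentages are rounded
implied = {
    name: implied_baseline(value, pct) for name, (value, pct) in reported.items()
}

for name, value in implied.items():
    print(f"{name}: implied RT-DETR baseline of about {value:.1f}")
```

Because each percentage is rounded to a whole number, the implied baselines carry an uncertainty of roughly one percent and should be read as a consistency check only.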