Abstract:
The chestnut industry has long been constrained by low productivity, high labor costs, and operational hazards. Information technology is expected to address these obstacles in chestnut production, and fruit object detection can offer essential technical support for intelligent chestnut harvesting. In this study, a chestnut fruit object detection model (YOLOv8-PBi) was proposed based on a lightweight, improved YOLOv8s. Images were collected in the natural environment of Hubei Province, China. Taking YOLOv8s as the base network, the approach introduced several improvements. First, a C2f-PConv module was constructed using partial convolution (PConv) to reduce floating-point operations and computation while improving the utilization of computational capacity. Second, a weighted bidirectional feature pyramid network (BiFPN) was employed to strengthen cross-scale connections and feature fusion. Third, the bounding box loss function was replaced with Wise intersection over union (WIoU), a dynamic non-monotonic focusing technique, to improve convergence speed and detection performance. Finally, transfer learning was applied to raise the accuracy and generalizability of the model. The improved YOLOv8-PBi model achieved 89.4% accuracy, 74.9% recall, and 84.2% average precision, with a model weight 46.22% smaller than that of the original base network YOLOv8s. Accuracy, recall, and average precision improved by 1.3, 1.5, and 1.8 percentage points, respectively, and the PC inference speed increased from 81.9 to 108 frames per second. Replacing the original bounding box loss function, CIoU, with WIoU raised the average detection accuracy by 0.8 percentage points, while both gradient descent and fitting speeds increased significantly.
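The computational saving from PConv can be illustrated with a back-of-the-envelope FLOP count: PConv applies the convolution kernel to only a fraction r of the input channels and leaves the rest untouched. The sketch below is illustrative only; the channel count, feature-map size, and partial ratio are assumed values, not the paper's exact configuration.

```python
# Hedged sketch: FLOP comparison between a dense k x k convolution and
# partial convolution (PConv), which convolves only a fraction r of the
# channels. Shapes below are illustrative assumptions.

def conv_flops(c_in, c_out, k, h, w):
    """Multiply-accumulate count of a dense k x k convolution."""
    return c_in * c_out * k * k * h * w

def pconv_flops(c, k, h, w, r=0.25):
    """PConv applies the k x k convolution to only c_p = r * c channels."""
    c_p = int(c * r)
    return conv_flops(c_p, c_p, k, h, w)

if __name__ == "__main__":
    c, k, h, w = 256, 3, 40, 40          # assumed layer shape
    dense = conv_flops(c, c, k, h, w)
    partial = pconv_flops(c, k, h, w, r=0.25)
    # With r = 1/4, PConv needs (1/4)^2 = 1/16 of the dense conv's FLOPs.
    print(f"dense:   {dense}")
    print(f"partial: {partial} ({partial / dense:.4f} of dense)")
```

With a partial ratio of 1/4, the convolution cost drops quadratically to 1/16 of the dense case, which is why PConv-based blocks shrink both the FLOP count and the weight file.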
Replacing the original feature fusion network with BiFPN increased the model recall by 0.9 percentage points, allowing the model to concentrate more precisely on the attributes specific to chestnut fruit objects. The C2f-PConv module also performed well, with an average accuracy 3.5, 9.0, 7.5, and 9.9 percentage points higher than that of the standard lightweight feature extraction networks MobileNetV3, MobileNetV2, GhostNet, and ShuffleNetV2, respectively. Among the compared loss functions, WIoU had the lowest loss value and the fastest iterative convergence, compared with CIoU, EIoU, and SIoU. Compared with mainstream object detection models, such as SSD, Fast-RCNN, YOLOv5s, YOLOv5m, YOLOv7-tiny, and YOLOv8s, the model size was the smallest, and the average accuracy was 21.33, 47.82, 4.4, 1.9, 6.2, and 1.8 percentage points higher than those of the other models, respectively. The model was finally deployed on edge-embedded devices using TensorRT acceleration, achieving a detection frame rate of 43 frames per second and a detection time of 23.26 ms, fully meeting the device deployment requirements. Compared with YOLOv8s, YOLOv8-PBi reduced false and missed detections under backlight, overcast, and occlusion conditions. These findings can provide a technical foundation for identifying chestnut fruits during intelligent chestnut harvesting.
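The core idea behind the WIoU comparison above can be sketched in a few lines: the plain IoU loss is scaled by a distance-based attention factor computed from the smallest enclosing box of the predicted and ground-truth boxes (the WIoU v1 form). This is an illustrative re-derivation assuming (x1, y1, x2, y2) box coordinates, not the paper's exact training implementation.

```python
import math

# Hedged sketch of the Wise-IoU (v1) bounding-box loss idea: the IoU loss
# is amplified by a distance-based factor built from the smallest enclosing
# box. Boxes are (x1, y1, x2, y2); an illustrative sketch, not the paper's code.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

def wiou_v1_loss(pred, target):
    """WIoU v1: distance-attention factor times the plain IoU loss."""
    # Centers of both boxes
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Smallest enclosing box dimensions
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    # Distance-based focusing factor (detached from gradients in training)
    r = math.exp(((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (wg ** 2 + hg ** 2))
    return r * (1.0 - iou(pred, target))
```

For a perfect prediction the factor is exp(0) = 1 and the loss is 0; as the box centers drift apart, the factor grows and penalizes poorly localized boxes more strongly, which is one intuition for the faster convergence reported above.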