Tea Shoot Detection Method Based on Improved YOLOv8n

杨大勇; 黄正栎; 郑昌贤; 陈宏涛; 江新凤

doi:10.11975/j.issn.1002-6819.202401155

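Abstract: To address the insufficient accuracy of tea shoot recognition in the intelligent picking of premium teas, this study optimizes the YOLOv8n model. First, dynamic snake convolution (DSConv) is introduced into the backbone network to strengthen the model's ability to capture the shape of tea shoots. Second, the path aggregation network (PANet) in the neck is replaced with a weighted bi-directional feature pyramid network (BiFPN) to improve feature fusion. Finally, a parameter-free simple attention module (SimAM) is added after each C2F module in the neck network to increase the model's attention to tea shoots. Experimental results show that, compared with the original model, the improved model raises precision (P), recall (R), mean average precision (mAP), and F1 score (F1) by 4.2, 2.9, 3.7, and 3.3 percentage points, respectively. Its inference speed is 42 frames/s and its model size is 6.7 MB, which meets the deployment requirements of mobile devices with low computing power. Compared with the Faster-RCNN, YOLOv5n, YOLOv7n, and YOLOv8n object detection algorithms, the proposed model achieves precision higher by 57.4, 4.4, 4.7, and 4.2 percentage points; recall higher by 53.0, 3.6, 2.8, and 2.9 percentage points; mAP higher by 58.9, 5.0, 4.6, and 3.7 percentage points; and F1 score higher by 56.8, 3.9, 3.7, and 3.3 percentage points, respectively. The model therefore shows higher accuracy and a lower missed-detection rate in tea shoot detection and can serve as an algorithmic reference for the intelligent picking of premium teas.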
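The abstract does not give DSConv's implementation details. The sketch below illustrates only the general "snake" idea behind it, under stated assumptions. A 1 x K kernel is laid along the x-axis. Each position's vertical offset is bounded to (-1, 1) with `tanh` and accumulated outward from the kernel centre, so the sampling points form a continuous, curving path instead of a straight row. Features are then read with bilinear interpolation. The function names, the single-channel list-of-rows input, the x-axis-only kernel, and border clamping are all hypothetical simplifications, not the authors' code.

```python
import math

def snake_offsets(raw_offsets):
    """Cumulative y-offsets for a 1 x K kernel laid along the x-axis.

    Assumption: each raw offset is bounded with tanh and offsets are summed
    outward from the centre, so neighbouring taps move at most ~1 pixel apart.
    """
    k = len(raw_offsets)
    c = k // 2
    step = [math.tanh(o) for o in raw_offsets]  # each step lies in (-1, 1)
    y = [0.0] * k                               # centre tap stays on the row
    for i in range(c + 1, k):                   # accumulate to the right
        y[i] = y[i - 1] + step[i]
    for i in range(c - 1, -1, -1):              # accumulate to the left
        y[i] = y[i + 1] + step[i]
    return y

def bilinear(img, y, x):
    """Bilinearly sample img (list of rows) at (y, x), clamping to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def snake_conv_at(img, cy, cx, raw_offsets, weights):
    """One output value of a hypothetical 1 x K snake-shaped convolution at (cy, cx)."""
    k = len(weights)
    if len(raw_offsets) != k:
        raise ValueError("need one offset per kernel tap")
    c = k // 2
    ys = snake_offsets(raw_offsets)
    return sum(weights[i] * bilinear(img, cy + ys[i], cx + (i - c)) for i in range(k))
```

With all offsets equal to zero, the snake kernel reduces to an ordinary 1 x K convolution. Large offsets let the taps follow a curved, elongated structure such as a bud tip.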

Abstract: Premium teas are among the most important types of tea on the market. They are renowned for their high quality and reputation and are made from tender tea shoots. In recent years, manual picking, which is costly and inefficient, has been unable to keep up with the large-scale production of premium teas, while traditional mechanical equipment easily damages tea quality. Intelligent, large-scale picking is therefore needed to reduce labor costs. Deep learning models represented by YOLOv8n are effective tools for accurately identifying tea shoots. However, in practical tea shoot detection tasks YOLOv8n still faces challenges in detection accuracy and in balancing accuracy against speed. This study detects tea shoots using an improved YOLOv8n. The procedure was as follows. First, one-bud and one-bud-one-leaf tea shoots were selected as the detection targets according to the picking standards for premium teas. Second, images of tea shoots were collected from various pitch angles and horizontal orientations and then processed. Data augmentation, such as random cropping, linear transformation, and brightness adjustment, was applied, yielding a tea shoot dataset of 2215 images with 3227 detection targets. Finally, the YOLOv8n model was improved with the following modifications: 1) dynamic snake convolution (DSConv) was introduced into the backbone network to capture the morphological characteristics of tea shoots; 2) the path aggregation network (PANet) in the neck network was replaced with a weighted bi-directional feature pyramid network (BiFPN) to improve feature fusion efficiency; 3) the simple attention module (SimAM) was added after each C2F module in the neck network to focus recognition on tea shoots.
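SimAM is described as parameter-free because it computes an attention weight for each activation from channel statistics alone. It needs no learned weights, only a small regularizer λ. The sketch below follows the commonly used SimAM formulation: 1/e_t = (x - μ)² / (4(σ² + λ)) + 0.5, where σ² divides by (H·W - 1), and the output is x · sigmoid(1/e_t). It is applied to a single H x W channel stored as a list of rows. The function name and the default λ = 1e-4 are assumptions for illustration; the abstract does not state the settings used in the paper.

```python
import math

def simam_channel(feat, lam=1e-4):
    """Reweight one H x W channel (list of rows) with SimAM-style attention.

    Each activation is scaled by sigmoid(1 / e_t); 1 / e_t grows with the
    activation's squared distance from the channel mean, so distinctive
    activations are emphasised. No trainable parameters are involved.
    """
    values = [v for row in feat for v in row]
    if len(values) < 2:
        raise ValueError("SimAM needs at least two activations per channel")
    mean = sum(values) / len(values)
    n = len(values) - 1  # number of neurons other than the target one
    sq_dev = [[(v - mean) ** 2 for v in row] for row in feat]
    var = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, drow in zip(feat, sq_dev):
        new_row = []
        for v, d in zip(row, drow):
            inv_energy = d / (4.0 * (var + lam)) + 0.5          # 1 / e_t
            new_row.append(v / (1.0 + math.exp(-inv_energy)))  # v * sigmoid(1 / e_t)
        out.append(new_row)
    return out
```

In a flat channel, every activation receives the same weight, sigmoid(0.5) ≈ 0.622. A single strong response, such as a small shoot against uniform leaves, receives a noticeably larger weight than its neighbours.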
An ablation experiment was designed to verify the effectiveness of the improvements, and a comparative experiment was conducted to demonstrate the advantages of the modified model. The results indicate that all three modifications enhanced the detection accuracy of the model. Specifically, compared with the original model, adding the SimAM module increased the recall (R) by 2.4 percentage points, indicating effective attention to small tea shoots. Adding DSConv increased the precision (P) by 1.1 percentage points, indicating that it effectively captures the morphological characteristics of tea shoots. Introducing BiFPN increased the precision and mean average precision (mAP) by 1.6 and 2.1 percentage points, respectively, and the recall by 1.7 percentage points, indicating more efficient feature fusion than PANet. The final improved model outperforms the original model by 4.2, 2.9, 3.7, and 3.3 percentage points in precision, recall, mean average precision, and F1 score, respectively. Its detection speed reaches 42 frames per second, balancing detection accuracy and speed, and its model size of 6.7 MB fully meets the deployment requirements of low-power mobile devices. Compared with other object detection models (Faster-RCNN, YOLOv5n, YOLOv7n, and the original YOLOv8n), the improved YOLOv8n model achieved higher precision, recall, mean average precision, and F1 score. It outperformed them by 57.4, 4.4, 4.7, and 4.2 percentage points in precision; 53.0, 3.6, 2.8, and 2.9 percentage points in recall; 58.9, 5.0, 4.6, and 3.7 percentage points in mean average precision; and 56.8, 3.9, 3.7, and 3.3 percentage points in F1 score, respectively. The higher accuracy and fewer missed detections of the improved YOLOv8n model further confirm its superiority in tea shoot detection tasks.
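BiFPN differs from PANet mainly in that each fusion node combines its inputs with learnable, non-negative weights rather than summing or concatenating them equally. The "fast normalized fusion" rule from the EfficientDet work that introduced BiFPN is O = Σᵢ wᵢ·Iᵢ / (ε + Σⱼ wⱼ), with wᵢ kept non-negative by a ReLU. The minimal sketch below applies this rule to feature maps flattened into equal-length lists. It assumes the inputs have already been resized to a common resolution. The abstract does not say which fusion variant the authors used, so the function name and example weights are illustrative only.

```python
def fast_normalized_fusion(inputs, raw_weights, eps=1e-4):
    """Fuse equally sized feature maps (flattened lists) with BiFPN-style weights.

    Weights pass through ReLU so they stay non-negative, then each is divided
    by (eps + sum of weights), so the fused map is a near-convex combination.
    """
    if len(inputs) != len(raw_weights) or not inputs:
        raise ValueError("need one weight per input feature map")
    length = len(inputs[0])
    if any(len(feat) != length for feat in inputs):
        raise ValueError("inputs must be resized to the same shape first")
    w = [max(0.0, r) for r in raw_weights]  # ReLU on the learnable weights
    norm = sum(w) + eps
    fused = [0.0] * length
    for wi, feat in zip(w, inputs):
        for k in range(length):
            fused[k] += wi / norm * feat[k]
    return fused
```

Normalizing by the weight sum instead of applying a softmax was proposed as a cheaper alternative. During training, the weights can learn, for example, to favour a high-resolution map when small shoots dominate.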
Meanwhile, its moderate detection speed and model size make it suitable for low-power mobile devices used in intelligent tea picking. The improved YOLOv8n model can lay a foundation for the intelligent picking of premium teas.
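The reported metrics follow the standard definitions: P = TP/(TP+FP), R = TP/(TP+FN), and F1 = 2PR/(P+R). A frame rate converts to per-frame latency as 1000/fps milliseconds, so 42 frames/s corresponds to roughly 23.8 ms per image. The sketch below implements these formulas. The true/false positive counts in the usage note are hypothetical, not the paper's data. mAP is left out because it requires ranking detections by confidence and integrating the precision-recall curve, and the abstract does not state the IoU threshold used.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from detection counts (zero-safe)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1

def frame_latency_ms(fps):
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps
```

For example, with hypothetical counts of 80 true positives, 20 false positives, and 20 missed shoots, `detection_metrics(80, 20, 20)` gives P = R = F1 = 0.8, and `frame_latency_ms(42)` is about 23.8 ms. Note that the paper's improvements are in percentage points, that is, absolute differences between such ratios expressed as percentages.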


Detecting tea shoots using improved YOLOv8n