Abstract:
Premium teas are among the most important tea products on the market. They are renowned for their high quality and reputation and are made from tender tea shoots. In recent years, manual picking has been unable to meet the demands of large-scale premium tea production because of its high labor cost and low efficiency, while traditional mechanical equipment tends to damage tea quality. Intelligent, large-scale picking is therefore needed to reduce labor costs. Deep learning models, represented by YOLOv8n, are effective tools for accurately identifying tea shoots. However, in actual tea shoot detection tasks YOLOv8n still faces two challenges: limited detection accuracy and an imbalance between detection accuracy and speed. This study aims to detect tea shoots using an improved YOLOv8n. The procedure was as follows. Firstly, one-bud and one-bud-one-leaf tea shoots were selected as the detection targets according to the picking standards for premium teas. Secondly, images of tea shoots were collected from various pitch angles and horizontal orientations and then preprocessed. Data augmentation, including random cropping, linear transformation, and brightness adjustment, was applied. The resulting tea shoot dataset contained 2215 images with 3227 detection targets. Finally, YOLOv8n was improved with the following modifications: 1) dynamic snake convolution (DSConv) was introduced into the backbone network to capture the morphological characteristics of tea shoots; 2) the path aggregation network (PANet) in the neck network was replaced with a weighted bi-directional feature pyramid network (BiFPN) to improve feature fusion efficiency; 3) a simple attention module (SimAM) was added after each C2f module in the neck network to strengthen the model's focus on tea shoots.
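The two neck modifications can be illustrated with a small pure-Python sketch. It assumes the formulation from the original SimAM paper (a parameter-free energy function with regularizer lambda = 1e-4) and the "fast normalized fusion" used in BiFPN (epsilon = 1e-4); the function names simam and bifpn_fuse are hypothetical, and the study's actual implementation (tensor library, layer placement, weight initialization) is not described here. DSConv is omitted because its learned, path-constrained kernel offsets do not reduce to a few lines.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def simam(channel: Matrix, lam: float = 1e-4) -> Matrix:
    """SimAM re-weighting of a single H x W feature channel (no learnable parameters).

    For each activation x, the inverse minimal energy is
        1/e* = (x - mu)^2 / (4 * (var + lam)) + 0.5,
    where mu and var are the channel's spatial mean and variance (var is
    normalised by n = H*W - 1). The output is x * sigmoid(1/e*), so activations
    that stand out from the rest of their channel receive larger weights.
    """
    values = [v for row in channel for v in row]
    if len(values) < 2:
        raise ValueError("channel needs at least two activations")
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
    out: Matrix = []
    for row in channel:
        new_row = []
        for v in row:
            inv_energy = (v - mu) ** 2 / (4 * (var + lam)) + 0.5
            new_row.append(v / (1.0 + math.exp(-inv_energy)))  # v * sigmoid(inv_energy)
        out.append(new_row)
    return out


def bifpn_fuse(inputs: Sequence[Sequence[float]], weights: Sequence[float],
               eps: float = 1e-4) -> List[float]:
    """BiFPN-style fast normalised fusion of same-sized (flattened) feature maps.

    O = sum_i(w_i * I_i) / (eps + sum_j w_j), with w_i = max(0, raw weight_i),
    so each input contributes a learnable, non-negative share of the output.
    Inputs are assumed to be resized to a common resolution beforehand.
    """
    if not inputs or len(inputs) != len(weights):
        raise ValueError("need one weight per input feature map")
    w = [max(0.0, x) for x in weights]  # ReLU keeps the fusion weights non-negative
    denom = eps + sum(w)
    size = len(inputs[0])
    return [sum(wi * fmap[k] for wi, fmap in zip(w, inputs)) / denom for k in range(size)]
```

For example, a uniform channel is scaled everywhere by sigmoid(0.5), roughly 0.62, whereas an activation that deviates strongly from its channel mean receives a weight closer to 1. In a real network both operations act on multi-channel tensors; in EfficientDet, for instance, each fused map is additionally passed through a convolution.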
An ablation experiment was designed to verify the effectiveness of each modification, and a comparative experiment was conducted to demonstrate the advantages of the improved model. The results indicate that all three modifications enhanced the detection accuracy of the model. Specifically, compared with the original model, adding the SimAM module increased the recall (R) by 2.4 percentage points, indicating effective attention to small tea shoots. Adding DSConv increased the precision (P) by 1.1 percentage points, indicating that it effectively captures the morphological characteristics of tea shoots. Introducing BiFPN increased the precision and mean average precision (mAP) by 1.6 and 2.1 percentage points, respectively, and the recall by 1.7 percentage points, indicating more efficient feature fusion than PANet. The final improved model outperformed the original model by 4.2, 2.9, 3.7, and 3.3 percentage points in precision, recall, mAP, and F1 score, respectively. Its detection speed reaches 42 frames per second, achieving a balance between detection accuracy and speed, and its model size of 6.7 MB meets the deployment requirements of low-power mobile devices.

Compared with other object detection models, namely Faster R-CNN, YOLOv5n, YOLOv7n, and YOLOv8n, the improved YOLOv8n model achieved higher precision, recall, mAP, and F1 score, outperforming them by 57.4, 4.4, 4.7, and 4.2 percentage points in precision; 53.0, 3.6, 2.8, and 2.9 percentage points in recall; 58.9, 5.0, 4.6, and 3.7 percentage points in mAP; and 56.8, 3.9, 3.7, and 3.3 percentage points in F1 score, respectively. The higher accuracy and lower missed-detection rate of the improved YOLOv8n model further confirm its superiority in tea shoot detection. Meanwhile, its moderate detection speed and model size make it suitable for low-power mobile devices used in intelligent tea picking. The improved YOLOv8n model can lay a foundation for the intelligent picking of premium teas.
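The evaluation metrics quoted above follow standard definitions: P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R), and mAP is the mean of the per-class average precision (AP), here presumably over the two shoot classes. The sketch below illustrates these definitions under assumptions the abstract does not state: detections are assumed to be already matched to ground-truth boxes (for example at an IoU threshold of 0.5), and AP uses all-point interpolation of the precision-recall curve. Detection frameworks differ in their interpolation scheme, so values computed this way may differ slightly from reported ones. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (P, R, F1) from true-positive, false-positive and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], n_gt: int) -> float:
    """All-point interpolated AP for one class.

    scores: confidence of each detection of this class.
    is_tp:  True if that detection was matched to a ground-truth box
            (matching, e.g. at IoU >= 0.5, is assumed to be done already).
    n_gt:   number of ground-truth boxes of this class.
    """
    if n_gt <= 0:
        raise ValueError("n_gt must be positive")
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    recalls: List[float] = []
    precisions: List[float] = []
    tp_cum = fp_cum = 0
    for i in order:
        if is_tp[i]:
            tp_cum += 1
        else:
            fp_cum += 1
        recalls.append(tp_cum / n_gt)
        precisions.append(tp_cum / (tp_cum + fp_cum))
    # Add sentinel points, then build the non-increasing precision envelope.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Area under the envelope; terms where recall does not change are zero.
    return sum((mrec[k + 1] - mrec[k]) * mpre[k + 1] for k in range(len(mrec) - 1))


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For instance, three ranked detections marked [True, False, True] against two ground-truth shoots give AP = 0.5 * 1 + 0.5 * (2/3), about 0.833; the false positive lowers precision only in the second half of the recall range.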