Detection of the quality of famous green tea based on improved YOLOv5s

    • Abstract: Considering the characteristics of the practical detection task, in which tea leaves are numerous, small, and similar to one another in color and texture, this study proposes a quality detection algorithm for famous green tea based on YOLOv5s. First, dilated convolution is introduced into the backbone network, enlarging the receptive field to enhance the extraction of fine tea features. Second, the feature fusion process is improved: a CBAM attention module is constructed to optimize the detector, suppressing the interference of irrelevant information through channel and spatial attention. Next, a Swin Transformer structure is used to interact and fuse the features of small-scale tea targets across multiple dimensions. Finally, the SimOTA matching algorithm dynamically assigns positive tea samples, improving the ability to recognize tea of different quality grades. The results show that the improved algorithm achieves a precision of 97.4%, a recall of 89.7%, a mean average precision of 91.9%, a model size of 7.11 MB, and a detection speed of 51 frames/s; compared with the baseline YOLOv5s, the mean average precision is improved by 3.8 percentage points and the detection speed by 7 frames/s. Comparative experiments on the same dataset against other detection models show mean average precision improvements of 10.8, 22.9, 18.6, and 8.4 percentage points over Faster R-CNN, SSD, YOLOv3, and YOLOv4, respectively, further verifying the effectiveness and reliability of the proposed method.
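    As a rough illustration of the dilated-convolution idea described in the abstract, the sketch below shows a minimal multi-branch dilated block in PyTorch: parallel 3x3 convolutions with increasing dilation rates enlarge the receptive field without additional downsampling. The class name DilatedBranchBlock, the dilation rates (1, 2, 3), and the 1x1 fusion layer are illustrative assumptions, not the exact structure used in the paper.

        import torch
        import torch.nn as nn

        class DilatedBranchBlock(nn.Module):
            # Hypothetical multi-branch block: a 3x3 conv with dilation d sees a
            # (2*d + 1) x (2*d + 1) neighbourhood, so the branches cover 3x3, 5x5 and 7x7
            # receptive fields at the same resolution before a 1x1 conv fuses them.
            def __init__(self, in_channels, out_channels):
                super().__init__()
                self.branches = nn.ModuleList([
                    nn.Conv2d(in_channels, out_channels, 3, padding=d, dilation=d, bias=False)
                    for d in (1, 2, 3)
                ])
                self.fuse = nn.Sequential(
                    nn.Conv2d(3 * out_channels, out_channels, 1, bias=False),
                    nn.BatchNorm2d(out_channels),
                    nn.SiLU(inplace=True),
                )

            def forward(self, x):
                return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

        x = torch.randn(1, 64, 80, 80)               # a typical YOLOv5s feature-map size
        print(DilatedBranchBlock(64, 64)(x).shape)   # torch.Size([1, 64, 80, 80])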

       

      Abstract: The evaluation of tea quality directly determines market value in the tea industry. In recent years, sensory evaluation has been widely combined with rational analysis to assess tea quality. However, sensory evaluation is subjective, with high error and poor repeatability, while physical and chemical approaches are costly, time-consuming, and destructive. In this study, a computer vision-based approach was proposed to assess the appearance and sensory quality of Xinchang's renowned flat green tea, taking into account the characteristics of tea during detection, such as its small scale, high density, and weak feature significance. A quality detection model for the sensory evaluation of tea shape was established using YOLOv5s deep learning and machine vision. A three-band dilated convolution (TDC) structure with enlarged receptive fields was introduced into the backbone network to enhance the extraction of tea features. Additionally, the Convolutional Block Attention Module (CBAM) was introduced to determine the attention area in dense scenes using channel and spatial attention, promoting the local perception of the network and improving the detection accuracy of small-scale tea. Furthermore, the Swin Transformer structure was introduced in the feature fusion stage to enhance the semantic information and feature representation of small targets with the help of window self-attention. Finally, positive sample matching was improved by dynamically allocating positive samples with SimOTA, which assigns an optimal matching box to each positive tea sample to increase the efficiency and detection accuracy of the network. An ablation experiment was performed on a self-made tea dataset. The results show that the modified model significantly improved the average accuracy of target detection on tea images. The improved YOLOv5 produced higher confidence scores in tea quality detection than the baseline model and achieved more accurate positioning. The detection accuracy on the applied dataset increased by 3.8 percentage points, indicating greatly reduced false detection. The mean average precision (mAP) and frame rate reached 91.9% and 51 frames/s, respectively, with the frame rate improved by 7 frames/s. Compared with current mainstream target detection models, the proposed model achieved higher recognition accuracy and speed with excellent real-time performance, indicating its feasibility and superiority. These findings can provide a strong reference for improving quality detection in the tea market. In conclusion, the computer vision-based YOLOv5s approach can serve as a novel and effective way to evaluate the appearance and sensory quality of tea, with better accuracy, speed, and efficiency for the tea industry.
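      The CBAM module referenced in the abstract follows the standard formulation of channel attention followed by spatial attention. The PyTorch sketch below is a minimal re-implementation of that standard module for reference; the reduction ratio of 16 and the 7x7 spatial kernel are common defaults rather than values reported here, and how the module is wired into the YOLOv5s network is not shown.

        import torch
        import torch.nn as nn

        class ChannelAttention(nn.Module):
            # Squeeze spatial dimensions with average and max pooling, pass both
            # descriptors through a shared MLP, and gate each channel.
            def __init__(self, channels, reduction=16):
                super().__init__()
                self.avg_pool = nn.AdaptiveAvgPool2d(1)
                self.max_pool = nn.AdaptiveMaxPool2d(1)
                self.mlp = nn.Sequential(
                    nn.Conv2d(channels, channels // reduction, 1, bias=False),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels // reduction, channels, 1, bias=False),
                )
                self.sigmoid = nn.Sigmoid()

            def forward(self, x):
                attn = self.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
                return x * attn

        class SpatialAttention(nn.Module):
            # Pool across channels, then a 7x7 convolution produces a spatial mask
            # that highlights where the small, densely packed targets are.
            def __init__(self, kernel_size=7):
                super().__init__()
                self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
                self.sigmoid = nn.Sigmoid()

            def forward(self, x):
                avg_out = torch.mean(x, dim=1, keepdim=True)
                max_out, _ = torch.max(x, dim=1, keepdim=True)
                return x * self.sigmoid(self.conv(torch.cat([avg_out, max_out], dim=1)))

        class CBAM(nn.Module):
            # Channel attention first, spatial attention second, as in the original CBAM.
            def __init__(self, channels, reduction=16, kernel_size=7):
                super().__init__()
                self.channel = ChannelAttention(channels, reduction)
                self.spatial = SpatialAttention(kernel_size)

            def forward(self, x):
                return self.spatial(self.channel(x))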

       
