王梦妮,顾寄南,王化佳,等. 基于改进YOLOv5s模型的茶叶嫩芽识别方法[J]. 农业工程学报,2023,39(12):150-157. DOI: 10.11975/j.issn.1002-6819.202303099
    WANG Mengni, GU Jinan, WANG Huajia, et al. Method for identifying tea buds based on improved YOLOv5s model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(12): 150-157. DOI: 10.11975/j.issn.1002-6819.202303099

    基于改进YOLOv5s模型的茶叶嫩芽识别方法

    Method for identifying tea buds based on improved YOLOv5s model

    • Abstract: Existing object detection algorithms achieve low accuracy when detecting tea buds. To improve detection accuracy, this study proposes a tea bud detection algorithm based on an improved YOLOv5s network model. The algorithm replaces the spatial pyramid pooling-fast (SPPF) structure in the backbone feature-extraction network with an atrous spatial pyramid pooling (ASPP) structure, strengthening the model's ability to recognize targets at different resolutions. To address the small-target nature of tea buds, a weighted bidirectional feature pyramid network (BiFPN) is introduced into the neck network to improve the efficiency of feature fusion, and a convolutional block attention module (CBAM) is added after each concentrated-comprehensive convolution block (C3) in the neck network to strengthen the model's attention to small-target features. Test results show that the improved Tea-YOLOv5s exceeds the original model in precision (P), recall (R), and mean average precision (mAP) by 4.4, 0.5, and 4.0 percentage points, respectively; the model is robust and yields higher confidence scores when detecting tea buds across multiple scenes. The improved model can lay a foundation for tea yield estimation and for bud recognition by tea-picking robots.
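The ASPP structure mentioned above runs atrous (dilated) convolutions at several sampling rates in parallel, enlarging the receptive field without adding weights, and then fuses the multi-scale responses. The following minimal sketch is not the paper's implementation: it uses a hypothetical 1-D toy (`dilated_conv1d`, `aspp_1d`) to stand in for the 2-D module, purely to illustrate the mechanism.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Naive 1-D atrous (dilated) convolution with 'valid' padding.

    With dilation d, a kernel of size k covers a receptive field of
    d*(k-1)+1 samples while using the same number of weights.
    """
    k = len(kernel)
    span = dilation * (k - 1) + 1          # effective receptive field
    return [sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """Toy ASPP head: apply the same kernel at several dilation rates
    in parallel and average the responses (cropped to a common
    length), mimicking multi-scale feature fusion."""
    outs = [dilated_conv1d(x, kernel, d) for d in rates]
    n = min(len(o) for o in outs)
    return [sum(o[i] for o in outs) / len(outs) for i in range(n)]
```

In the real 2-D module the parallel branches are concatenated and projected by a 1×1 convolution rather than averaged; the averaging here only keeps the toy self-contained.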

       

      Abstract: Tea is prepared by steeping the buds and young leaves of the tea plant in freshly boiled water, and picking the tender buds is one of the most important steps in tea production. Manual picking is labor-intensive and time-consuming and cannot fully meet the needs of large-scale production, while conventional machine picking can damage the tea plant. There is therefore a strong demand for intelligent tea-picking robots that improve picking efficiency. With the wide application of machine vision in many fields, the YOLOv5s deep learning model is representative: its high detection accuracy and fast inference make it suitable for crop recognition. However, the original network struggles with small targets such as tea buds; in the relatively complex environment of a tea garden, weak small-target features lead to missed detections. It is therefore necessary to improve the detection accuracy for tea buds while maintaining detection speed. This study proposes a tea bud detector based on an improved YOLOv5s network model (Tea-YOLOv5s). The specific procedure was as follows. First, 1 190 images of tea buds were collected in a tea garden to build the dataset. The spatial pyramid pooling-fast (SPPF) structure in the backbone feature-extraction network of the original model was replaced with an atrous spatial pyramid pooling (ASPP) structure, which reduces the loss of local information during pooling and strengthens the model's ability to recognize targets at different resolutions. Second, a weighted bidirectional feature pyramid network (BiFPN) was introduced into the neck network, stacking several basic blocks to strengthen feature extraction; the trained weights learn the contribution of each input feature, improving the efficiency of feature fusion. Finally, a convolutional block attention module (CBAM) was added after each concentrated-comprehensive convolution block (C3) in the neck network; its channel and spatial attention submodules attend to what the important features are and where they are located, respectively, improving the model's attention to small targets such as tea buds.
The test results show that the precision (P), recall (R), and mean average precision (mAP) of the improved Tea-YOLOv5s model reached 85.0%, 75.5%, and 84.3%, exceeding the original model by 4.4, 0.5, and 4.0 percentage points, respectively. The model is robust and produces higher confidence scores when detecting tea buds in multiple scenes. The improved model was also compared with the mainstream detectors Faster-RCNN, SSD, YOLOv3, YOLOv4, and YOLOv5s: Tea-YOLOv5s achieved the highest mAP, exceeding these models by 54.27, 29.66, 26.40, 32.45, and 4.00 percentage points, respectively, and its inference speed fully meets the recognition requirements. The improved model can lay a foundation for tea yield estimation and for bud recognition by tea-picking robots.
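The weighted fusion that BiFPN performs is commonly implemented as "fast normalized fusion": each learnable scalar weight is clipped by ReLU, and the weighted sum of the input features is divided by the weight total, so the fused map stays on the same scale as its inputs. A minimal plain-Python sketch, assuming the function name and treating feature maps as flat lists (the real module fuses multi-channel maps inside the network):

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style weighted feature fusion.

    features: list of equally sized feature vectors (one per input branch)
    weights:  one learnable scalar per branch; ReLU keeps them
              non-negative, and normalizing by their sum keeps the
              output magnitude stable however many branches contribute.
    """
    w = [max(wi, 0.0) for wi in weights]         # ReLU on the scalar weights
    s = sum(w) + eps                             # eps avoids division by zero
    return [sum(wi * f[i] for wi, f in zip(w, features)) / s
            for i in range(len(features[0]))]
```

A branch whose weight trains to a negative value is effectively switched off by the ReLU, which is how the network learns each input's contribution.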

       
