Abstract
Tea is prepared by steeping the leaf buds of the tea plant in freshly boiled water, and the picking of these buds is one of the most important steps in tea production. Manual picking is labor-intensive and time-consuming, and cannot fully meet the needs of large-scale production at present, while conventional mechanical picking can also damage the tea plant. There is therefore a strong demand for intelligent tea-picking robots to improve the efficiency of tea picking. With the wide application of machine vision in various fields, the YOLOv5s deep learning network model, with its high detection accuracy and inference speed, can be applied to crop recognition. However, the original network model struggles to recognize small targets such as tea buds: in the relatively complex environment of a tea garden, weak feature extraction for small targets causes missed detections. It is therefore necessary to improve the detection accuracy for tea buds while maintaining detection speed. In this study, a tea bud detection method was proposed using an improved YOLOv5s network model (Tea-YOLOv5s). The specific procedures were as follows. Firstly, 1190 images of tea buds were collected in the tea garden to construct the data set. The SPPF (spatial pyramid pooling fast) structure in the backbone feature extraction network of the original model was replaced with the ASPP (atrous spatial pyramid pooling) structure, whose dilated convolutions reduce the loss of local information and enhance the ability of the model to recognize targets at different resolutions. Secondly, the weighted BiFPN (bidirectional feature pyramid network) was introduced into the neck network, and multiple basic structures were stacked to enhance feature extraction; the trained weight values were used to learn the contribution of each input feature, improving the efficiency of feature fusion. Finally, a CBAM (convolutional block attention module) was added after each C3 (concentrated-comprehensive convolutional block) module in the neck network. The channel and spatial attention modules focus on the meaning and the location of important features, respectively, improving the ability of the model to attend to small-target features such as tea buds. The test results show that the precision (P), recall (R), and mean average precision (mAP) of the improved Tea-YOLOv5s model were 85.0%, 75.5%, and 84.3%, respectively, which were 4.4, 0.5, and 4.0 percentage points higher than those of the original model. The model was robust, with higher confidence scores when detecting tea buds in multiple scenarios. The improved model was also compared with mainstream target detection algorithms, including Faster-RCNN, SSD, YOLOv3, YOLOv4, and YOLOv5s. The Tea-YOLOv5s model achieved the highest mean average precision, which was 54.27, 29.66, 26.40, 32.45, and 4.00 percentage points higher than those of the above models, respectively, and its inference speed fully met the recognition requirements. The improved model can lay the foundation for estimating tea yield and for identifying tender buds for tea-picking robots.
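To make the ASPP replacement concrete, the following is a minimal PyTorch sketch of an atrous spatial pyramid pooling block. It is illustrative only: the dilation rates (1, 6, 12, 18), the global-pooling branch, and the channel sizes are conventional ASPP defaults assumed here, not values confirmed by the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ASPP(nn.Module):
        """Atrous spatial pyramid pooling: parallel dilated convolutions
        sample context at several receptive-field sizes, losing less local
        information than the serial max-pooling used in SPPF."""
        def __init__(self, c_in, c_out, rates=(1, 6, 12, 18)):
            super().__init__()
            self.branches = nn.ModuleList()
            for r in rates:
                k, p = (1, 0) if r == 1 else (3, r)
                self.branches.append(nn.Sequential(
                    nn.Conv2d(c_in, c_out, k, padding=p, dilation=r, bias=False),
                    nn.BatchNorm2d(c_out),
                    nn.SiLU()))
            # Image-level branch: global average pooling adds scene context
            self.pool = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(c_in, c_out, 1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.SiLU())
            # 1x1 projection back to c_out channels after concatenation
            self.project = nn.Sequential(
                nn.Conv2d(c_out * (len(rates) + 1), c_out, 1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.SiLU())

        def forward(self, x):
            h, w = x.shape[2:]
            feats = [b(x) for b in self.branches]
            g = F.interpolate(self.pool(x), size=(h, w),
                              mode="bilinear", align_corners=False)
            return self.project(torch.cat(feats + [g], dim=1))

Because each branch uses padding equal to its dilation rate, all branches keep the input resolution, so the block can drop into the backbone where SPPF sat.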
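The weighted fusion at the heart of BiFPN can be sketched as below, assuming the "fast normalized fusion" of the original BiFPN design: each input feature map gets a learnable non-negative weight, so the trained weight values express how much each resolution contributes to the fused feature. The module name and tensor shapes are illustrative.

    import torch
    import torch.nn as nn

    class WeightedFusion(nn.Module):
        """Fast normalized fusion: one learnable non-negative scalar weight
        per input feature map, normalized to sum to one."""
        def __init__(self, n_inputs, eps=1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(n_inputs))
            self.eps = eps

        def forward(self, feats):
            w = torch.relu(self.w)         # keep contributions non-negative
            w = w / (w.sum() + self.eps)   # normalize without a softmax
            return sum(wi * f for wi, f in zip(w, feats))

    # Usage: fuse two same-shape maps from adjacent pyramid levels
    fuse = WeightedFusion(n_inputs=2)
    fused = fuse([torch.randn(1, 256, 40, 40), torch.randn(1, 256, 40, 40)])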
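Finally, a CBAM block as it is commonly implemented: channel attention first reweights feature channels (what matters), then spatial attention reweights locations (where it is), which is what helps the model focus on small targets such as tea buds. The reduction ratio of 16 and the 7x7 spatial kernel are conventional CBAM defaults assumed here, not values confirmed by the paper.

    import torch
    import torch.nn as nn

    class CBAM(nn.Module):
        """Convolutional block attention module: sequential channel and
        spatial attention applied to a feature map."""
        def __init__(self, c, reduction=16, kernel_size=7):
            super().__init__()
            # Shared MLP for channel attention over pooled descriptors
            self.mlp = nn.Sequential(
                nn.Conv2d(c, c // reduction, 1, bias=False),
                nn.ReLU(),
                nn.Conv2d(c // reduction, c, 1, bias=False))
            # Single convolution for spatial attention over two pooled maps
            self.spatial = nn.Conv2d(2, 1, kernel_size,
                                     padding=kernel_size // 2, bias=False)

        def forward(self, x):
            # Channel attention: avg- and max-pooled descriptors share the MLP
            avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
            mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
            x = x * torch.sigmoid(avg + mx)
            # Spatial attention: channel-wise average and max maps
            s = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial(s))

In the improved model described above, one such block would sit after each C3 module of the neck network.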