Apple growth status and posture recognition using improved YOLOv7

陈青; 殷程凯; 郭自良; 吴玄博; 王金鹏; 周宏平

doi:10.11975/j.issn.1002-6819.202311080

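Abstract: Apples in complex environments are difficult to classify by growth state while their pose information is acquired at the same time. To address this problem, this study proposes a method based on an improved YOLOv7 that fuses apple growth state classification with fruit pose recognition. First, the multi-scale feature fusion network is improved by adding a 160×160 feature scale layer to the backbone network, which increases the sensitivity of the model to tiny local features. Second, the CBAM (convolutional block attention module) attention mechanism is introduced to improve the attention of the network to the target regions of interest in the input image. Finally, the Soft-NMS algorithm is adopted, which effectively prevents densely overlapping targets from being suppressed in a single pass and therefore missed. In addition, the pose of unoccluded apples is obtained by combining the UNet segmentation network with minimum enclosing circle and rectangle features. The experimental results show that the recognition precision, recall, and average recognition precision of the improved YOLOv7 were 86.9%, 80.5%, and 87.1%, respectively, which are 4.2, 2.2, and 3.7 percentage points higher than those of the original YOLOv7 model. Compared with Faster RCNN, YOLOv5s, and YOLOv5m, the average detection precision improved by 18.9, 7.2, and 5.9 percentage points, respectively. The accuracy of the apple pose detection method was 94%. The proposed model can classify the growth state of apples and recognize fruit pose, providing a grasping direction for the end-effector and laying a foundation for non-destructive and efficient apple picking.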
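The Soft-NMS step named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the Gaussian score-decay variant of Soft-NMS; the abstract does not state which variant, `sigma`, or score threshold the study used, so the function names (`iou_one_to_many`, `soft_nms`) and the default parameter values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant; sigma and threshold are illustrative).

    Instead of deleting every box that overlaps the current best box,
    the scores of overlapping boxes are decayed by exp(-IoU^2 / sigma).
    Returns the kept indices and their (possibly decayed) scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []
    while remaining.size > 0:
        best = np.argmax(scores[remaining])
        i = remaining[best]
        keep.append(int(i))
        kept_scores.append(float(scores[i]))
        remaining = np.delete(remaining, best)
        if remaining.size == 0:
            break
        overlaps = iou_one_to_many(boxes[i], boxes[remaining])
        # Decay the scores of overlapping boxes rather than removing them.
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop only boxes whose score has decayed below the threshold.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, kept_scores
```

For two heavily overlapping boxes, hard NMS with a typical IoU threshold would discard the lower-scoring one outright, whereas this sketch keeps it with a reduced score, which is the behavior the abstract relies on to reduce missed detections of overlapping fruit.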

Abstract: Manual picking can no longer fully meet the needs of large-scale apple production in China. With the shortage of labor and the rapid development of mechanical automation, robotic picking has become an inevitable trend. Picking robots must accurately identify and locate apples in complex environments, yet it remains difficult to classify the growth pattern of the fruit and acquire its pose at the same time. In the orchard, the target fruit is often occluded by leaves, branches, or other fruits, sometimes over only a small portion of its surface, so the differences among fruit growth patterns are small. After multiple convolution operations, the deep feature maps of a convolutional neural network tend to lose the key information about the occluded parts of the fruit, which leads to misrecognition of the growth pattern. At the same time, a detection network can easily identify two overlapping apples as one in the natural environment, so the occluded fruit is missed. In this study, an improved YOLOv7 model was proposed to recognize apple growth patterns together with fruit posture. Firstly, the multi-scale feature fusion network was improved by adding a 160×160 feature scale layer to the backbone network, which enhanced the sensitivity of the model to tiny local features. Secondly, the CBAM attention mechanism was introduced to improve the attention of the network to the target regions of interest. Finally, Soft-NMS was used to prevent high-density overlapping targets from being suppressed in a single pass, thus reducing missed detections. The experimental results show that the recognition accuracy, recall, and average recognition precision of DCS-YOLOv7 were 86.9%, 80.5%, and 87.1%, respectively, which were 4.2, 2.2, and 3.2 percentage points higher than those of the original YOLOv7 model. Both the average accuracy and the speed were improved and met the requirements of the picking robot.
In addition, an apple pose recognition method was proposed using semantic segmentation and minimum enclosing shape features. Firstly, comparison tests showed that the UNet model performed best in apple image segmentation. Its mean pixel accuracy was 0.7 and 0.2 percentage points higher than those of DeepLabv3+ and PSPNet, respectively, and its mean intersection over union was 1.7 and 1.1 percentage points higher. Its average segmentation speed was also the fastest of the three. The UNet semantic segmentation network was therefore chosen as the apple segmentation model. The apple image was segmented using UNet, the contour features of the apple and the calyx were obtained by contour extraction, and the pose of the unoccluded apple was then obtained from the minimum enclosing circle and rectangle features. The accuracy of apple pose detection was 94%, and the average processing time per image was 15.7 ms, indicating that the method acquires the pose of the target apple well. The high detection accuracy verified the validity of the model that integrates fruit growth pattern recognition with posture recognition. With the growth pattern classified and the fruit posture recognized, the end-effector can pick the fruit rapidly and accurately in a suitable way. These findings can lay the foundation for the non-destructive and efficient picking of apples.
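The pose step described above can be illustrated with a simplified NumPy sketch. It rests on several assumptions that the abstract does not confirm: that the pose is expressed as the image-plane direction from the apple center to the calyx center, that binary masks for the apple and the calyx are already available from the segmentation network, and that a centroid-centered bounding circle is an acceptable stand-in for a true minimum enclosing circle. The function names (`bounding_circle`, `bounding_rect_center`, `pose_angle_deg`) are hypothetical.

```python
import numpy as np


def bounding_circle(mask):
    """Circle centered at the mask centroid that covers all mask pixels.

    This is a simple stand-in for a minimum enclosing circle; it is exact
    for symmetric shapes but can be larger than the true minimum otherwise.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    radius = np.linalg.norm(pts - center, axis=1).max()
    return center, radius


def bounding_rect_center(mask):
    """Center (x, y) of the axis-aligned bounding rectangle of a mask."""
    ys, xs = np.nonzero(mask)
    return np.array([(xs.min() + xs.max()) / 2.0,
                     (ys.min() + ys.max()) / 2.0])


def pose_angle_deg(apple_mask, calyx_mask):
    """Assumed pose measure: direction from apple center to calyx center.

    Returns the angle in degrees (image coordinates, y pointing down) and
    the calyx offset as a fraction of the apple radius.
    """
    center, radius = bounding_circle(apple_mask)
    calyx = bounding_rect_center(calyx_mask)
    dx, dy = calyx - center
    angle = np.degrees(np.arctan2(dy, dx))
    offset_ratio = np.hypot(dx, dy) / radius
    return angle, offset_ratio
```

Under these assumptions, an offset ratio near 0 would indicate a calyx facing the camera, while a ratio near 1 would indicate a calyx close to the visible edge of the fruit; the angle gives the in-image direction in which the fruit axis is tilted.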

Apple growth status and posture recognition using improved YOLOv7