Abstract:
Potatoes are one of the most important food crops for maintaining food security and stability worldwide. However, manual seed cutting has seriously restricted the development of the potato planting industry, because it is labor-intensive and costly. Rapid and accurate detection of seed buds is therefore in high demand for the intelligent cutting of potato seeds. Detection performance also directly determines the quality of the cut seed pieces, which in turn determines total yield and economic benefit. In this study, a target detection model for potato seed buds was proposed using an improved YOLOv7, in order to cope with the small area occupied by the buds, the few extractable features, and the complex background of the seed surface in potato seed-cutting machinery. Firstly, a Contextual Transformer self-attention module was added to the Backbone to enhance the target objects and suppress redundant background, assigning different weights to the target and background regions. Secondly, the InceptionNeXt module was selected to replace the original ELAN-H module in the Head, reducing the loss of high-dimensional features of the potato seed buds as the network depth increases and improving multi-scale feature fusion for bud detection. Finally, the bounding-box loss function was replaced with the Normalized Wasserstein Distance (NWD), in order to reduce the loss value and speed up the convergence of the network. Since the quality of the training samples largely determines bud detection, well-stored potato seeds with surfaces free of insect pests, dry rot, and disease spots were selected; additional samples with surface damage and adhering soil were included to diversify the dataset, giving 500 samples in total.
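The abstract does not detail the NWD metric itself. A minimal sketch, assuming the commonly used formulation in which each box is modeled as a 2D Gaussian and the normalizing constant C is dataset-dependent (the value 12.8 below is purely illustrative, not from this paper):

```python
import math

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    Each box is modeled as a 2D Gaussian N([cx, cy], diag((w/2)^2, (h/2)^2));
    C is a dataset-dependent constant (12.8 here is an assumed example value).
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    # Squared 2-Wasserstein distance between the two Gaussians
    w2_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2 \
          + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / C)

# Identical boxes give NWD = 1; a box regression loss would then be 1 - NWD.
```

Unlike IoU, this measure stays smooth and informative even when small boxes barely overlap, which suits tiny targets such as seed buds.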
The front and back sides of each potato seed were captured with a CCD camera at a distance of 30 cm from the samples. A total of 1,000 JPEG images were randomly divided into a training set (800 images), a validation set (100 images), and a test set (100 images) at a ratio of 8:1:1. Data augmentation was also performed on the dataset, including mirroring, rotation, cropping, brightness adjustment, and noise addition, to improve model generalization despite the limited number of samples. The experimental results indicated that the mean average precision (mAP) of the improved YOLOv7 model reached 95.40%, 4.2 percentage points higher than that of the original model. Its detection accuracy was 34.09, 26.32, 27.25, 22.88, 35.92, 17.23, and 15.70 percentage points higher than those of comparable target detection models, namely Faster-RCNN (ResNet50), Faster-RCNN (VGG), SSD, YOLOv3, YOLOv4, YOLOv5, and YOLOX, respectively. The missed detection rates of the improved model were 4% for potato seeds with smooth surfaces and 11% for those with soiled or damaged surfaces, indicating better detection than the other models in practical application. These findings can provide strong support for the recognition of potato seed buds in the intelligent cutting of potato seed.
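The 8:1:1 random split described above can be sketched as follows; the file names, seed, and function name are illustrative assumptions, not taken from the paper:

```python
import random

def split_dataset(paths, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle image paths and split them into train/val/test by the given ratios."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    paths = list(paths)
    rng.shuffle(paths)
    n = len(paths)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

# Hypothetical file names standing in for the 1,000 captured JPEG images
images = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(images)  # 800 / 100 / 100 at 8:1:1
```

Shuffling before slicing keeps the front- and back-side images of different samples evenly distributed across the three subsets.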