Safflower picking recognition in complex environments based on an improved YOLOv7

    • Abstract: To address the inaccurate recognition of safflower during mechanized picking caused by complex conditions such as illumination, occlusion, dense targets, and imbalanced sample numbers, an improved YOLOv7-based model is proposed. A safflower sample dataset was built to capture the complex environment of real picking; a Swin Transformer attention mechanism was added to improve the model's detection precision for each sample class; and the Focal Loss function was improved to raise the recognition rate of imbalanced samples in the multi-class task. Experiments show that the improved model achieves a mean average precision (mAP) of 88.5% across all classes, 7 percentage points higher than before improvement, and the average precision (AP) of the imbalanced class rises by 15.9 percentage points; compared with other models, both mean average precision and detection speed improve substantially. The results indicate that the improved model can detect safflower accurately, and its small parameter count and fast recognition speed make it suitable for migration deployment on safflower picking machinery, providing technical support for research on mechanized real-time picking.

       

      Abstract: Safflower silk is an important cash crop in medicine and dye production. Manual picking of safflower silk cannot fully meet the demands of large-scale production, and mechanized harvesting is expected to improve harvesting efficiency and reduce labor cost in industrial planting. However, complex environmental factors make it difficult to accurately identify and locate safflower during mechanical picking, including the natural environment (such as light and occlusion) and the characteristics of the safflower itself (small, dense targets at different maturity stages). In this study, an improved YOLOv7 model was proposed to rapidly and accurately recognize safflower in complex environments. A dataset of 1 500 safflower images was established and divided into three sample classes: silk, bulb, and decay. The dataset contained small targets and an uneven class distribution, with the decay class especially underrepresented; it was constructed to reflect the complex environment of real picking. Firstly, a Swin Transformer attention mechanism was added to the YOLOv7 network to improve the model's detection accuracy for each class and to strengthen the backbone's ability to extract small-target features. Secondly, the Focal Loss function was improved for the multi-class task to raise the recognition rate of imbalanced samples, acting on both the target confidence loss and the classification loss; the attenuation parameter γ was adjusted to balance the samples, and its value was determined through experimental verification. Finally, a safflower detection network was designed to meet the real-time and accurate detection requirements of complex environments.
The test results show that the model performed best with an attenuation parameter of 1.0 and the Swin Transformer placed at layer 50. The mean average precision of the improved model reached 88.5% across all classes, 7 percentage points higher than before improvement, and the average precision of the imbalanced decay class rose by 15.9 percentage points. Compared with the Faster RCNN model, detection speed tripled while average precision increased by 9.7 percentage points; compared with the Deformable DETR model, detection speed increased about fivefold and average precision rose by 16.5 percentage points. The improved model performed best in detection accuracy, detection speed, and model size. It can accurately detect safflower, and its small parameter count and fast recognition speed make it suitable for migration deployment on safflower picking machinery, providing technical support for mechanized real-time picking. Future research can improve the balance between classes by expanding the number of samples at the data level.
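The Focal Loss modification described above can be illustrated with the standard focal loss formulation, FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where γ is the attenuation (focusing) parameter the authors tuned. The following is a minimal NumPy sketch of that general form, not the authors' exact YOLOv7 implementation; the α class-balancing weight and the binary framing are assumptions for illustration:

```python
import numpy as np

def focal_loss(p, y, gamma=1.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive class.
    y: ground-truth label, 0 or 1.
    gamma: attenuation (focusing) parameter; gamma = 0 recovers
           alpha-weighted cross-entropy, larger gamma down-weights
           easy, well-classified examples.
    alpha: class-balancing weight for the positive class (assumed here).
    """
    p_t = np.where(y == 1, p, 1.0 - p)          # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A confidently correct ("easy") prediction contributes far less loss
# than a poorly classified ("hard") one once gamma > 0:
easy = focal_loss(0.9, 1, gamma=1.0)
hard = focal_loss(0.1, 1, gamma=1.0)
```

With γ = 0 and α = 1 the loss reduces to plain cross-entropy −log(p_t); raising γ shrinks the contribution of abundant, easy classes so that rare classes such as decay influence training more, which is the imbalance mechanism the abstract describes.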

       
