Abstract:
Potatoes are one of the most important food crops for maintaining food security and stability worldwide. However, manual seed cutting has seriously restricted the development of the potato planting industry, because it is labor-intensive and costly. Rapid and accurate detection of seed buds is therefore in high demand for the intelligent cutting of potato seeds. Detection performance also directly determines the quality of the cut seed pieces, which in turn determines total yield and economic benefit. In this study, a target detection model for potato seed buds was proposed using an improved YOLOv7, in order to cope with the small area occupied by the buds, the few extractable features, and the complex background of the seed surface in potato seed-cutting machinery. Firstly, a Contextual Transformer self-attention module was added to the Backbone to enhance the target objects and suppress redundant background, assigning different weights to the target and background regions. Secondly, the InceptionNeXt module was selected to replace the original ELAN-H module in the Head, reducing the loss of high-dimensional features of the potato seed buds as the network depth increases and improving multi-scale feature fusion for bud detection. Finally, the bounding-box loss function was replaced with the Normalized Wasserstein Distance (NWD), in order to reduce the loss value and speed up the convergence of the network. Since the quality of the training samples largely determines bud detection, well-stored potato seeds with surfaces free of insect pests, dry rot, and disease spots were selected; additional samples with surface damage and adhering soil were included to diversify the dataset, giving 500 samples in total.
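The abstract does not detail the NWD metric itself. A minimal sketch, assuming the commonly used formulation in which each box is modeled as a 2D Gaussian and the normalizing constant C is dataset-dependent (the value 12.8 below is purely illustrative, not from this paper):

```python
import math

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    Each box is modeled as a 2D Gaussian N([cx, cy], diag((w/2)^2, (h/2)^2));
    C is a dataset-dependent constant (12.8 here is an assumed example value).
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    # Squared 2-Wasserstein distance between the two Gaussians
    w2_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2 \
          + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / C)

# Identical boxes give NWD = 1; a box regression loss would then be 1 - NWD.
```

Unlike IoU, this measure stays smooth and informative even when small boxes barely overlap, which suits tiny targets such as seed buds.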
The front and back sides of each potato seed were captured with a CCD camera at a distance of 30 cm from the samples. A total of 1,000 JPEG images were randomly divided into a training set (800 images), a validation set (100 images), and a test set (100 images) at a ratio of 8:1:1. Data augmentation was also performed on the dataset, including mirroring, rotation, cropping, brightness adjustment, and noise addition, to improve model generalization despite the limited number of samples. The experimental results indicated that the mean average precision (mAP) of the improved YOLOv7 model reached 95.40%, 4.2 percentage points higher than that of the original model. Its detection accuracy was 34.09, 26.32, 27.25, 22.88, 35.92, 17.23, and 15.70 percentage points higher than those of comparable target detection models, namely Faster-RCNN (ResNet50), Faster-RCNN (VGG), SSD, YOLOv3, YOLOv4, YOLOv5, and YOLOX, respectively. The missed detection rates of the improved model were 4% for potato seeds with smooth surfaces and 11% for those with soiled or damaged surfaces, indicating better detection than the other models in practical application. These findings can provide strong support for the recognition of potato seed buds in the intelligent cutting of potato seed.
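The 8:1:1 random split described above can be sketched as follows; the file names, seed, and function name are illustrative assumptions, not taken from the paper:

```python
import random

def split_dataset(paths, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle image paths and split them into train/val/test by the given ratios."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    paths = list(paths)
    rng.shuffle(paths)
    n = len(paths)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

# Hypothetical file names standing in for the 1,000 captured JPEG images
images = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(images)  # 800 / 100 / 100 at 8:1:1
```

Shuffling before slicing keeps the front- and back-side images of different samples evenly distributed across the three subsets.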