Safflower picking recognition in complex environments based on an improved YOLOv7
Graphical Abstract
Abstract
Safflower silk is an important cash crop used in medicine and dyeing. Manual picking of safflower silk cannot currently meet the demands of large-scale production, and mechanized harvesting is expected to improve harvesting efficiency and reduce labor costs in industrial planting. However, complex environmental factors make it difficult to accurately identify and locate safflower during mechanical picking, including the natural environment (such as illumination and occlusion) and the characteristics of the safflower itself (such as small, dense targets and varying maturity). In this study, an improved YOLOv7 model was proposed to rapidly and accurately recognize and locate safflower in complex environments. A dataset of 1500 safflower images was established and divided into three sample classes: silk, bulb, and decay. The dataset contained many small targets and uneven sample sizes across classes, especially for the decay class, and was built to reflect the complex environment of real picking. Firstly, the Swin Transformer attention mechanism was added to the YOLOv7 network to improve the detection accuracy for each class and to strengthen the ability of the backbone network to extract features, especially those of small targets. Secondly, the Focal Loss function was extended to multi-class classification to improve the recognition rate of unbalanced samples in multi-class tasks, particularly in the target confidence loss and classification loss; the attenuation parameter γ was adjusted to balance the samples, and its value was determined experimentally. Finally, a safflower detection network model was designed to meet the requirements of real-time and accurate detection in complex environments. The test results show that the model performed best with an attenuation parameter of 1.0 and the Swin Transformer placed at layer 50. The average precision of the improved model over all classes reached 88.5%, 7 percentage points higher than that of the original model, and the average detection accuracy for the unbalanced decay class was 15.9 percentage points higher. Compared with the Faster RCNN model, the detection speed increased threefold and the average accuracy increased by 9.7 percentage points; compared with the Deformable DETR model, the detection speed increased by about five times and the average accuracy increased by 16.5 percentage points. The improved model performed best in terms of detection accuracy, detection speed, and model size. With fewer parameters and faster recognition, the improved model can accurately detect safflower and is suitable for deployment on safflower picking machinery. The findings can provide technical support for real-time mechanized picking. Future research can expand the number of samples at the data level to balance the classes for harvesting machinery.
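The abstract describes a multi-class Focal Loss with an attenuation parameter γ (reported best at 1.0) used to counter the class imbalance between silk, bulb, and decay samples. The paper's exact loss formulation is not given here; the following is a minimal PyTorch-style sketch of a standard multi-class focal loss with that attenuation parameter, where the function name, the optional per-class weights, and the class ordering are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a multi-class focal loss with an
# attenuation parameter gamma; gamma = 1.0 is the value reported as best in the
# abstract. Class indices and the optional alpha weights are assumptions.
import torch
import torch.nn.functional as F

def multiclass_focal_loss(logits, targets, gamma=1.0, alpha=None):
    """Focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) over C classes.

    logits:  (N, C) raw class scores
    targets: (N,)   integer class labels, e.g. 0 = silk, 1 = bulb, 2 = decay
    gamma:   attenuation parameter; larger values down-weight easy examples
    alpha:   optional (C,) per-class weights to further counter class imbalance
    """
    log_p = F.log_softmax(logits, dim=-1)                       # log-probability of every class
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)   # log p_t of the true class
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt                      # modulating factor (1 - p_t)^gamma
    if alpha is not None:                                       # e.g. heavier weight for the rare 'decay' class
        loss = alpha.to(logits.device)[targets] * loss
    return loss.mean()
```

In such a formulation, hard or rare examples (low p_t, e.g. decay samples) keep a loss close to the plain cross-entropy, while well-classified examples are down-weighted by (1 - p_t)^γ, which is the balancing effect the abstract attributes to adjusting γ.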