Abstract:
Daylily is a popular perennial herbaceous plant with high nutritional and medicinal value. Manual picking is labor-intensive, costly, and inefficient, and can no longer keep up with the large-scale planting of recent years. Automated picking is therefore expected to play a growing role in daylily production as agriculture becomes mechanized. Object detection and segmentation are the key machine-vision technologies for the intelligent harvesting of daylily. However, existing object detection models are prone to missed and false detections in unstructured environments, with unstable lighting conditions, complex and variable backgrounds, and mutual occlusion of targets, whereas picking places high demands on positioning accuracy. In this study, an improved model, YOLO-Daylily, was proposed for the object detection and instance segmentation of daylily, based on the YOLOv7-seg model. The CBAM (convolutional block attention module) attention mechanism was introduced into the YOLOv7-seg backbone network to reduce the influence of background and interference factors. In the ELAN (efficient layer aggregation network) module, PConv (partial convolution) was used to replace the original 3×3 convolutional layers, thus reducing redundant computation and memory access. CoordConv was used to replace the 1×1 convolutional layers in the PA-FPN (path aggregation feature pyramid network) of the neck network, in order to enhance position perception and mask robustness. Residual connections were used to combine the geometric information of shallow feature maps with the semantic information of deep feature maps, improving the detection and segmentation performance of the model.
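To make the PConv saving concrete, the following minimal sketch counts multiply-accumulate operations for a dense 3×3 convolution and for a partial convolution that convolves only a fraction of the channels and passes the rest through unchanged. The function names, the feature-map size, and the partial ratio of 1/4 are illustrative assumptions; the abstract does not state which ratio or layer sizes were used in YOLO-Daylily.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of a dense k x k convolution
    (stride 1, 'same' padding, bias ignored)."""
    return h * w * k * k * c_in * c_out


def pconv_macs(h, w, k, c, ratio=0.25):
    """Multiply-accumulates of a partial convolution: only the first
    c * ratio channels are convolved, the others are left untouched.
    The ratio of 1/4 is an assumed setting, not taken from the abstract."""
    c_p = int(c * ratio)
    return conv_macs(h, w, k, c_p, c_p)


# Hypothetical 40 x 40 feature map with 256 channels.
h, w, c = 40, 40, 256
dense = conv_macs(h, w, 3, c, c)
partial = pconv_macs(h, w, 3, c)
print(dense, partial, partial / dense)
```

Under these assumptions the partial convolution needs 1/16 of the operations of the dense layer, because both the input and output channel counts shrink by a factor of four.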
The ablation test showed that the detection accuracy, recall, and average precision of the improved model were 92%, 86.5%, and 93%, respectively, which were 2.5, 2.3, and 2.7 percentage points higher than those of the baseline model. The segmentation accuracy, recall, and average precision were 92%, 86.7%, and 93.5%, respectively, an increase of 0.2, 3.5, and 3 percentage points. A comparison was then made to further verify the reliability of the improved YOLO-Daylily model. Its segmentation performance was compared with that of the traditional two-stage model Mask R-CNN and of the single-stage models SOLOv2, YOLOv5l-seg, and YOLOv5x-seg. The experimental results indicate that the YOLO-Daylily model achieved a significantly higher average segmentation precision than the two-stage instance segmentation model (Mask R-CNN), with a rise of 8.4 percentage points. The number of floating-point operations was reduced by about 50%, from 258.2 to 128.3 GFLOPs, and the segmentation speed in frames per second (FPS) increased by about 1.8 times. Compared with the SOLOv2, YOLOv5l-seg, and YOLOv5x-seg models, the average segmentation precision AP50 rose by 12.7, 4.8, and 5.4 percentage points, respectively, while the GFLOPs decreased by 40.9%, 12.4%, and 51.4%, respectively. The size of the improved model decreased as well. The improved model achieved better detection, recognition, and segmentation performance with fewer missed and false detections, and can meet the requirements of real-time detection. The findings can provide theoretical support for the practical application of intelligent daylily harvesting.
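The results above mix two kinds of change: relative reductions in percent (for the floating-point operations) and absolute gains in percentage points (for precision and recall). The short sketch below reproduces the arithmetic using only figures quoted in this abstract; the helper name `relative_change` and the variable names are illustrative.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Floating-point operations of Mask R-CNN vs. YOLO-Daylily, as quoted above.
flops_change = relative_change(258.2, 128.3)
print(round(flops_change, 1))

# Segmentation average precision: 93.5% after a gain of 3 percentage points,
# which implies a baseline value of 90.5%.
improved_ap = 93.5
gain_points = 3.0
baseline_ap = improved_ap - gain_points
print(baseline_ap, round(relative_change(baseline_ap, improved_ap), 1))
```

The first figure confirms the stated reduction of about 50%. The second shows that a gain of 3 percentage points is a somewhat larger change when expressed relative to the baseline value.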