Abstract:
The accuracy and efficiency of strawberry picking depend mainly on recognizing the strawberry fruit and segmenting the fruit stem in the elevated cultivation environment. In this study, an improved YOLOv5s recognition model (ATCSP-YOLOv5s) was proposed for strawberry fruits under elevated cultivation, while a YOLOv5s-seg segmentation model was applied to detect and segment fruit stems. The detection accuracy of the recognition model was attributed to the improved backbone network of YOLOv5s: a self-attention mechanism was introduced to establish a self-attention cross-stage feature fusion network. This structure focuses on the correlation of inherent features without depending on external cues, thus improving the ability of the network to extract target features. As a result, better performance was achieved in identifying small, occluded, or overlapping strawberry fruits. The stem detection and segmentation model comprised the backbone network, feature fusion network, recognition head, and segmentation head. The mask information of the strawberry and the stem was captured at the same time, and the region of interest containing the target fruit stem was then determined. Accurate segmentation of the strawberry stem in elevated cultivation was thereby realized through a linear combination of recognition and segmentation. The experimental results showed that the precision (P), recall (R), and mean average precision (mAP) of strawberry fruit recognition by the ATCSP-YOLOv5s model were 97.24%, 94.07%, and 95.59%, respectively, with a detection speed of 17.3 frames per second. For strawberry images under frontlight on sunny days, the P, R, and mAP values of ATCSP-YOLOv5s were 97.89%, 95.16%, and 95.71%, respectively, which were 3.06, 6.22, and 4.48 percentage points higher than those of YOLOv5s. For images under backlight on sunny days, the P, R, and mAP values were 96.53%, 93.97%, and 95.34%, respectively, which were 3.26, 7.43, and 4.6 percentage points higher than those of YOLOv5s. For images taken on cloudy days, the P, R, and mAP values were 96.93%, 94.88%, and 95.56%, respectively, which were 2.30, 7.01, and 4.50 percentage points higher than those of YOLOv5s. The performance of the ATCSP-YOLOv5s recognition model was also compared with that of Faster RCNN, YOLOv4, YOLOv5s, YOLOv6s, YOLOv7, and YOLOv8. The P value of the ATCSP-YOLOv5s recognition model was 6.73, 5.92, 4.96, 4.77, 3.35, and 3.43 percentage points higher than those of Faster RCNN, YOLOv4, YOLOv5s, YOLOv6s, YOLOv7, and YOLOv8, respectively. The R value was higher by 11.68, 4.75, 7.13, 3.82, 3.76, and 3.01 percentage points, respectively, and the mAP value was higher by 8.64, 7.03, 4.53, 4.27, 4.31, and 3.55 percentage points, respectively, indicating better recognition performance. In addition, the mAP of the ATCSP-YOLOv5s recognition model was above 90% when detecting strawberry images under different lighting conditions, indicating good robustness. For the YOLOv5s-seg segmentation model, the P, R, and mAP values over all images were 82.74%, 82.01%, and 80.67%, respectively. For images under sunny frontlight, sunny backlight, and cloudy conditions, the P values were 85.32%, 81.26%, and 81.89%, the R values were 83.65%, 82.03%, and 83.20%, and the mAP values were 82.31%, 81.53%, and 82.04%, respectively. The segmentation accuracy was 98.29%, indicating good segmentation accuracy and generality. The comprehensive experiment showed that the ATCSP-YOLOv5s recognition model and the YOLOv5s-seg segmentation model can be expected to rapidly and accurately identify the strawberry fruit and then segment the target fruit stem. The findings can provide theoretical and technical support for the automatic operation of strawberry picking robots, particularly for target sensing and efficient picking.
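The reported gains are stated in percentage points, that is, as absolute differences between two percentages rather than relative improvements. The following minimal Python sketch illustrates how such figures relate to one another. It assumes the standard definitions of precision and recall from true-positive, false-positive, and false-negative counts; the helper names (`precision`, `recall`, `baseline_from_gain`) are hypothetical and are not taken from the study, and the implied YOLOv5s values are derived from the numbers above rather than reported directly.

```python
# Illustrative sketch only: helper names are hypothetical, not from the study.

def precision(tp: int, fp: int) -> float:
    """Percentage of predicted detections that are correct: TP / (TP + FP)."""
    return 100.0 * tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Percentage of ground-truth targets that are detected: TP / (TP + FN)."""
    return 100.0 * tp / (tp + fn)


def baseline_from_gain(improved: float, gain_pp: float) -> float:
    """Recover a baseline score (in %) from an improved score (in %)
    and its gain expressed in percentage points."""
    return round(improved - gain_pp, 2)


# ATCSP-YOLOv5s scores and gains over YOLOv5s for sunny frontlight images,
# as reported in the abstract: (score in %, gain in percentage points).
reported = {"P": (97.89, 3.06), "R": (95.16, 6.22), "mAP": (95.71, 4.48)}

# Implied YOLOv5s baselines (derived by subtraction, not stated in the source).
implied_yolov5s = {
    name: baseline_from_gain(score, gain)
    for name, (score, gain) in reported.items()
}
print(implied_yolov5s)
```

Subtracting a percentage-point gain from the improved score gives the implied baseline, so a 6.22 percentage-point gain in R corresponds to a YOLOv5s recall of roughly 88.94% on the same image subset, under the assumption that both models were evaluated on identical data.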