Abstract
Nursery planting is expanding rapidly, driven by the increasing demand for fruit and ornamental trees. Simple tools cannot meet the efficiency requirements of large-scale production because of the high labor intensity and pesticide demand. Spray robots are expected to take over many of these tasks in modern agriculture. However, the recognition accuracy and target identification of spray robots in nurseries remain insufficient. In this study, a nursery target detection method was proposed using an improved YOLOv5s. Firstly, to improve the backbone network, partial convolution (PConv) was introduced into the C3 module to reduce the computational complexity, and a coordinate attention mechanism was added at the highest dimensional feature to enhance location awareness. Secondly, the neck structure of the network was optimized to enhance the feature extraction of the model, and bilinear interpolation was used for the up-sampling operations during training. Finally, the original coupled detection head was replaced with an improved lightweight decoupled head to further improve the detection accuracy. A nursery dataset of 2 000 images was constructed, covering three representative objects: trees, pedestrians, and cultivation pots. The images included pedestrians in various postures at far, medium, and near distances. The dataset contained a total of 5 769 trees, 1 688 pedestrians, and 6 178 pots. In line with the operation requirements, only the trees and pots in the first row were labeled. Pedestrians, by contrast, must be detected wherever they appear for safety reasons, so all pedestrians were labeled regardless of distance.
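The saving from partial convolution can be illustrated with a rough operation count. The sketch below is not the authors' implementation: it assumes a partial ratio of 1/4 (PConv convolves only that fraction of the channels and passes the rest through unchanged), and the 40 × 40 feature map with 256 channels is a hypothetical example, since the abstract states neither value.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulate count of a dense k x k convolution
    (stride 1, 'same' padding, no bias) on an h x w feature map."""
    return h * w * k * k * c_in * c_out


def pconv_macs(h, w, k, channels, partial_ratio=0.25):
    """Partial convolution: only the first channels * partial_ratio channels
    are convolved; the remaining channels are left untouched.
    partial_ratio=0.25 is an assumed value, not taken from the abstract."""
    c_p = int(channels * partial_ratio)
    return conv_macs(h, w, k, c_p, c_p)


# Hypothetical layer: 40 x 40 feature map, 3 x 3 kernel, 256 channels.
h, w, k, c = 40, 40, 3, 256
dense = conv_macs(h, w, k, c, c)
partial = pconv_macs(h, w, k, c)
ratio = partial / dense

print(f"dense conv : {dense:,} MACs")
print(f"PConv      : {partial:,} MACs")
print(f"PConv/dense: {ratio:.4f}")
```

Under these assumptions the convolved part of the layer needs 1/16 of the multiply-accumulates of a dense convolution, because the cost scales with the square of the number of convolved channels.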
The experimental results show that each modification improved the detection performance of the model to a different degree: the improved C3 module, the coordinate attention mechanism, the optimized neck network structure, the bilinear up-sampling mode, and the lightweight decoupled head. Overall, the average mAP0.5, mAP0.5:0.95, accuracy, and recall reached 88.2%, 54.7%, 86.0%, and 82.4%, respectively. The size of the improved network model was 14.1 MB, the average detection time for a single image was 19.5 ms, and the average frame rate was 51.3 frames per second. Compared with the original YOLOv5s, the mAP0.5, mAP0.5:0.95, accuracy, and recall increased by 4.6, 5.9, 1.8, and 3.4 percentage points, respectively. Compared with the current mainstream single-stage target detection models YOLOv3-tiny, YOLOv7-tiny, and the latest YOLOv8s, the improved network achieved the highest accuracy, mAP0.5, and mAP0.5:0.95. These results indicate that the improved model can accurately and rapidly identify pedestrians, pots, and trees in a complex environment. The findings can provide technical support for the operation of electric spraying robots in nurseries.
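The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above; the baseline YOLOv5s values it prints are derived by subtracting the reported gains and are not quoted directly in the abstract, and the variable names are illustrative.

```python
# Metrics of the improved model, in percent (from the abstract).
improved = {"mAP0.5": 88.2, "mAP0.5:0.95": 54.7, "accuracy": 86.0, "recall": 82.4}

# Reported gains over the original YOLOv5s, in percentage points.
gain_pp = {"mAP0.5": 4.6, "mAP0.5:0.95": 5.9, "accuracy": 1.8, "recall": 3.4}

# Implied baseline = improved value minus gain (derived, not reported).
baseline = {name: round(improved[name] - gain_pp[name], 1) for name in improved}

# Frame rate implied by the average per-image detection time.
latency_ms = 19.5
fps = round(1000 / latency_ms, 1)

for name, value in baseline.items():
    print(f"{name:12s} baseline {value:.1f}%  improved {improved[name]:.1f}%")
print(f"{latency_ms} ms per image -> {fps} frames per second")
```

The 19.5 ms detection time corresponds to 51.3 frames per second, which matches the reported frame rate.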