Detection of the navigation line between tree rows in orchards using improved YOLOv7
-
Abstract
Automatic navigation improves the efficiency of farmland operations and supports high crop yields. Visual navigation is a promising way to realize automatic navigation in farm machinery because of its low hardware cost and wide applicability. However, traditional image processing algorithms cannot fully cope with interference from changing light and shadow, occlusion, and weeds in the complex environment of an orchard, so navigation path detection needs better robustness and accuracy. In this study, an accurate and rapid method for detecting orchard navigation lines was proposed using an improved YOLOv7 deep learning model. Firstly, a convolutional block attention module (CBAM), which contains channel and spatial attention modules, was added to the detection head network of the original YOLOv7 structure. This attention mechanism enabled the network to capture the essential features of the target more accurately, improving its representation and generalization. The key features of the trunk were thereby extracted and enhanced efficiently, weakening interference from the environmental background. Secondly, SPD-Conv (a space-to-depth layer followed by a non-strided convolution, i.e., a module without strided convolution or pooling) was introduced between the ELAN-H and RepConv modules. As a result, low-resolution images and small targets were detected more reliably, reducing missed and false detections. The data set was collected at the Beijing Xingshou Agricultural Professional Cooperative in Changping District, Beijing, China. A total of 28 orchard videos and 1588 images were obtained under different lighting conditions. The targets were labeled using LabelImg software, giving a total of 11043 apple trunks. After training on this data set, the improved YOLOv7 model achieved a detection accuracy of 95.21% at a detection speed of 42.07 frames/s.
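The space-to-depth step of SPD-Conv can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `space_to_depth`, the `(C, H, W)` layout, the scale factor of 2, and the channel ordering of the stacked slices are all assumptions made for illustration. The sketch shows only the rearrangement; in SPD-Conv it would be followed by a non-strided convolution.

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) feature map into (C*scale**2, H/scale, W/scale).

    Hypothetical helper: no pixel is discarded, spatial detail is moved
    into the channel axis instead of being dropped by a stride or pooling.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # Take every `scale`-th pixel at each (row, column) offset and
    # stack the resulting sub-maps along the channel axis.
    slices = [x[:, i::scale, j::scale]
              for i in range(scale) for j in range(scale)]
    return np.concatenate(slices, axis=0)

# Toy feature map with 2 channels and 4x4 spatial size (assumed values).
feature_map = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
downsampled = space_to_depth(feature_map)
print(downsampled.shape)  # (8, 2, 2)
```

Because every input value survives the rearrangement, fine detail from small or low-resolution targets remains available to the following convolution, which is the motivation given above for reducing missed detections.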
Compared with the original YOLOv7, the improved model raised the detection accuracy by 2.31 percentage points and the detection speed by 4.85 frames/s. The improved model can therefore identify fruit tree trunks more accurately and is suitable for apple orchards and jujube orchards with dense tree trunks. Ablation experiments were performed on each improvement. Introducing the CBAM attention module alone gave a model accuracy of 93.97%, 1.07 percentage points higher than the original. Introducing both the CBAM and SPD-Conv modules gave an accuracy of 95.21%, a further improvement of 1.24 percentage points, indicating that each module was effective. Accurate extraction of the trunk root midpoint coordinates was crucial for fitting the navigation line, because the trunk root midpoint served as the navigation positioning base point. The midpoint of the bottom edge of the bounding box predicted by the improved YOLOv7 for each trunk was used in place of the trunk root midpoint as the locating reference point. Finally, the least squares method was used to fit the tree row lines on both sides and the navigation line from the locating reference points. An error analysis compared 769 locating reference points, extracted from 100 images randomly selected from the data set, with manually marked trunk midpoints, and gave an average straight-line error of 4.43 pixels. The intrinsic and extrinsic parameter matrices of the camera were calculated using MATLAB software to convert pixel coordinates into camera coordinates. The resulting average actual error was 8.85 cm, indicating that the midpoint of the bottom edge of the bounding box was a reasonable and effective locating reference point.
Across the 100 images, the average deviation between the fitted navigation line and the manually observed navigation line was 2.45 pixels, corresponding to an actual deviation of 4.90 cm, which met the accuracy requirements of orchard navigation. Three videos were randomly selected to analyze the speed of navigation line detection. The average processing time per frame was 0.044 s, which met the speed requirements of orchard navigation.
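The locating reference point and least squares steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the box format `(x_min, y_min, x_max, y_max)` in pixels, the prior split of detections into left and right rows, the choice to fit x as a function of y (tree rows are close to vertical in the image), and taking the navigation line as the average of the two row lines are all assumptions. The function names and sample boxes are hypothetical.

```python
import numpy as np

def bottom_midpoint(box):
    """Midpoint of the bottom edge of a box (x_min, y_min, x_max, y_max)."""
    x_min, _, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)

def fit_row_line(points):
    """Least squares fit of x = a*y + b through (x, y) points; returns (a, b)."""
    pts = np.asarray(points, dtype=float)
    a, b = np.polyfit(pts[:, 1], pts[:, 0], deg=1)
    return a, b

def navigation_line(left_boxes, right_boxes):
    """Fit both tree row lines and return the assumed centerline (a, b)."""
    left = fit_row_line([bottom_midpoint(b) for b in left_boxes])
    right = fit_row_line([bottom_midpoint(b) for b in right_boxes])
    # Assumption: at every image row y, the navigation line lies halfway
    # between the two row lines, so its coefficients are their averages.
    return ((left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0)

# Made-up trunk detections for a 1280-pixel-wide image (illustrative only).
left_boxes = [(100, 200, 140, 400), (200, 150, 230, 300), (300, 100, 320, 200)]
right_boxes = [(1140, 200, 1180, 400), (1050, 150, 1080, 300), (960, 100, 980, 200)]

a, b = navigation_line(left_boxes, right_boxes)
print(f"navigation line: x = {a:.3f} * y + {b:.1f}")
```

Fitting x against y avoids the near-infinite slopes that a y-against-x fit would produce for row lines that run almost vertically through the image.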