Abstract:
In recent years, manual planting and picking have been unable to meet the demands of the large-scale green Sichuan pepper industry. It is therefore crucial to accurately segment green Sichuan pepper field scenes and then extract navigation paths for agricultural machinery, so that intelligent machinery can advance field management in green Sichuan pepper fields. In this study, field images were collected at various planting stages from three green Sichuan pepper planting demonstration bases in Jiangjin District, Chongqing Municipality. A total of 400 images were gathered and split into training and test sets at a 3:1 ratio. The open-source annotation tool Labelme was used to annotate the images, an inter-row navigation dataset for green Sichuan pepper was constructed, and data augmentation was applied. Given the complexity of green Sichuan pepper field scenes, a lightweight network, Mobile-Unet, was proposed for the semantic segmentation of five scene types: road, trunk, tree, sky, and background. U-Net was adopted as the base semantic segmentation framework, with MobileNetV2 as the feature extraction network. To adapt MobileNetV2 for semantic segmentation, the last three layers of the original network were omitted, and the resulting eight-layer structure with five downsampling stages was aligned with the U-Net architecture. Additionally, the LeakyReLU activation function was employed in the convolutional units to prevent neuron death during training. After segmentation, a navigation-line extraction method was introduced that combines the dual characteristics of roads and tree trunks. Experimental results demonstrate that the constructed dataset and the use of Dice Loss as the loss function effectively improved the prediction accuracy of the model. Compared with two lightweight networks, Fast-Unet and BiSeNet, Mobile-Unet achieved higher segmentation accuracy on the test set, with a pixel accuracy of 91.15%, a mean pixel accuracy of 83.34%, and a mean intersection over union of 70.51%. Compared with U-Net, recognition accuracy decreased slightly, but model complexity was greatly reduced: memory occupation dropped by 92.17%, and inference was nearly 10 times faster. Navigation-line extraction was further tested on 100 test-set images, achieving an overall success rate of 91%. The average yaw-angle deviation was 2.6° when extracting the navigation line from road contours and 6.7° when using tree-trunk features, fully meeting the accuracy requirements of field navigation. These findings offer a valuable reference for visual navigation in green Sichuan pepper fields.
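
A minimal PyTorch sketch of the Mobile-Unet idea described above: a MobileNetV2 backbone with its classification head dropped serves as the U-Net encoder, skip connections are tapped at each of the five downsampling stages, and the decoder's convolutional units use LeakyReLU. The stage indices, channel counts, and LeakyReLU slope follow torchvision's MobileNetV2 layout and are assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class DecoderBlock(nn.Module):
    """Upsample, concatenate the encoder skip, then convolve with LeakyReLU."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),  # negative slope 0.1 is an assumed value
        )

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))

class MobileUnet(nn.Module):
    def __init__(self, num_classes=5):  # road, trunk, tree, sky, background
        super().__init__()
        feats = mobilenet_v2(weights=None).features  # torchvision >= 0.13 API
        # Encoder stages ending at strides 2, 4, 8, 16, 32
        # (output channels 16, 24, 32, 96, 320 in torchvision's layout).
        self.stages = nn.ModuleList(
            [feats[:2], feats[2:4], feats[4:7], feats[7:14], feats[14:18]]
        )
        self.dec4 = DecoderBlock(320, 96, 96)
        self.dec3 = DecoderBlock(96, 32, 32)
        self.dec2 = DecoderBlock(32, 24, 24)
        self.dec1 = DecoderBlock(24, 16, 16)
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(16, num_classes, 1),  # per-pixel class logits
        )

    def forward(self, x):
        skips = []
        for stage in self.stages:
            x = stage(x)
            skips.append(x)
        x = self.dec4(x, skips[3])
        x = self.dec3(x, skips[2])
        x = self.dec2(x, skips[1])
        x = self.dec1(x, skips[0])
        return self.head(x)
```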
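
The abstract credits Dice Loss with improving prediction accuracy. Below is a minimal multi-class sketch, assuming raw logits of shape (N, C, H, W) and one-hot targets of the same shape; the smoothing constant `smooth` is an assumed detail, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, smooth=1.0):
    """1 minus the mean per-class Dice coefficient.

    logits: (N, C, H, W) raw network outputs; target: one-hot, same shape.
    """
    probs = F.softmax(logits, dim=1)
    intersection = (probs * target).sum(dim=(0, 2, 3))             # per-class overlap
    cardinality = probs.sum(dim=(0, 2, 3)) + target.sum(dim=(0, 2, 3))
    dice = (2.0 * intersection + smooth) / (cardinality + smooth)  # per-class Dice
    return 1.0 - dice.mean()
```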
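
The navigation-line stage combines road and tree-trunk features, but the abstract does not describe how the two cues are fused. The sketch below shows one plausible reading of the road-contour branch only: the midpoint of the predicted road span is taken on each image row and a line is fitted through the midpoints with OpenCV's cv2.fitLine; the class index `road_id` and the fitting parameters are assumptions.

```python
import cv2
import numpy as np

def navigation_line_from_road(mask, road_id=0):
    """Fit a navigation line to the road class of a predicted label map.

    mask: (H, W) integer label map; returns (vx, vy, x0, y0) or None.
    """
    road = (mask == road_id).astype(np.uint8)
    points = []
    for y in range(road.shape[0]):
        xs = np.flatnonzero(road[y])
        if xs.size:
            points.append((0.5 * (xs[0] + xs[-1]), y))  # midpoint of the road span
    if len(points) < 2:
        return None  # too little road visible to fit a line
    vx, vy, x0, y0 = cv2.fitLine(
        np.asarray(points, dtype=np.float32), cv2.DIST_L2, 0, 0.01, 0.01
    ).ravel()
    # Yaw relative to the image's vertical axis, in degrees (sign convention assumed):
    # yaw = np.degrees(np.arctan2(vx, vy))
    return float(vx), float(vy), float(x0), float(y0)
```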