Abstract:
Vegetables occupy the largest planting area of any crop other than grains. Machine vision navigation is a key indicator of mechanization, automation, and intelligence in modern agriculture, yet most vegetable transplanters are still driven manually. Before a transplanter can navigate autonomously, the ridges must be detected. Because ridges are usually free of crops before transplanting, few visual references are available, so there is strong demand for extracting navigation lines from crop-free ridges in complex scenes. Crop-free ridge rows also exhibit similar color information and only small texture differences, and traditional image processing cannot fully meet the needs of large-scale production. In this study, a ridge row segmentation model based on an improved DeepLabV3+ was proposed to achieve real-time semantic segmentation with high applicability, accuracy, and detection speed. The original DeepLabV3+ network was simplified by replacing the Xception backbone with MobileNetV2, which improved the detection speed and real-time performance. The Convolutional Block Attention Module (CBAM) was incorporated into the DeepLabV3+ model so that it focused on the important details of the ridge boundary and segmented the ridges more accurately. Navigation feature points were then obtained from the ridge boundary information. Where seedlingless ridges were present, the navigation feature points deviated from their intended positions and had to be corrected for accurate guidance. Quartiles were therefore used to identify and remove outliers, that is, feature points that deviated significantly from the rest.
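The quartile-based filtering step can be illustrated with a minimal sketch. The abstract does not state which quantity the quartiles were computed on or which fence multiplier was used; the sketch assumes the horizontal pixel coordinate of each feature point and the conventional Tukey multiplier of 1.5. The function name `filter_outliers_iqr` and the `(x, y)` point layout are hypothetical.

```python
from statistics import quantiles


def filter_outliers_iqr(points, k=1.5):
    """Keep feature points whose x-coordinate lies inside the Tukey fences.

    points: list of (x, y) pixel coordinates (assumed layout).
    k: fence multiplier; 1.5 is the common default, not a value from the paper.
    """
    xs = [x for x, _ in points]
    # Quartiles of the x-coordinates (requires at least two points).
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(x, y) for x, y in points if low <= x <= high]
```

With this sketch, a point such as `(180, 50)` among points whose x-coordinates cluster around 100 falls outside the upper fence and is discarded before line fitting.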
The least squares method was then used to fit the navigation line to the remaining feature points. This yielded a reliable navigation line that compensated for the deviations caused by seedlingless ridges. Overall, the simplified DeepLabV3+ network with the MobileNetV2 backbone and the CBAM attention mechanism provided high detection speed, real-time performance, and accurate navigation, even in challenging ridge-boundary scenarios. Images were collected at two locations to improve the applicability of the model to crop-free ridge environments. The field scenes posed challenges in the form of different soil qualities, lighting conditions, and seedlingless ridges. The dataset consisted of 1 350 images in the training set and 150 images in the validation set. The images were then expanded using data augmentation. The results indicated that the improved model achieved a mean pixel accuracy of 96.27%, a mean intersection over union of 93.18%, and an average detection frame rate of 84.21 frames per second. Compared with the original model, the MobileNetV2-based model improved the mean intersection over union and the mean pixel accuracy by 1.78 and 0.83 percentage points, respectively, and increased the frame rate by 29.83 frames per second. By contrast, the MobileNetV3-based model decreased the mean intersection over union and the mean pixel accuracy by 0.83 and 3.28 percentage points, respectively, while increasing the frame rate by 15.47 frames per second. Furthermore, the improved model demonstrated better mean pixel accuracy, mean intersection over union, and frame rate than PSPNet, U-Net, HRNet, Segformer, and DeepLabV3+. For navigation line extraction, the proposed method had a maximum angular error of 1.2° and a maximum pixel error of 9 pixels across various ridge environments, whereas the Hough transform and Random Sample Consensus (RANSAC) were much less effective across the different scenes.
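For reference, the two segmentation metrics reported above can be computed from a class confusion matrix. The following is a minimal sketch using the standard definitions of mean pixel accuracy and mean intersection over union, not the authors' evaluation code. The function name `segmentation_metrics` and the matrix layout are assumptions.

```python
def segmentation_metrics(conf):
    """Return (mean pixel accuracy, mean IoU) from a confusion matrix.

    conf[i][j] = number of pixels of true class i predicted as class j
    (assumed layout). Classes with no pixels score 0.0 in this sketch.
    """
    n = len(conf)
    accs, ious = [], []
    for i in range(n):
        tp = conf[i][i]
        true_total = sum(conf[i])                        # pixels whose true class is i
        pred_total = sum(conf[r][i] for r in range(n))   # pixels predicted as class i
        union = true_total + pred_total - tp
        accs.append(tp / true_total if true_total else 0.0)
        ious.append(tp / union if union else 0.0)
    return sum(accs) / n, sum(ious) / n
```

For a two-class case (ridge and background), `conf` would be a 2 × 2 matrix accumulated over all validation images.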
These findings can serve as a strong reference for crop-free ridge navigation in agricultural robots, thus promoting the development of intelligent agricultural equipment.
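The least squares fitting of the navigation line described in the abstract can be sketched as follows. Because a navigation line is typically close to vertical in the image, the sketch assumes the line is parameterized as x = k·y + b, so that the slope stays finite; this parameterization, the function names, and the way the angular error is measured against a reference line are assumptions for illustration and are not taken from the paper.

```python
import math


def fit_navigation_line(points):
    """Ordinary least squares fit of x = k * y + b to (x, y) pixel points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if syy == 0:
        raise ValueError("points must span more than one image row")
    k = sxy / syy
    b = mean_x - k * mean_y
    return k, b


def angular_error_deg(k_fit, k_ref):
    """Absolute angle, in degrees, between two lines given as x = k * y + b."""
    return abs(math.degrees(math.atan(k_fit) - math.atan(k_ref)))
```

In use, the fitted slope would be compared with the slope of a manually annotated reference line to obtain an angular error of the kind reported above.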