Extracting the navigation lines of crop-free ridges using improved DeepLabV3+

俞高红; 王一淼; 甘帅汇; 徐惠民; 陈逸津; 王磊

doi:10.11975/j.issn.1002-6819.202401227

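Abstract: Machine vision navigation is an important part of smart agriculture, and detecting the navigation lines of crop-free ridges is key to navigation for dryland transplanting. Crop-free ridges have similar color information and small texture differences, so traditional image processing methods have poor applicability and low accuracy, while semantic segmentation algorithms are slow and have poor real-time performance. To address these problems, this study proposes a ridge segmentation model based on an improved DeepLabV3+. First, the traditional DeepLabV3+ network is made lightweight by replacing the Xception backbone with MobileNetV2 to improve detection speed and real-time performance. Next, the convolutional block attention module (CBAM) is introduced so that the model handles ridge-surface boundary information better. Navigation feature points are then obtained from the ridge-surface boundary information. Because broken ridges cause the feature points to deviate, quartiles are used to screen out outlying feature points, and the navigation line is fitted by the least squares method. Model evaluation shows that the improved model achieves a mean pixel accuracy of 96.27% and a mean intersection over union of 93.18%, with an average detection frame rate of 84.21 frames/s, outperforming PSPNet, U-Net, HRNet, Segformer, and the DeepLabV3+ network. Across different ridge environments, the maximum angle error is 1.2° and the maximum pixel error is 9 pixels, so navigation lines can be obtained effectively in different scenes. The results can provide a reference for crop-free ridge navigation of agricultural robots.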
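The outlier screening and line fitting steps can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: feature points are taken to be (x, y) pixel coordinates, the navigation line is taken to be close to vertical so that the x values cluster, outliers are defined by the common 1.5 × IQR fence, and the line is fitted as x = a·y + b. The function names `filter_outliers_iqr` and `fit_line_least_squares` are hypothetical.

```python
from statistics import quantiles


def filter_outliers_iqr(points, k=1.5):
    """Keep points whose x lies inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes a near-vertical navigation line, so x values of valid
    feature points cluster together. The factor k=1.5 is a common
    convention, not a value taken from the paper.
    """
    xs = [x for x, _ in points]
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(x, y) for x, y in points if lower <= x <= upper]


def fit_line_least_squares(points):
    """Fit x = a*y + b by ordinary least squares and return (a, b).

    Fitting x as a function of y avoids an infinite slope when the
    line is vertical in image coordinates. Requires at least two
    points with distinct y values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / syy
    b = mean_x - a * mean_y
    return a, b


# Illustrative data: one feature point is displaced, as might happen at a broken ridge.
feature_points = [(100, 0), (101, 10), (99, 20), (100, 30),
                  (160, 40), (100, 50), (101, 60), (99, 70)]
kept = filter_outliers_iqr(feature_points)
slope, intercept = fit_line_least_squares(kept)
```

Screening on x alone is only reasonable while the line stays close to vertical. For strongly slanted ridges, one alternative would be to screen on the residuals of a preliminary fit instead.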

Abstract: Vegetables have the largest planting area after grains. Machine vision navigation is one of the most important indicators of mechanization, automation, and intelligence in modern agriculture, yet most vegetable transplanters are still driven manually. A transplanter must detect the ridges before it can navigate, but because the ridges are usually free of crops before transplanting, visual references are hard to find. Extracting navigation lines from crop-free ridges in complex scenes is therefore in high demand. Crop-free ridge rows also have similar color information and small texture differences, so traditional image processing cannot fully meet the needs of large-scale production. In this study, a ridge row segmentation model was proposed using an improved version of DeepLabV3+. The model delivered real-time semantic segmentation with high applicability, accuracy, and detection speed. The traditional DeepLabV3+ network was made lightweight by replacing the Xception backbone with MobileNetV2, which improved detection speed and real-time performance. The convolutional block attention module (CBAM) was incorporated into the DeepLabV3+ model to better handle ridge boundary information, focusing on important boundary details so that target objects were detected and classified accurately. Navigation feature points were obtained from the ridge boundary information. Where broken ridges were present, the feature points deviated from their intended positions and had to be corrected for accurate guidance. Quartiles were therefore used to identify and remove feature points that deviated significantly from the rest.
In addition, the least squares method was used to fit the navigation line to the filtered feature points. This yielded a reliable navigation line that compensated for deviations caused by broken ridges. Overall, the lightweight DeepLabV3+ network with the MobileNetV2 backbone and the CBAM attention mechanism combined high detection speed, real-time performance, and accurate navigation, even in scenes with challenging ridge boundaries. Images were collected at two locations to improve the applicability of the model to crop-free ridge environments. The images covered challenging field conditions, including different soil qualities, lighting conditions, and broken ridges. The dataset consisted of 1 350 images in the training set and 150 images in the validation set, and the images were expanded using data augmentation. The results indicate that the improved model achieved a mean pixel accuracy of 96.27%, a mean intersection over union of 93.18%, and an average detection frame rate of 84.21 frames per second. With the MobileNetV2 backbone, the mean intersection over union and mean pixel accuracy improved by 1.78 and 0.83 percentage points, respectively, over the original model, and the frame rate increased by 29.83 frames per second. With the MobileNetV3 backbone, by contrast, the mean intersection over union and mean pixel accuracy decreased by 0.83 and 3.28 percentage points, respectively, while the frame rate increased by 15.47 frames per second. Furthermore, the improved model outperformed PSPNet, U-Net, HRNet, Segformer, and DeepLabV3+ in mean pixel accuracy, mean intersection over union, and frame rate. Across various ridge environments, the proposed method had a maximum angular error of 1.2° and a maximum pixel error of 9 pixels, and it obtained navigation lines from different scenes far more effectively than the Hough transform and random sample consensus (RANSAC).
These findings can serve as a strong reference for crop-free ridge navigation in agricultural robots, thus promoting the development of intelligent agricultural equipment.
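The two segmentation metrics reported in the abstract, mean pixel accuracy and mean intersection over union, are commonly computed from a confusion matrix. The sketch below uses the standard definitions. The abstract does not give the paper's exact formulas, so the definitions, the function names, and the example matrix are all assumptions made for illustration. Rows are taken to be ground-truth classes and columns predicted classes.

```python
def mean_pixel_accuracy(conf):
    """Average over classes of (correctly labelled pixels / ground-truth pixels).

    conf[i][j] counts pixels of true class i predicted as class j.
    Every class is assumed to have at least one ground-truth pixel.
    """
    k = len(conf)
    per_class = [conf[i][i] / sum(conf[i]) for i in range(k)]
    return sum(per_class) / k


def mean_iou(conf):
    """Average over classes of TP / (TP + FP + FN)."""
    k = len(conf)
    ious = []
    for i in range(k):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                        # true class i, predicted otherwise
        fp = sum(conf[r][i] for r in range(k)) - tp   # predicted class i, true otherwise
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / k


# Hypothetical two-class example (background, ridge); counts are made up.
confusion = [[90, 10],
             [5, 95]]
mpa = mean_pixel_accuracy(confusion)
miou = mean_iou(confusion)
```

Intersection over union penalizes both false positives and false negatives, so for the same predictions it is never higher than the per-class pixel accuracy. This is consistent with the reported mean intersection over union (93.18%) being lower than the mean pixel accuracy (96.27%).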

Extracting the navigation lines of crop-free ridges using improved DeepLabV3+