Visual navigation method and field experiment of cotton spraying robots
Abstract: Visual navigation is widely used by field robots to extract information from their surroundings and determine subsequent actions. Its accuracy, however, is limited by the highly complex field scene, particularly illumination variation and plant growth, and the navigation path is difficult to extract at the cotton seedling stage because of sparse seedlings, missing seedlings, and weeds. In this study, a visual navigation method was established based on an improved RANSAC (random sample consensus) algorithm and least-squares fitting, and a series of path-tracking experiments was carried out in a cotton field. Firstly, forward-looking navigation images were captured at multiple growth stages of cotton seedlings by an agricultural robot with a camera. An improved excess-green gray-scale transform was used to enhance the contrast between crop rows and background, the crop rows were separated from the background by adaptive threshold segmentation, and the binary images were denoised by morphological filtering. Then, according to the distribution of the crop-row regions in the image, outlier points beyond the crop rows were removed by the improved RANSAC algorithm, and the optimal feature points were clustered to ensure the accuracy of the extracted crop-row centerline. Finally, the navigation path was obtained by least-squares fitting. The experimental results show that the path fitted after outlier removal by the improved RANSAC agreed well with the actual position of the crop-row centerline. Specifically, the traditional RANSAC algorithm achieved a line recognition rate of 96.5% with an average error angle of 1.41° and an average processing time of 0.087 s per image, whereas the improved RANSAC algorithm raised the recognition rate to 98.4% and reduced the average error angle to 0.53°, a marked improvement in centerline extraction. A comparison with the traditional Hough transform further verified the effectiveness of the improved method for navigation path extraction. To validate the practical application of the method in a complex environment, a spraying robot with visual navigation was developed in-house, and autonomous path-tracking experiments were conducted in a cotton seedling field under three initial states and three moving speeds (0.4, 0.5, and 0.6 m/s). Image processing was implemented with OpenCV, an open-source machine vision library, under the Robot Operating System (ROS), and an adaptive sliding-mode control algorithm was used to improve path-tracking accuracy. The maximum lateral deviations of the robot were 1.53, 2.29, and 2.59 cm at speeds of 0.4, 0.5, and 0.6 m/s, respectively, and no seedlings were run over, fully meeting the precision requirements of row operations in cotton fields under the planting pattern of "1 film, 3 ridges, and 6 rows". The results provide theoretical support and a technical basis for the autonomous navigation and mobile operation of agricultural robots in the field.
Keywords: visual navigation; robot; cotton field; line recognition; improved RANSAC; path fitting; tracking experiment
0. Introduction
China is the world's largest peach producer; in 2021 its peach planting area and output accounted for 54.83% and 64.08% of the global totals, respectively, and the total output value of the peach industry approached 100 billion yuan[1]. Peach leaf curl is one of the main diseases of peach trees. It mainly attacks young leaves and, in severe cases, damages young shoots, flowers and fruit, reducing yield and even affecting the following year's production. Rapid and accurate identification of peach leaf curl is therefore of great significance for its control. At present, the disease is generally identified by growers relying on experience, an approach that is not only inaccurate and inefficient but also subject to the growers' subjective judgment. Intelligent detection of peach leaf curl outbreaks is thus an urgent goal for the peach industry.
In recent years, with the development of computer hardware, deep-learning-based object detection has been widely applied to recognition tasks in agriculture[2-4]. WANG Leilei et al.[5] added an attention module to YOLOv5 and improved its loss function to build the OMM-YOLO model for oyster mushroom detection, improving accuracy at every maturity stage. WANG et al.[6] proposed a two-stage cascaded pest detection method for mobile vision, which uses a crop classification model as the pre-trained model for pest detection; by distinguishing crop species before recognizing pests, it alleviates the image-data imbalance across species. LI et al.[7] proposed a vegetable disease detection method based on improved YOLOv5, which adopts a multi-scale feature fusion strategy to strengthen feature extraction and reduce missed and false detections caused by complex backgrounds. LAN Yubin et al.[8] added coordinate attention (CA) to YOLOv5s to reduce the loss of small-target features and introduced the Ghost module from GhostNet to lighten the network, improving the recognition accuracy of ginger leaf pests and diseases in natural scenes. SU et al.[9] added squeeze-and-excitation (SE) modules to YOLOv5 to solve the problem that conventional detectors cannot effectively screen the key features of kidney-bean brown spot. XIAO et al.[10] used an enhanced YOLOv7 with edge computing for real-time lightweight detection of litchi diseases, achieving faster detection with fewer parameters than the original YOLOv7. HOU et al.[11] introduced a feature pyramid structure and ISResNet (Inception Squeeze-and-Excitation ResNet) into Faster R-CNN to build a deep model with high accuracy and generality for apple leaf disease detection. LI Qi[12] used RFBNet (receptive field block network), a lightweight regression-based detector, to detect peach pests and diseases; the method is fast and reached 76.58% accuracy on peach leaf curl. YANG et al.[13] proposed LS-YOLOv8s, a strawberry ripeness detection and grading model based on YOLOv8s combined with an LW-Swin Transformer module, using multi-head self-attention to improve generalization. ZHANG et al.[14] proposed a deep-learning apple disease detection algorithm with a Bole convolution module (BCM) to reduce redundant features extracted from apple leaf disease images and a cross-attention module (CAM) to suppress interference from image backgrounds. These studies achieved good results in crop disease detection, but the following problems remain:
most studies were conducted on single leaves and give insufficient consideration to occlusion, illumination and other factors present in real scenes; most studies only classify the disease without distinguishing its stage of development; and research targeting peach leaf curl is scarce.
To address these problems, this study targets peach trees in orchards and collects and constructs an object detection dataset for peach leaf curl. Taking YOLOv5su as the base model, a deformable self-attention mechanism, separable large-kernel convolutional attention, and a lightweight adaptive weighted downsampling module are introduced to improve the model's ability to fit the disease features, yielding DLL-YOLOv5su, an improved YOLOv5su-based detection model that provides an effective means of real-time monitoring of peach leaf curl.
1. Image Acquisition and Dataset Construction
1.1 Image acquisition
Peach leaf curl images were collected at the peach planting base of the Chongqing Academy of Agricultural Sciences, Chongqing, China, with "Zijinhong" nectarine trees as the main subject. Images were collected from April to June 2023 with a Canon EOS M50 Mark II digital camera (photographs) and a SAMSUNG SM-G9910 phone (video), held 20-80 cm from the target, covering different times of day and viewing angles and all stages of the disease from early to late. In total, 291 peach tree images with a resolution of 6000×4000 pixels and a 5 min 38 s video (60 frames/s, 3840×2160 pixels) were obtained. Because the camera images are high-resolution and contain much content, the 291 images were cropped and screened to obtain 1358 images containing leaf curl targets. Frames were extracted from the video and screened, discarding overly blurred frames and background frames without targets, giving 162 images. Merging the two sets yielded 1520 peach leaf curl images.
1.2 Dataset construction
The collected images were annotated in YOLO format with the lightweight annotation tool LabelImg and randomly divided into a training set (1105 images), a validation set (277 images) and a test set (138 images) at a ratio of 8:2:1. Because all images were taken in the same scene, the training set was augmented with randomly combined transformations, including Gaussian noise, brightness changes, Cutout, rotation, flipping and translation, to avoid overfitting and improve robustness and generalization; the augmented training set contains 6630 images. A sketch of such a pipeline is given below.
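As a point of reference, the following is a minimal sketch of such a randomly combined augmentation pipeline. The use of the albumentations library, the probabilities, and all parameter values are illustrative assumptions; the paper does not specify its implementation.

```python
import albumentations as A

# Randomly combined augmentations for YOLO-format data: Gaussian noise,
# brightness change, Cutout-style occlusion, rotation, flip and translation.
# Parameter values are illustrative assumptions, not the paper's settings.
augment = A.Compose(
    [
        A.GaussNoise(p=0.3),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.0, p=0.5),
        A.CoarseDropout(max_holes=8, max_height=32, max_width=32, p=0.3),  # Cutout
        A.Rotate(limit=15, p=0.5),
        A.HorizontalFlip(p=0.5),
        A.Affine(translate_percent=0.1, p=0.3),  # translation
    ],
    # Keep YOLO-format boxes consistent with the transformed image
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: image is an HxWx3 numpy array; bboxes are [x_c, y_c, w, h] in YOLO format.
# out = augment(image=image, bboxes=bboxes, class_labels=labels)
```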
As shown in Fig. 1, every image in the dataset contains disease targets; the targets are randomly distributed, and the images cover a range of illumination intensities and directions. There are 19419 early-stage, 11172 mid-stage and 10069 late-stage leaf curl labels.
2. Peach Leaf Curl Detection Algorithm and Improvement
2.1 YOLOv5su network
YOLO is one of the most popular object detection and image segmentation models. Developed by Joseph Redmon and Ali Farhadi of the University of Washington, it quickly gained popularity for its high speed and accuracy. YOLOv5 consists mainly of the input, backbone, neck and head.
YOLOv5u[15] builds on the YOLOv5 architecture by integrating a decoupled head that separates the classification task from the regression task, and adopts an anchor-free strategy that matches ground-truth boxes without predefined anchors, avoiding the problem of unreasonable predefined anchor sizes. By network depth and width, YOLOv5u comes in five structures, from small to large: YOLOv5nu, YOLOv5su, YOLOv5mu, YOLOv5lu and YOLOv5xu.
2.2 Improved YOLOv5su
The experimental images in this study were all taken of peach leaf curl outbreaks in outdoor natural environments. To mitigate the adverse effects of illumination and complex backgrounds and improve the robustness of the detector, the YOLOv5su algorithm was improved to construct DLL-YOLOv5su. The specific improvements are as follows:
1) The deformable attention (DA) module[17] of the Vision Transformer[16] is added to the Bottleneck structure of the last C3 module of the backbone, forming a new C3-DA module.
2) The separable large-kernel convolutional attention module[19] from the Visual Attention Network[18] is added to the fast spatial pyramid pooling (SPPF) module, forming the SPPF-LSKA module.
3) Based on receptive-field attention convolution (RFAConv)[20], a lightweight adaptive weighted downsampling (LAWD) module is proposed to replace some CBS modules in the original network.
The structure of the improved DLL-YOLOv5su model is shown in Fig. 2.
Figure 2  DLL-YOLOv5su model structure diagram. Note: Upsample is the upsampling module; Concat concatenates feature maps along the channel dimension; Conv2d is a convolutional layer; BN is a batch normalization layer; SiLU is the activation function; Bbox.Loss is the localization (regression) loss; Cls.Loss is the classification loss.
2.2.1 Deformable self-attention mechanism
Self-attention was first proposed in the natural language processing model Transformer[21]. The vision transformer (ViT) brought self-attention to the vision domain and constructed the multi-head self-attention module. Self-attention has three key elements: the query, the key and the value. Its output is
$$\mathrm{Attention}(\boldsymbol{q},\boldsymbol{k},\boldsymbol{v})=\mathrm{Softmax}\!\left(\boldsymbol{q}\boldsymbol{k}^{\mathrm{T}}/\sqrt{d_k}\right)\boldsymbol{v} \qquad (1)$$
where q, k and v denote the query, key and value vectors, respectively, $d_k$ is the dimension of k, and Softmax denotes normalization.
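For reference, Eq. (1) can be written in a few lines of PyTorch; this is a minimal single-head illustration, not the code used in this study.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Eq. (1): softmax(q k^T / sqrt(d_k)) v, for tensors of shape (N, d)."""
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (N, N) similarities
    return torch.softmax(scores, dim=-1) @ v           # weighted sum of values

q = k = v = torch.randn(16, 64)                  # 16 tokens, 64-dim embeddings
out = scaled_dot_product_attention(q, k, v)      # (16, 64)
```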
Self-attention can capture global information and establish long-range dependencies between feature channels and targets. However, simply enlarging the receptive field wastes memory and computation, and the features are affected by background regions[17]. To model and capture global semantic information effectively, a deformable self-attention module is added to the last C3 module of the backbone to strengthen leaf curl target information and suppress background information. Deformable self-attention uses groups of deformable sampling points to determine the important regions of the feature map and, on that basis, models the relations between different leaf curl features. This flexible scheme enables the module to focus on relevant regions and capture more informative features. The structure of deformable self-attention is shown in Fig. 3.
Figure 3  Deformable attention module and offset network structure diagram. Note: q is the query vector; $\tilde{\boldsymbol{k}}$ and $\tilde{\boldsymbol{v}}$ are the deformed key and value vectors; $\tilde{\boldsymbol{x}}$ is the result of sampling the feature map; $\boldsymbol{W}_q$, $\boldsymbol{W}_k$, $\boldsymbol{W}_v$ and $\boldsymbol{W}_o$ are the projection matrices of the query, key, value and output vectors; z is the output feature map; H, W and C are the image height, width and number of channels; r is the downsampling reduction ratio; GroupConv is group convolution; LayerNorm is layer normalization; GELU is the activation function.
As shown in Fig. 3a, the deformable attention module first generates a uniform 2-D grid of points $p$ over the input feature map $\boldsymbol{x}$, and the grid points serve as sampling points. The feature map is linearly projected to the query vectors, which are fed into the offset sub-network $\theta_{\mathrm{offset}}$ to generate offsets $\Delta p$; adding the offsets to the sampling points yields the deformed points. Bilinear interpolation at the deformed points gives the sampled features $\tilde{\boldsymbol{x}}$:
$$\boldsymbol{q}=\boldsymbol{x}\boldsymbol{W}_q,\quad \tilde{\boldsymbol{k}}=\tilde{\boldsymbol{x}}\boldsymbol{W}_k,\quad \tilde{\boldsymbol{v}}=\tilde{\boldsymbol{x}}\boldsymbol{W}_v \qquad (2)$$
$$\Delta p=\theta_{\mathrm{offset}}(\boldsymbol{q}) \qquad (3)$$
$$\tilde{\boldsymbol{x}}=\phi(\boldsymbol{x};\,p+\Delta p) \qquad (4)$$
$$\phi(\boldsymbol{z};(p_x,p_y))=\sum_{(r_x,r_y)}g(p_x,r_x)\,g(p_y,r_y)\,\boldsymbol{z}[r_y,r_x,:] \qquad (5)$$
where $\boldsymbol{W}_q$, $\boldsymbol{W}_k$ and $\boldsymbol{W}_v$ are the projection matrices of the query, key and value vectors, $g(a,b)=\max(0,\,1-|a-b|)$, and $(r_x,r_y)$ indexes all positions on $\boldsymbol{z}\in\mathbb{R}^{H\times W\times C}$. Since $g(a,b)$ is non-zero only at the four integer points closest to $(p_x,p_y)$, Eq. (5) reduces to bilinear interpolation over those four points, and the output of each attention head is
$$\boldsymbol{z}^{(m)}=\sigma\!\left(\frac{\boldsymbol{q}^{(m)}\tilde{\boldsymbol{k}}^{(m)\mathrm{T}}}{\sqrt{d}}+\phi(\hat{B};R)\right)\tilde{\boldsymbol{v}}^{(m)},\quad m=1,\ldots,M \qquad (6)$$
$$\boldsymbol{z}=\mathrm{Concat}(\boldsymbol{z}^{(1)},\ldots,\boldsymbol{z}^{(M)})\,\boldsymbol{W}_o \qquad (7)$$
where $\boldsymbol{z}^{(m)}$ is the output of the m-th head, $\sigma(\cdot)$ is the Softmax function, M is the number of attention heads, $d=C/M$ is the dimension of each head, $\boldsymbol{q}^{(m)},\tilde{\boldsymbol{k}}^{(m)},\tilde{\boldsymbol{v}}^{(m)}\in\mathbb{R}^{N\times d}$ are the corresponding per-head elements of the query, key and value vectors, and $\boldsymbol{W}_o$ is the output projection matrix. $\phi(\hat{B};R)\in\mathbb{R}^{HW\times HW}$ corresponds to the position embedding in the Swin Transformer[22]. The features of all heads are concatenated and projected by $\boldsymbol{W}_o$ to give the output z. The deformable attention module is inserted into the C3 module to form the C3-Deformable attention (C3-DA) module and the Bottleneck-DA structure, as shown in Fig. 4; a simplified code sketch is given below.
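The following is a greatly simplified, single-head PyTorch sketch of the sampling path in Eqs. (2)-(6), using F.grid_sample for the bilinear interpolation $\phi$. The offset-network kernel size, the pooling of offsets to an s×s grid, and the omission of multi-head splitting and the position bias $\phi(\hat{B};R)$ are all simplifying assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    """Single-head sketch: offsets are predicted from the projected queries,
    keys/values come from features sampled at the deformed points (bilinear
    interpolation via F.grid_sample), then standard attention is applied."""
    def __init__(self, dim, n_points=49):
        super().__init__()
        self.wq, self.wk = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.wv, self.wo = nn.Linear(dim, dim), nn.Linear(dim, dim)
        # offset sub-network theta_offset: depthwise conv -> GELU -> 1x1 conv
        self.offset = nn.Sequential(
            nn.Conv2d(dim, dim, 5, padding=2, groups=dim),
            nn.GELU(),
            nn.Conv2d(dim, 2, 1),
        )
        self.s = int(n_points ** 0.5)          # sampling grid is s x s

    def forward(self, x):                      # x: (B, C, H, W)
        B, C, H, W = x.shape
        q_map = self.wq(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        q = q_map.flatten(2).transpose(1, 2)   # (B, HW, C) queries (Eq. 2)
        # uniform reference grid p of s x s points in [-1, 1]
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, self.s),
                                torch.linspace(-1, 1, self.s), indexing="ij")
        p = torch.stack((xs, ys), dim=-1).to(x)            # (s, s, 2)
        # offsets delta-p predicted from the queries (Eq. 3)
        dp = F.adaptive_avg_pool2d(self.offset(q_map), self.s)
        dp = dp.permute(0, 2, 3, 1)                        # (B, s, s, 2)
        # deformed sampling x~ = phi(x; p + delta-p) (Eqs. 4-5)
        x_tilde = F.grid_sample(x, p.unsqueeze(0) + dp, mode="bilinear",
                                align_corners=False)       # (B, C, s, s)
        kv = x_tilde.flatten(2).transpose(1, 2)            # (B, s*s, C)
        k, v = self.wk(kv), self.wv(kv)
        attn = torch.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)
        return self.wo(attn @ v).transpose(1, 2).reshape(B, C, H, W)
```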
2.2.2 Separable large-kernel convolutional attention mechanism
The large kernel attention (LKA) module[18] delivers excellent performance in VAN, enabling VAN to outperform models such as ViT and CNNs on a range of vision tasks. LKA absorbs the advantages of both convolution and self-attention, including local structural information, long-range dependence and adaptability, while avoiding drawbacks such as ignoring adaptability along the channel dimension. The large-kernel convolution module comprises three parts: depth-wise convolution (DWC), depth-wise dilation convolution (DWDC) and channel (1×1) convolution. The LKA module can be expressed as
$$\mathrm{LKA}(\boldsymbol{x})=C^{1\times 1}\big(\mathrm{DWD\_C}(\mathrm{DW\_C}(\boldsymbol{x}))\big) \qquad (8)$$
where $\boldsymbol{x}$ is the input feature map, $\mathrm{LKA}(\boldsymbol{x})$ is the attention output feature map, $C^{1\times 1}$ denotes 1×1 convolution, $\mathrm{DW\_C}(\cdot)$ denotes depth-wise convolution, and $\mathrm{DWD\_C}(\cdot)$ denotes depth-wise dilation convolution.
LSKA decomposes the 2-D kernels of the depth-wise convolution layers in LKA into cascaded horizontal and vertical 1-D kernels. Compared with standard LKA, LSKA offers comparable performance with lower computational complexity and memory footprint. To strengthen the feature-information acquisition of SPPF, separable large-kernel attention is introduced into the SPPF layer to form the SPPF-LSKA layer, whose model is shown in Fig. 5. A code sketch of both modules follows.
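A compact PyTorch sketch of the two modules is given below. The kernel configuration (5×5 depth-wise, 7×7 depth-wise dilated with dilation 3, then 1×1) follows the common VAN setting and is an assumption here; the exact sizes used in this paper may differ.

```python
import torch.nn as nn

class LKA(nn.Module):
    """Eq. (8): attention = Conv1x1(DWD_C(DW_C(x))); output = attention * x."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)               # DW-Conv
        self.dwd = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)  # DW-D-Conv
        self.pw = nn.Conv2d(dim, dim, 1)                                      # 1x1 conv
    def forward(self, x):
        return x * self.pw(self.dwd(self.dw(x)))

class LSKA(nn.Module):
    """Separable variant: each 2-D depth-wise kernel is decomposed into
    cascaded horizontal (1, k) and vertical (k, 1) 1-D depth-wise kernels."""
    def __init__(self, dim):
        super().__init__()
        self.dw_h = nn.Conv2d(dim, dim, (1, 5), padding=(0, 2), groups=dim)
        self.dw_v = nn.Conv2d(dim, dim, (5, 1), padding=(2, 0), groups=dim)
        self.dwd_h = nn.Conv2d(dim, dim, (1, 7), padding=(0, 9), dilation=3, groups=dim)
        self.dwd_v = nn.Conv2d(dim, dim, (7, 1), padding=(9, 0), dilation=3, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)
    def forward(self, x):
        a = self.dw_v(self.dw_h(x))      # separable depth-wise convolution
        a = self.dwd_v(self.dwd_h(a))    # separable depth-wise dilated convolution
        return x * self.pw(a)
```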
Figure 5  SPPF-LSKA module structure diagram. Note: MaxPool2d is the max-pooling layer; DW-Conv is depth-wise convolution; DW-D-Conv is depth-wise dilated convolution; k is the convolution kernel size; s is the stride; p is the pixel padding; d2 indicates a spacing of 1 between convolution kernel elements.
2.2.3 Lightweight adaptive weighted downsampling
A standard convolution extracts features by multiplying the kernel with a receptive field of the same size and summing. Features at the same spatial position of every receptive field share the same kernel parameters, so standard convolution ignores the information differences between positions, which limits the performance of convolutional networks to some extent. ZHANG et al.[23] observed that group convolution reduces parameters and computation, but insufficient interaction within groups hurts network performance; HUANG et al.[24] improved feature information by reusing features for fusion, alleviating vanishing gradients; DAI et al.[25] proposed deformable convolution, which learns offsets to shift the kernel sampling positions and further improves CNN performance. ZHANG et al.[20] proposed receptive-field attention convolution (RFAConv), which dynamically determines the weight of each feature within the receptive field.
To reduce the spatial information loss caused by kernel parameter sharing in standard convolution and to lighten the model, this paper proposes lightweight adaptive weight downsampling (LAWD) on the basis of RFAConv and uses it to replace some convolution modules in the model. Its structure is shown in Fig. 6.
Figure 6  Lightweight adaptive weight downsampling structure diagram. Note: AvgPool is the average pooling layer; Rearrange is tensor dimension adjustment; K1-KC denote the 1st to C-th convolution kernels; Sum denotes element-wise summation.
The upper half of LAWD is the weight branch: average pooling first aggregates the information of each receptive field, and a 1×1 convolution then exchanges spatial information across channels to improve network performance[26-28]. The resulting weight matrix is dimension-transformed, which effectively reduces the number of network parameters without losing weight information.
The lower half is the feature-extraction branch. Group convolution first performs a slicing operation on the input feature map, storing its W and H information in the channels; the 2× downsampled feature map obtained by the subsequent convolution preserves the original image information well. Compared with ordinary convolution, group convolution with a group size of 16 reduces the parameter count to 1/16. The feature map is then dimension-transformed and multiplied by the weights, and summation over the fourth dimension gives the final downsampled feature map. LAWD generates weight matrices for different receptive fields with cheap operations, avoiding the kernel-parameter-sharing problem of convolution and reducing model parameters. The LAWD computation can be expressed as
$$\mathrm{LAWD}(\boldsymbol{x})=\mathrm{Sum}\Big(\mathrm{Softmax}\big(C^{1\times 1}(\mathrm{AvgPool}(\boldsymbol{x}))\big)\times \mathrm{SiLU}\big(\mathrm{BN}(g^{3\times 3}(\boldsymbol{x}))\big)\Big) \qquad (9)$$
where $\boldsymbol{x}$ is the input feature map and $g^{3\times 3}$ denotes 3×3 group convolution. A code sketch of the module is given below.
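The following is a hedged PyTorch sketch of LAWD under Eq. (9), in the style of RFAConv. The exact tensor layout of the Rearrange step and the final channel-adjusting projection are assumptions made to keep the module self-contained.

```python
import torch
import torch.nn as nn

class LAWD(nn.Module):
    """Sketch of lightweight adaptive weighted downsampling (stride 2).
    Weight branch: AvgPool -> 1x1 conv -> softmax over k*k receptive-field slots.
    Feature branch: stride-2 group conv (group size 16) -> BN -> SiLU.
    The weighted features are summed over the receptive-field dimension."""
    def __init__(self, c_in, c_out, k=3, groups=16):
        super().__init__()
        assert c_in % groups == 0
        self.k = k
        self.pool = nn.AvgPool2d(kernel_size=k, stride=2, padding=k // 2)
        self.to_weight = nn.Conv2d(c_in, c_in * k * k, 1, bias=False)  # channel interaction
        self.feat = nn.Conv2d(c_in, c_in * k * k, k, stride=2, padding=k // 2,
                              groups=groups, bias=False)               # g^{3x3}, group conv
        self.bn = nn.BatchNorm2d(c_in * k * k)
        self.act = nn.SiLU()
        self.proj = nn.Conv2d(c_in, c_out, 1, bias=False)  # channel adjust (assumption)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.to_weight(self.pool(x))                   # (b, c*k*k, h/2, w/2)
        _, _, h, wd = w.shape
        w = w.view(b, c, self.k * self.k, h, wd).softmax(dim=2)
        f = self.act(self.bn(self.feat(x))).view(b, c, self.k * self.k, h, wd)
        y = (w * f).sum(dim=2)                             # Sum over receptive-field slots
        return self.proj(y)

# Example: a (1, 64, 80, 80) map is downsampled to (1, 128, 40, 40).
out = LAWD(64, 128)(torch.randn(1, 64, 80, 80))
```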
In the overall network of the improved DLL-YOLOv5su detector, LAWD replaces the last three CBS modules of the backbone and the two downsampling convolutions of the neck.
2.3 Experimental environment and evaluation metrics
2.3.1 Experimental environment settings
The platform for training and testing on the peach leaf curl dataset was: 64-bit Windows 11; a 12th Gen Intel(R) Core(TM) i5-12400F 2.5 GHz processor; an Nvidia GeForce RTX 4060 Ti (8 GB) GPU; 32 GB (3200 MHz) of memory; PyTorch 2.0 with CUDA 11.7; PyCharm as the programming platform; and Python 3.11 as the language. All experiments were run in the same environment. YOLOv5su was chosen as the original model for peach leaf curl detection, and the "low" hyperparameter file was selected to avoid the training-accuracy oscillation and convergence difficulty caused by an excessive learning rate. Training used a batch size of 8, an initial learning rate of 0.01, a momentum of 0.937, an image size of 640×640 pixels, and 300 epochs.
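For illustration only, an equivalent run could be launched with the ultralytics package as follows; the weight name and the dataset YAML path are assumed placeholders, not artifacts released with this paper.

```python
from ultralytics import YOLO

# Hypothetical reproduction of the training setup described above.
# "peach_leaf_curl.yaml" is an assumed placeholder for the dataset config
# (train/val image paths and the three disease-stage classes).
model = YOLO("yolov5su.pt")          # anchor-free YOLOv5su baseline weights
model.train(
    data="peach_leaf_curl.yaml",
    epochs=300,                      # training rounds
    imgsz=640,                       # 640x640 input
    batch=8,                         # batch size
    lr0=0.01,                        # initial learning rate
    momentum=0.937,                  # SGD momentum
)
```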
2.3.2 Evaluation metrics
Precision (P), recall (R), average precision (AP), mean average precision (mAP), frames per second (FPS) and model weight size (MB) are used as the evaluation metrics.
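These metrics follow the standard definitions (stated here for completeness, not taken from the paper):
$$P=\frac{TP}{TP+FP}\times 100\%,\qquad R=\frac{TP}{TP+FN}\times 100\%$$
$$AP=\int_0^1 P(R)\,\mathrm{d}R,\qquad mAP=\frac{1}{N}\sum_{i=1}^{N}AP_i$$
where TP, FP and FN are the numbers of true positives, false positives and false negatives, AP is the area under the precision-recall curve for one class, and N is the number of classes (here, the three disease stages).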
3. Results and Analysis
3.1 Baseline model selection
YOLOv5su is derived from YOLOv5s. To choose a suitable baseline, comparative experiments were run on several versions; the results are listed in Table 1.
Table 1  Baseline model comparison results

| Model | Precision P/% | Recall R/% | mAP50/% | mAP50~95/% | Model size/MB | FPS/(frame·s⁻¹) | AP early/% | AP mid/% | AP late/% |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv5nu | 75.3 | 68.9 | 73.8 | 44.3 | 7.6 | 105.7 | 75.1 | 76.2 | 70.1 |
| YOLOv5su | 76.5 | 70.7 | 76.1 | 46.8 | 18.5 | 119.7 | 78.3 | 75.3 | 74.6 |
| YOLOv5mu | 77.1 | 72.4 | 78.6 | 48.0 | 50.5 | 78.0 | 79.9 | 80.2 | 75.6 |
| YOLOv5s | 74.4 | 69.6 | 73.0 | 44.8 | 14.5 | 83.0 | 78.0 | 70.1 | 70.7 |

Note: mAP50 is the mean average precision at an IoU (intersection over union) threshold of 0.5; mAP50~95 is the mean of mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.

As Table 1 shows, YOLOv5su detects better than YOLOv5nu, and its 18.5 MB weight file is far smaller than YOLOv5mu's 50.5 MB. Compared with YOLOv5s, YOLOv5su is superior in precision, recall and average precision at a cost of 4 MB of additional weight, so the anchor-free decoupled head effectively improves model performance. Considering these factors, YOLOv5su was selected as the baseline model.
3.2 Ablation experiment results
Ablation experiments were designed to verify whether the improvements proposed in this paper are effective for peach leaf curl detection and to assess the effect of each. The results are listed in Table 2, and the training-process curves of the improved models are shown in Fig. 7.
Table 2  Ablation test results

| Model | Precision P/% | Recall R/% | mAP50/% | mAP50~95/% | Model size/MB | FPS/(frame·s⁻¹) | AP early/% | AP mid/% | AP late/% |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv5su | 76.5 | 70.7 | 76.1 | 46.8 | 18.5 | 119.7 | 78.3 | 75.3 | 74.6 |
| YOLOv5su+C3-DA | 76.7 | 74.2 | 78.4 | 48.3 | 19.1 | 96.1 | 79.9 | 80.4 | 75.1 |
| YOLOv5su+LSKA | 79.0 | 71.0 | 78.4 | 49.0 | 20.7 | 114.8 | 80.4 | 79.2 | 75.7 |
| YOLOv5su+LAWD | 76.9 | 69.7 | 77.8 | 47.8 | 15.0 | 96.5 | 79.1 | 78.5 | 75.9 |
| DLL-YOLOv5su | 80.7 | 73.1 | 80.4 | 50.4 | 17.6 | 83.0 | 80.6 | 81.5 | 79.1 |

The ablation results show the effect of each improvement. With the C3-DA module, recall rises by 3.5 percentage points and mAP50 by 2.3 percentage points over the original model, at the cost of 0.6 MB of additional weight. With the LSKA module, precision and mAP50 rise by 2.5 and 2.3 percentage points, with 2.2 MB more weight. Replacing part of the CBS modules with LAWD lowers recall by 1 percentage point but raises precision and mAP50 by 0.4 and 1.7 percentage points while shrinking the model size by 18.9%.
Figure 7 shows that the improved DLL-YOLOv5su converges slightly more slowly but surpasses the original YOLOv5su at around 200 training epochs. The final DLL-YOLOv5su peach leaf curl detector reaches a precision of 80.7%, a recall of 73.1% and an mAP50 of 80.4%, improvements of 4.2, 2.4 and 4.3 percentage points over the original YOLOv5su. Its detection speed of 83.0 frames/s meets the requirement for real-time detection of peach leaf curl.
3.3 Comparison with other object detection models
To verify the effectiveness of the proposed model, mainstream object detectors were compared with the improved model. All models were tested on the same dataset in the same experimental environment; the results are listed in Table 3.
Table 3  Comparative test results

| Model | Precision P/% | Recall R/% | mAP50/% | Model size/MB | AP early/% | AP mid/% | AP late/% |
|---|---|---|---|---|---|---|---|
| YOLOv5su | 76.5 | 70.7 | 76.1 | 18.5 | 78.3 | 75.3 | 74.6 |
| Faster R-CNN | - | - | 51.9 | 110.9 | 31.5 | 62.2 | 62.1 |
| YOLOv3-tiny | 73.9 | 67.2 | 68.6 | 17.5 | 69.3 | 66.2 | 70.4 |
| YOLOv7 | 80.1 | 74.6 | 78.3 | 74.8 | 82.4 | 78.5 | 74.0 |
| YOLOv8s | 80.0 | 70.6 | 76.3 | 22.5 | 78.0 | 77.9 | 73.1 |
| DLL-YOLOv5su | 80.7 | 73.1 | 80.4 | 17.6 | 80.6 | 81.5 | 79.1 |

Table 3 shows that, with an mAP50 close to that of YOLOv7 and YOLOv8s on peach leaf curl, the original YOLOv5su has the smallest weight, which supports its choice as the baseline. After the improvements targeting peach leaf curl features, the mAP50 of DLL-YOLOv5su exceeds Faster R-CNN, YOLOv3-tiny, YOLOv7 and YOLOv8s by 28.5, 11.8, 2.1 and 4.1 percentage points, respectively. Its recall of 73.1% is only 1.5 percentage points below that of YOLOv7, while its precision of 80.7% is higher, as recall and precision are difficult to maximize simultaneously in object detection. Moreover, its weight of 17.6 MB is only 23.5% of YOLOv7's.
To further verify the effect of DLL-YOLOv5su on peach leaf curl detection, test-set images were randomly selected for model testing; the results are shown in Fig. 8. For the small early-stage targets, the improved DLL-YOLOv5su is more accurate, whereas Faster R-CNN and YOLOv7 both produce false detections to varying degrees. For blurred targets, only the improved model detects both true targets, while the other models detect only one or misdetect. For occluded targets, YOLOv5su mistakes peach branches for leaf curl, whereas the improved DLL-YOLOv5su is accurate with a low false-detection rate.
Figure 8  Recognition effect of each model on peach leaf curl disease in different environments. Note: green boxes mark correct detections, red circles missed detections, blue circles false detections, and yellow circles duplicate detections.
The experimental results show that, compared with mainstream detectors, the proposed DLL-YOLOv5su better recognizes peach leaf curl at its different stages of development. Compared with the peach disease detection model in [12], its detection accuracy on peach leaf curl is 3.82 percentage points higher, and it can reflect how far the disease has progressed. DLL-YOLOv5su can thus provide technical support for targeted control of peach leaf curl.
4. Conclusions
By augmenting peach leaf curl images collected in natural environments, this paper built a peach leaf curl dataset and proposed DLL-YOLOv5su, an object detection model based on an improved YOLOv5su network, to raise the detection accuracy of peach leaf curl in natural environments. The results are as follows:
1) Taking YOLOv5su as the base network, a deformable self-attention mechanism and a separable large-kernel attention module were introduced into the backbone, and a weight-adaptive downsampling module based on receptive-field convolution was proposed to replace some convolution modules, lightening the model while improving accuracy and average precision. The improved DLL-YOLOv5su reaches 80.7% precision, 73.1% recall and 80.4% mAP50, gains of 4.2, 2.4 and 4.3 percentage points over the original model. Its detection is better than the original model at all three stages of peach leaf curl, demonstrating the effectiveness of the improvements.
2) In comparison experiments with mainstream detectors including Faster R-CNN, YOLOv3-tiny, YOLOv7 and YOLOv8s, the improved DLL-YOLOv5su achieves the highest precision, with an mAP50 higher by 28.5, 11.8, 2.1 and 4.1 percentage points, respectively. It is also more accurate than an existing peach disease detection model, further demonstrating its advantages. DLL-YOLOv5su can recognize the three stages of peach leaf curl in natural environments; its improved recognition of early-stage targets supports earlier warning, and it retains some inference ability for blurred and occluded targets.
Table 1  Comparison of crop row centerline detection results of different methods

| Processing method | Line recognition rate/% | Average error angle/(°) | Average time per image/s |
|---|---|---|---|
| Without RANSAC | 95.7 | 1.97 | 0.087 |
| Traditional RANSAC | 96.5 | 1.41 | 0.096 |
| Improved RANSAC | 98.4 | 0.53 | 0.096 |

Table 2  Navigation path extraction results of different fitting methods

| Fitting method | Scenario | Line recognition rate/% | Average error angle/(°) | Average time per image/s |
|---|---|---|---|---|
| Hough transform | Normal seedlings | 87.62 | 2.49 | 0.329 |
| | Few missing seedlings | 84.19 | 3.07 | 0.307 |
| | Many missing seedlings | 73.43 | 3.82 | 0.279 |
| Proposed method | Normal seedlings | 100 | 0.41 | 0.112 |
| | Few missing seedlings | 98.72 | 0.58 | 0.093 |
| | Many missing seedlings | 97.37 | 0.72 | 0.079 |
[1] BAI Y H, ZHANG B H, XU N M, et al. Vision-based navigation and guidance for agricultural autonomous vehicles and robots: A review[J]. Computers and Electronics in Agriculture, 2023, 205: 107584.
[2] 赖汉荣,张亚伟,张宾,等. 玉米除草机器人视觉导航系统设计与试验[J]. 农业工程学报,2023,39(1):18-27. LAI Hanrong, ZHANG Yawei, ZHANG Bin, et al. Design and experiment of the visual navigation system for a maize weeding robot[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(1): 18-27. (in Chinese with English abstract)
[3] HAN S, ZHANG Q, NI B, et al. A guidance directrix approach to vision-based vehicle guidance systems[J]. Computers and Electronics in Agriculture, 2004, 43(3): 179-195. doi: 10.1016/j.compag.2004.01.007
[4] BAK T, JAKOBSEN H. Agricultural robotic platform with four wheel steering for weed detection[J]. Biosystems Engineering, 2004, 87(2): 125-136. doi: 10.1016/j.biosystemseng.2003.10.009
[5] ÅSTRAND B, BAERVELDT A J. A vision based row-following system for agricultural field machinery[J]. Mechatronics, 2006, 15(2): 251-269.
[6] WINTERHALTER W, FLECKENSTEIN F V, DORNHEGE C, et al. Crop row detection on tiny plants with the pattern hough transform[J]. IEEE Robotics & Automation Letters, 2018, 3(4): 3394-3401.
[7] ENGLISH A, ROSS P, BALL D. Vision based guidance for robot navigation in agriculture[C]//2014 IEEE International Conference on Robotics & Automation (ICRA), 2014.
[8] OPIYO S, OKINDA C, ZHOU J, et al. Medial axis-based machine-vision system for orchard robot navigation[J]. Computers and Electronics in Agriculture, 2021, 195: 106153.
[9] ZHANG Q, SHAO M E, LI B. A visual navigation algorithm for paddy field weeding robot based on image understanding[J]. Computers and Electronics in Agriculture, 2017, 143: 66-78. doi: 10.1016/j.compag.2017.09.008
[10] CHEN J Q, QIANG H, WU J H, et al. Extracting the navigation path of a tomato-cucumber greenhouse robot based on a median point Hough transform[J]. Computers and Electronics in Agriculture, 2020, 174: 105472. doi: 10.1016/j.compag.2020.105472
[11] 姜国权,柯杏,杜尚丰,等. 基于机器视觉的农田作物行检测[J]. 光学学报,2009,29(4):1015-1020. JIANG Guoquan, KE Xing, DU Shangfeng, et al. Crop row detection based on machine vision[J]. Acta Optica Sinica, 2009, 29(4): 1015-1020. (in Chinese with English abstract)
[12] 孟庆宽,何洁,仇瑞承,等. 基于机器视觉的自然环境下作物行识别与导航线提取[J]. 光学学报,2014,34(7):1-7. MENG Qingkuan, HE Jie, QIU Ruicheng, et al. Crop recognition and navigation line detection in natural environment based on machine vision[J]. Acta Optica Sinica, 2014, 34(7): 1-7. (in Chinese with English abstract)
[13] 张漫,项明,魏爽,等. 玉米中耕除草复合导航系统设计与试验[J]. 农业机械学报,2015,46(S1):8-14. ZHANG Man, XIANG Ming, WEI Shuang, et al. Design and implementation of a corn weeding-cultivating integrated navigation system based on GNSS and MV[J]. Transactions of the Chinese Society for Agricultural Machinery, 2015, 46(S1): 8-14. (in Chinese with English abstract)
[14] 白如月,汪小旵,鲁伟,等. 施药机器人对行施药系统的设计与试验[J]. 华南农业大学学报,2018,39(5):101-109. BAI Ruyue, WANG Xiaochan, LU Wei, et al. Design and experiment of row-following pesticide spraying system by robot[J]. Journal of South China Agricultural University, 2018, 39(5): 101-109. (in Chinese with English abstract)
[15] 廖娟,汪鹞,尹俊楠,等. 基于分区域特征点聚类的秧苗行中心线提取[J]. 农业机械学报,2019,50(11):34-41. LIAO Juan, WANG Yao, YIN Junnan, et al. Detection of seedling row centerlines based on sub-regional feature points clustering[J]. Transactions of the Chinese Society for Agricultural Machinery, 2019, 50(11): 34-41. (in Chinese with English abstract)
[16] 杨洋,张博立,查家翼,等. 玉米行间导航线实时提取[J]. 农业工程学报,2020,36(12):162-171. YANG Yang, ZHANG Boli, ZHA Jiayi, et al. Real-time extraction of navigation line between corn rows[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(12): 162-171. (in Chinese with English abstract)
[17] 宫金良,王祥祥,张彦斐,等. 基于边缘检测和区域定位的玉米根茎导航线提取方法[J]. 农业机械学报,2020,51(10):26-33. GONG Jinliang, WANG Xiangxiang, ZHANG Yanfei, et al. Extraction method of corn rhizome navigation lines based on edge detection and area localization[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(10): 26-33. (in Chinese with English abstract)
[18] 刘星星,张超,张浩,等. 最小二乘法与SVM组合的林果行间自主导航方法[J]. 农业工程学报,2021,37(9):157-164. LIU Xingxing, ZHANG Chao, ZHANG Hao, et al. Inter-row automatic navigation method by combining least square and SVM in forestry[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(9): 157-164. (in Chinese with English abstract)
[19] 王侨,孟志军,付卫强. 基于机器视觉的玉米苗期多条作物行线检测算法[J]. 农业机械学报,2021,52(4):208-220. WANG Qiao, MENG Zhijun, FU Weiqiang, et al. Detection algorithm of multiple crop row lines based on machine vision in maize seedling stage[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(4): 208-220. (in Chinese with English abstract)
[20] 傅灯斌,江茜,齐龙,等. 基于区域生长顺序聚类-RANSAC的水稻苗带中心线检测[J]. 农业工程学报,2023,39(7):47-57. FU Dengbin, JIANG Qian, QI Long, et al. Detection of the centerline of rice seedling belts based on region growth sequential clustering-RANSAC[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(7): 47-57. (in Chinese with English abstract)
[21] 肖珂,夏伟光,梁聪哲. 复杂背景下果园视觉导航路径提取算法[J]. 农业机械学报,2023,54(6):197-204,252. XIAO Ke, XIA Weiguang, LIANG Congzhe. Visual navigation path extraction algorithm in orchard under complex background[J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(6): 197-204,252. (in Chinese with English abstract)
[22] 娄善伟,董合忠,田晓莉,等. 新疆棉花“矮、密、早”栽培历史、现状和展望[J]. 中国农业科学,2021,54(4):720-732. LOU Shanwei, DONG Hezhong, TIAN Xiaoli, et al. The "short, dense and early" cultivation of cotton in Xinjiang: History, current situation and prospect[J]. Scientia Agricultura Sinica, 2021, 54(4): 720-732. (in Chinese with English abstract)
[23] 李茗萱,张漫,孟庆宽,等. 基于扫描滤波的农机具视觉导航基准线快速检测方法[J]. 农业工程学报,2013,29(1):41-47. LI Mingxuan, ZHANG Man, MENG Qingkuan, et al. Rapid detection of navigation baseline for farm machinery based on scan-filter algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(1): 41-47. (in Chinese with English abstract)
[24] HAMUDA E, GLAVIN M, JONES E. A survey of image processing techniques for plant extraction and segmentation in the field[J]. Computers and Electronics in Agriculture, 2016, 125: 184-199. doi: 10.1016/j.compag.2016.04.024
[25] 郭龙,秦三春,李强,等. 基于机器视觉的飞机模型冰形轮廓提取方法研究[J]. 自动化与仪器仪表,2020(6):15-20. GUO Long, QIN Sanchun, LI Qiang, et al. Research on ice contour extraction of aircraft model based on machine vision[J]. Automation and Instrumentation, 2020(6): 15-20. (in Chinese with English abstract)
[26] 王祥祥,宫金良,张彦斐. 基于机器视觉的玉米行导航线提取方法[J]. 山东理工大学学报(自然科学版),2021,35(2):19-27. WANG Xiangxiang, GONG Jinliang, ZHANG Yanfei. Extraction method of navigation line from corn line based on machine vision[J]. Journal of Shandong University of Technology (Natural Science Edition), 2021, 35(2): 19-27. (in Chinese with English abstract)
[27] 李秀智,彭小彬,方会敏,等. 基于RANSAC算法的植保机器人导航路径检测[J]. 农业机械学报,2020,51(9):40-46. LI Xiuzhi, PENG Xiaobin, FANG Huimin, et al. Navigation path detection of plant protection robot based on RANSAC algorithm [J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(9): 40-46. (in Chinese with English abstract)
[28] 陈弘,刘海,乔胜华,等. 基于三次样条插值的车辆行驶数据分析[J]. 汽车技术,2013,455(8):54-57. CHEN Hong, LIU Hai, QIAO Shenghua, et al. Analysis of vehicle driving data based on cubic spline interpolation[J]. Automobile Technology, 2013, 455(8): 54-57. (in Chinese with English abstract)
[29] 梁习卉子,陈兵旗,姜秋慧,等. 基于图像处理的玉米收割机导航路线检测方法[J]. 农业工程学报,2016,32(22):43-49. LIANG Xihuizi, CHEN Bingqi, JIANG Qiuhui, et al. Detection method of navigation route of corn harvester based on image processing[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(22): 43-49. (in Chinese with English abstract)
[30] JIANG G, WANG Z, LIU H. Automatic detection of crop rows based on multi-ROIs[J]. Expert Systems with Applications, 2015, 42(5): 2429-2441. doi: 10.1016/j.eswa.2014.10.033