
基于改进BlendMask模型的苹果识别与定位方法

白晓平, 蔡皓月, 王卓, 张懿文

白晓平,蔡皓月,王卓,等. 基于改进BlendMask模型的苹果识别与定位方法[J]. 农业工程学报,2024,40(7):191-201. DOI: 10.11975/j.issn.1002-6819.202311200

BAI Xiaoping, CAI Haoyue, WANG Zhuo, et al. Recognizing and locating apple using improved BlendMask model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(7): 191-201. DOI: 10.11975/j.issn.1002-6819.202311200


Funding: National Key Research and Development Program of China (2021YFD2000305)
    Author profile: BAI Xiaoping, Ph.D., associate researcher, master's supervisor; research interest: agricultural robotics. Email: baixiaopin@sia.cn

    Corresponding author: WANG Zhuo, researcher, doctoral supervisor; research interest: applications of artificial-intelligence methods to basic sciences such as materials and biology. Email: zwang@sia.cn

  • CLC number: S24

Recognizing and locating apple using improved BlendMask model

  • 摘要:

    针对实际自然环境中果实被遮挡、环境光线变化等干扰因素以及传统视觉方法难以准确分割出农作物轮廓等问题,该研究以苹果为试验对象,提出一种基于改进BlendMask模型的实例分割与定位方法。该研究通过引入高分辨率网络HRNet,缓解了特征图在深层网络中分辨率下降的问题;同时,在融合掩码层中引入卷积注意力机制CBAM(convolutional block attention module),提高了实例掩码的质量,进而提升实例分割效果。该研究设计了一个高效抽取实例表面点云的算法,将实例掩码与深度图匹配以获取苹果目标实例的三维表面点云,并通过均匀下采样与统计滤波算法去除点云中的切向与离群噪声,再运用球体方程线性化形式的最小二乘法估计苹果在三维空间中的中心坐标,实现了苹果的中心定位。试验结果表明,改进BlendMask模型的平均分割精度为96.65%,检测速度为34.51帧/s;相较于原始BlendMask模型,准确率、召回率与平均精度分别提升5.48、1.25与6.59个百分点;相较于分割模型SparseInst、FastInst与PatchDCT,该模型在平均精度变化不大的情况下,检测速度分别提升6.11、3.84与20.08帧/s。该研究可为苹果采摘机器人的视觉系统提供技术参考。

    Abstract:

    Manual picking cannot fully meet the demands of large-scale apple harvesting at present, due to its high labor intensity and cost and low efficiency. Fruit-picking robots have therefore drawn much attention in recent years as a way to realize automatic picking and yield estimation. Among their components, the vision system largely determines the efficiency and stability of a picking robot: it must recognize fruits on the tree with high speed and accuracy under complex natural environments. This study aims to identify and locate apples in natural environments, particularly under interference factors such as occluded fruits, changing ambient light, and varying viewing angles and distances, with apple taken as the test object. Since traditional vision methods can hardly segment the contour of the target fruit accurately, an instance segmentation and localization method was proposed using an improved BlendMask model. The original backbone network was replaced with the high-resolution network HRNet (High-Resolution Net), in order to alleviate the decreasing resolution of feature maps in deep networks. A convolutional block attention module (CBAM) was also introduced into the fusion mask layer of the instance segmentation model, improving the quality of the instance masks and thereby the instance segmentation. Ablation experiments were carried out on a variety of popular instance segmentation backbone networks, and HRNet was selected as the backbone, so that the BlendMask model achieved a better balance between real-time performance and segmentation accuracy; the improved model is thus suitable for fruit target recognition and localization under both accuracy and real-time constraints. An algorithm was designed to efficiently extract the surface point cloud of each instance: the instance mask was matched with the depth map to obtain the 3D surface point cloud of the apple target instance. The tangential and outlier noise in the point cloud was removed using uniform downsampling and statistical filtering. The center coordinates of the apples in 3D space were then estimated using the least-squares method (LSM) applied to a linearized form of the sphere equation, achieving center localization of the target. Other geometric models could also be used within this localization framework to realize the center localization of different kinds of fruits. The experimental results show that the average segmentation accuracy of the improved BlendMask model was 96.65%, with a detection speed of 34.51 frames/s. Compared with the original BlendMask model, the precision, recall, and average precision were improved by 5.48, 1.25, and 6.59 percentage points, respectively. Compared with the recent instance segmentation models SparseInst, FastInst, and PatchDCT, the average precision of the improved model lagged slightly behind by 0.29, 0.04, and 1.94 percentage points, respectively, whereas its detection speed was 6.11, 3.84, and 20.08 frames/s higher, respectively. The improved BlendMask model thus combines high segmentation accuracy with real-time performance. The findings can provide a technical reference for the vision system of apple-picking robots.
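
    To make the localization step concrete, the following is a minimal NumPy sketch of a least-squares sphere fit based on the linearized sphere equation described above; the function name and interface are illustrative assumptions, not the authors' implementation. Expanding (x-a)^2 + (y-b)^2 + (z-c)^2 = r^2 gives 2ax + 2by + 2cz + (r^2 - a^2 - b^2 - c^2) = x^2 + y^2 + z^2, which is linear in the unknowns and can be solved in closed form.

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit via the linearized sphere equation.

    points: (n, 3) array of 3D surface points of one apple instance.
    Returns the estimated center (a, b, c) and radius r.
    """
    A = np.c_[2 * points, np.ones(len(points))]   # (n, 4) design matrix [2x, 2y, 2z, 1]
    f = (points ** 2).sum(axis=1)                 # (n,) right-hand side x^2 + y^2 + z^2
    sol, *_ = np.linalg.lstsq(A, f, rcond=None)   # solve for [a, b, c, r^2 - |center|^2]
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius
```

    Because the fit is linear, it runs in closed form without iterative optimization, which suits the real-time constraint of the vision system.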

  • 图  1   改进BlendMask模型推理流程示意图

    注: Interpolate表示双线性插值算法,ConvA与ConvB均表示卷积层。虚线框内是模型的改进部分,CBAM模块以残差结构的方式与ConvA相连。Base为模型初步语义分割后的结果,Atten为实例的大致分布张量,Rb为实例张量,Rb与Atten在Blender模块内完成融合计算并生成最终的实例掩码。

    Figure  1.   Schematic diagram of improved BlendMask model inference process

    Note: Interpolate denotes the bilinear interpolation algorithm, and ConvA and ConvB both denote convolutional layers. Inside the dashed box is the improved part of the model; the CBAM module is connected to ConvA via a residual structure. Base is the result of the model's preliminary semantic segmentation, Atten is the approximate distribution tensor of the instances, and Rb is the instance tensor; Rb and Atten are fused within the Blender module to generate the final instance mask.
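
    As a schematic of the fusion computation in the Blender module, following the BlendMask formulation in [20], the sketch below shows how the per-instance attention tensors (Atten) weight the cropped bases (Rb); tensor shapes and names are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def blend(bases_roi, atten):
    """BlendMask-style fusion of cropped bases with per-instance attention.

    bases_roi: (N, K, R, R) base feature crops for N instances (Rb)
    atten:     (N, K, M, M) coarse per-instance attention tensors (Atten)
    Returns a (N, R, R) soft instance mask.
    """
    atten = F.interpolate(atten, size=bases_roi.shape[-2:],
                          mode="bilinear", align_corners=False)  # M x M -> R x R
    atten = F.softmax(atten, dim=1)         # normalize across the K bases
    return (bases_roi * atten).sum(dim=1)   # weighted sum -> final instance mask
```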

    图  2   HRNet网络结构

    Figure  2.   The structure of HRNet

    图  3   CBAM模型结构

    Figure  3.   The structure of CBAM
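
    For reference, a minimal PyTorch sketch of a standard CBAM block [23] is shown below: channel attention from a shared MLP over average- and max-pooled descriptors, followed by spatial attention from a 7×7 convolution. The reduction ratio and kernel size are common defaults, not values confirmed by the paper; per the note to Fig. 1, the module is attached to ConvA through a residual connection.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention module: channel then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # channel attention: MLP over global average- and max-pooled descriptors
        avg = self.mlp(x.mean((2, 3), keepdim=True))
        mx = self.mlp(x.amax(2, keepdim=True).amax(3, keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # spatial attention: 7x7 conv over channel-wise average and max maps
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```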

    图  4   改进后的Bottom Module网络

    注: Concat表示在通道方向进行特征图拼接,Upsample是使用双线性插值方法的上采样模块。

    Figure  4.   Improved Bottom Module network

    Note: Concat denotes feature-map concatenation in the channel direction, and Upsample denotes an upsampling module using bilinear interpolation.
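
    A minimal sketch of the Concat and Upsample operations named in the note, assuming bilinear upsampling of multi-scale backbone features to a common resolution before channel-wise concatenation (the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def bottom_module(feats, out_size):
    """Upsample multi-scale feature maps and concatenate them channel-wise.

    feats:    list of (N, C_i, H_i, W_i) feature maps from the backbone
    out_size: (H, W) target resolution shared by all maps
    """
    up = [F.interpolate(f, size=out_size, mode="bilinear", align_corners=False)
          for f in feats]        # Upsample: bilinear interpolation
    return torch.cat(up, dim=1)  # Concat: join along the channel dimension
```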

    图  5   宽度为3的深度图与点云输出数组的索引关系示意图

    注:深度图中的数字1至8表示像素索引,该索引与点云数组的一维索引d对应。点云数组中$ {{\boldsymbol{v}}}_{i} $表示索引为$ i $的点的三维空间向量。

    Figure  5.   Diagram of the index relation between a depth map with width 3 and a point cloud output array

    Note: The numbers 1 to 8 in the depth map denote pixel indices, which correspond to the one-dimensional index d of the point cloud array. $ {{\boldsymbol{v}}}_{i} $ in the point cloud array denotes the three-dimensional spatial vector of the point with index $ i $.
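
    The index relation of Fig. 5 corresponds to row-major flattening of the depth map: the pixel at row v and column u maps to the one-dimensional index d = v×W + u. Below is a minimal sketch under a pinhole camera model (the intrinsics fx, fy, cx, cy are assumed inputs, and the function name is illustrative); an instance mask flattened the same way then gathers an apple's surface points directly.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Convert an HxW depth map (metres) to a flat (H*W, 3) point-cloud array.

    Row-major flattening preserves the index relation of Fig. 5:
    pixel (row v, col u) maps to one-dimensional index d = v * W + u.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx   # back-project with the pinhole model
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Gathering one instance's surface points with its boolean mask (HxW):
#   points = depth_to_pointcloud(depth, fx, fy, cx, cy)
#   surface = points[mask.reshape(-1)]
```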

    图  6   点云处理算法演示图

    Figure  6.   Demonstration of point cloud processing algorithm

    图  7   图像增强示意图

    Figure  7.   Schematic diagram of image enhancement

    图  8   不同主干网络的损失变化图像

    Figure  8.   Loss variation images for different backbone networks

    图  9   不同BlendMask模型的权重热力图像

    Figure  9.   Weighted thermal images for different BlendMask models

    图  10   不同环境下不同模型的分割效果

    注:颜色相同表示同一个苹果的表面像素,颜色不同表示不同苹果的表面像素。

    Figure  10.   Segmentation effects of different models in different environments

    Note: The figure shows the pixel-level segmentation of apple instances by different models in RGB images. Pixels of the same color belong to the surface of the same apple, while different colors indicate different apples.

    图  11   两种滤波算法的运行时间对比

    注: k表示近邻点数,r表示半径滤波算法的滤波半径参数,α表示统计滤波算法的标准差比率参数。

    Figure  11.   Comparison of running time of two filtering algorithms

    Note: k denotes the number of nearest neighbor points, r denotes the filter radius parameter of the radius filtering algorithm, and α denotes the standard deviation ratio parameter of the statistical filtering algorithm.
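
    A minimal sketch of the compared pipeline using the Open3D library is given below, assuming the paper's order of uniform downsampling followed by statistical outlier filtering; the parameters map to the k and α of the note, with the radius-filter baseline shown as a comment. The sampling step and default values are illustrative assumptions.

```python
import open3d as o3d

def clean_point_cloud(points, every_k=4, k=20, alpha=2.0):
    """Uniform downsampling followed by statistical outlier removal.

    points: (n, 3) NumPy array of an instance's surface point cloud.
    k and alpha correspond to the nearest-neighbor count and the
    standard-deviation ratio defined in the note to Fig. 11.
    """
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd = pcd.uniform_down_sample(every_k_points=every_k)  # thin tangential noise
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=k, std_ratio=alpha)
    # The radius-filter baseline of Fig. 11 would instead be:
    #   pcd, _ = pcd.remove_radius_outlier(nb_points=k, radius=r)
    return pcd
```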

    图  12   两种滤波算法的效果对比图

    注:虚线与实线内的目标分别与该列中的绿色和蓝色点云相对应。

    Figure  12.   Comparison of the effects of the two filtering algorithms

    Note: Targets within the dashed and solid lines correspond to the green and blue point clouds in that column, respectively.

    表  1   不同Base通道数目的BlendMask模型性能对比

    Table  1   Performance comparison of BlendMask models with different numbers of Base channels

    Base通道数 Number of Base channels    参数量 Parameters/MB    检测速度 Detection speed v/(帧·s⁻¹)    均值平均精度 Mean average precision mAP/%
    1 39.3 42.2 24.8
    2 47.9 36.8 27.3
    3 55.4 35.7 30.2
    4 61.7 34.6 33.5
    5 70.8 25.1 35.9
    6 79.1 18.4 37.7

    表  2   不同主干网络的BlendMask模型性能对比

    Table  2   Performance comparison of BlendMask models for different backbone networks

    主干网络 Backbone    总参数量 Total parameters/MB    准确率 Precision P/%    召回率 Recall R/%    F1 score    平均精度 Average precision AP/%    检测速度 v/(帧·s⁻¹)
    ResNet50 61.7 92.93 87.94 90.36 91.45 31.52
    VGG16 69.4 93.07 88.41 90.68 92.30 29.02
    EfficientNet 41.8 87.32 83.83 85.54 86.59 43.91
    MobileNetV2 40.9 89.68 84.09 86.79 87.78 45.72
    Vision Transformer 78.1 97.11 92.56 94.64 96.89 20.28
    Swin Transformer 77.5 97.36 93.73 95.46 96.91 21.69
    HRNet 63.6 96.75 91.10 94.62 95.83 33.98

    表  3   不同结构的BlendMask模型性能对比

    Table  3   Performance comparison of BlendMask models with different structures

    模型结构形式 Model structure form    总参数量 Total parameters/MB    P/%    R/%    F1    AP/%
    ResNet101 134.3 94.78 89.81 92.22 91.21
    ResNet50 61.7 92.06 88.76 90.37 90.06
    HRNet 63.6 96.42 90.82 93.53 95.44
    HRNet+CBAM1 67.9 97.54 91.06 94.18 96.65
    HRNet+CBAM2 64.5 96.32 90.34 93.23 96.01
    HRNet+CBAM1+CBAM2 68.6 97.69 92.38 94.96 96.98
    注: CBAM1表示在Bottom module网络里嵌入的CBAM模块,CBAM2表示以残差连接方式嵌入至ConvA网络的CBAM模块。
    Note: CBAM1 denotes the CBAM module embedded in the Bottom module network, and CBAM2 denotes the CBAM module embedded into the ConvA network as a residual connection.

    表  4   不同实例分割模型的性能比较

    Table  4   Performance comparison of different instance segmentation models

    模型 Models    P/%    R/%    F1    AP/%    v/(帧·s⁻¹)
    Mask R-CNN 87.89 84.01 85.90 86.48 11.32
    SOLOv2 92.57 86.44 89.40 89.21 26.56
    YOLACT 96.21 89.74 92.86 91.84 35.45
    SparseInst 97.82 92.45 95.06 96.94 28.40
    FastInst 97.76 90.31 93.69 96.69 30.67
    PatchDCT 99.67 92.42 95.43 98.59 14.43
    改进BlendMask Improved BlendMask 97.54 91.06 94.18 96.65 34.51

    表  5   不同距离下的改进型BlendMask模型的定位效果

    Table  5   Localization effects of the improved BlendMask model at different distances

    距离 Distance/m    VMSE/cm³    DMSE/cm    Vs/ms
    0.5 5.32 1.63 31.23
    1.0 13.53 4.64 31.78
    1.5 17.98 5.76 32.12
    2.0 19.12 7.39 33.32
    2.5 25.41 8.45 32.45
    3.0 30.61 7.89 32.29
    均值Average value 18.66 5.96 32.19
    注:VMSE表示体积均方误差,DMSE表示测距均方误差,Vs表示识别定位计算速度。
    Note: VMSE denotes the volume mean square error, DMSE denotes the ranging mean square error, and Vs denotes the recognition and localization computation speed.
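    Judging from the units in Table 5 (cm³ for VMSE, cm for DMSE), a plausible reading is that both metrics are reported as root-mean-square errors over n test samples, with V̂ᵢ, Vᵢ the estimated and true fruit volumes and d̂ᵢ, dᵢ the estimated and true distances; this reconstruction is an assumption, not the paper's stated definition:

```latex
\mathrm{VMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{V}_i-V_i\right)^{2}},\qquad
\mathrm{DMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{d}_i-d_i\right)^{2}}
```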
    [1] 冯青春,赵春江,李涛,等. 苹果四臂采摘机器人系统设计与试验[J]. 农业工程学报,2023,39(13):25-33.

    FENG Qingchun, ZHAO Chunjiang, LI Tao, et al. Design and test of a four-arm apple harvesting robot[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(13): 25-33. (in Chinese with English abstract)

    [2] 苗玉彬,郑家丰. 苹果采摘机器人末端执行器恒力柔顺机构研制[J]. 农业工程学报,2019,35(10):19-25.

    MIAO Yubin, ZHENG Jiafeng. Development of compliant constant-force mechanism for end effector of apple picking robot[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(10): 19-25. (in Chinese with English abstract)

    [3] 李涛,邱权,赵春江,等. 矮化密植果园多臂采摘机器人任务规划[J]. 农业工程学报,2021,37(2):1-10.

    LI Tao, QIU Quan, ZHAO Chunjiang, et al. Task planning of multi-arm harvesting robots for high-density dwarf orchards[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(2): 1-10. (in Chinese with English abstract)

    [4] 陈青,殷程凯,郭自良,等. 苹果采摘机器人关键技术研究现状与发展趋势[J]. 农业工程学报,2023,39(4):1-15.

    CHEN Qing, YIN Chengkai, GUO Ziliang, et al. Current status and future development of the key technologies for apple picking robots[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(4): 1-15. (in Chinese with English abstract)

    [5] TANG Y, CHEN M, WANG C, et al. Recognition and localization methods for vision-based fruit picking robots: A review[J]. Frontiers in Plant Science, 2020, 11: 520170.

    [6] XIAO Y, TIAN Z, YU J, et al. A review of object detection based on deep learning[J]. Multimedia Tools and Applications, 2020, 79: 23729-23791. doi: 10.1007/s11042-020-08976-6

    [7] WU G, LI B, ZHU Q, et al. Using color and 3D geometry features to segment fruit point cloud and improve fruit recognition accuracy[J]. Computers and Electronics in Agriculture, 2020, 174: 105475. doi: 10.1016/j.compag.2020.105475

    [8] 柳长源,赖楠旭,毕晓君,等. 基于深度图像的球形果实识别定位算法[J]. 农业机械学报,2022,53(10):228-235.

    LIU Changyuan, LAI Nanxu, BI Xiaojun, et al. Spherical fruit recognition and location algorithm based on depth image[J]. Transactions of the Chinese Society for Agricultural Machinery, 2022, 53(10): 228-235. (in Chinese with English abstract)

    [9] YU L, XIONG J, FANG X, et al. A litchi fruit recognition method in a natural environment using RGB-D images[J]. Biosystems Engineering, 2021, 204: 50-63. doi: 10.1016/j.biosystemseng.2021.01.015

    [10] XIAO B, NGUYEN M, YAN W, et al. Apple ripeness identification from digital images using transformers[J]. Multimedia Tools and Applications, 2024, 83(3): 7811-7825. doi: 10.1007/s11042-023-15938-1

    [11] ZHANG L, HAO Q, CAO J, et al. Attention-based fine-grained lightweight architecture for Fuji apple maturity classification in an open-world orchard environment[J]. Agriculture, 2023, 13(2): 228. doi: 10.3390/agriculture13020228

    [12] HU J N, GUO L, MO H L, et al. Crop node detection and internode length estimation using an improved YOLOv5 model[J]. Agriculture, 2023, 13(2): 473. doi: 10.3390/agriculture13020473

    [13] 张羽丰,杨景,邓寒冰,等. 基于RGB和深度双模态的温室番茄图像语义分割模型[J]. 农业工程学报,2024,40(2):295-306.

    ZHANG Yufeng, YANG Jing, DENG Hanbing, et al. Semantic segmentation model for greenhouse tomato images using RGB and depth bimodal[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(2): 295-306. (in Chinese with English abstract)

    [14] 李兴旭,陈雯柏,王一群,等. 基于级联视觉检测的樱桃番茄自动采收系统设计与试验[J]. 农业工程学报,2023,39(1):136-145.

    LI Xingxu, CHEN Wenbai, WANG Yiqun, et al. Design and experiment of an automatic cherry tomato harvesting system based on cascade vision detection[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(1): 136-145. (in Chinese with English abstract)

    [15] ZHAO R, ZHU Y, LI Y, et al. An end-to-end lightweight model for grape and picking point simultaneous detection[J]. Biosystems Engineering, 2022, 223: 174-188. doi: 10.1016/j.biosystemseng.2022.08.013

    [16] 张宏鸣,张国良,朱珊娜,等. 基于U-Net的葡萄种植区遥感识别方法[J]. 农业机械学报,2022,53(4):173-182.

    ZHANG Hongming, ZHANG Guoliang, ZHU Shanna, et al. Remote sensing recognition method of grape planting regions based on U-Net[J]. Transactions of the Chinese Society for Agricultural Machinery, 2022, 53(4): 173-182. (in Chinese with English abstract)

    [17] 周桂红,马帅,梁芳芳,等. 基于改进YOLOv4模型的全景图像苹果识别[J]. 农业工程学报,2022,38(21):159-168.

    ZHOU Guihong, MA Shuai, LIANG Fangfang, et al. Recognition of the apple in panoramic images based on improved YOLOv4 model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(21): 159-168. (in Chinese with English abstract)

    [18] CHEN C, LI B, LIU J X, et al. Monocular positioning of sweet peppers: An instance segmentation approach for harvest robots[J]. Biosystems Engineering, 2020, 196: 15-28. doi: 10.1016/j.biosystemseng.2020.05.005

    [19] 龙洁花,赵春江,林森,等. 改进Mask R-CNN的温室环境下不同成熟度番茄果实分割方法[J]. 农业工程学报,2021,37(18):100-108.

    LONG Jiehua, ZHAO Chunjiang, LIN Sen, et al. Segmentation method of the tomato fruits with different maturities under greenhouse environment based on improved Mask R-CNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(18): 100-108. (in Chinese with English abstract)

    [20] CHEN H, SUN K, TIAN Z, et al. BlendMask: Top-down meets bottom-up for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, 2020: 8570-8578.

    [21] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, 2019: 5686-5696.

    [22] WANG J, SUN K, CHENG T, et al. Deep high-resolution representation learning for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349-3364. doi: 10.1109/TPAMI.2020.2983686

    [23] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision. Munich, Germany, 2018: 3-19.

    [24] WERLING M, ZIEGLER J, KAMMEL S, et al. Optimal trajectory generation for dynamic street scenarios in a Frenét Frame[C]//IEEE International Conference on Robotics and Automation. Anchorage, AK, USA, 2010: 987-993.

    [25] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the European Conference on Computer Vision. Zurich, Switzerland, 2014: 740-755.

    [26] TORRALBA A, RUSSELL B C, YUEN J, et al. LabelMe: Online image annotation and applications[J]. Proceedings of the IEEE, 2010, 98(8): 1467-1484. doi: 10.1109/JPROC.2010.2050290

    [27] WANG Z, DU L, MAO J, et al. SAR target detection based on SSD with data augmentation and transfer learning[J]. IEEE Geoscience and Remote Sensing Letters, 2018, 16(1): 150-154.

    [28] KINGMA D P, BA J. Adam: A method for stochastic optimization[C]//Proceedings of the International Conference for Learning Representations. San Diego, USA, 2015.

    [29] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128: 336-359.

    [30] WANG X, ZHANG R, KONG T, et al. SOLOv2: Dynamic and fast instance segmentation[J]. Advances in Neural Information Processing Systems, 2020, 33: 17721-17732.

    [31] CHENG T, WANG X, CHEN S, et al. Sparse instance activation for real-time instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA, 2022: 4433-4442.

    [32] HE J, LI P, GENG Y, et al. FastInst: A simple query-based model for real-time instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada, 2023: 23663-23672.

    [33] WEN Q, YANG J, YANG X, et al. PatchDCT: Patch refinement for high quality instance segmentation[C]//Proceedings of the International Conference for Learning Representations. Kigali, Rwanda, 2023.

Publication history
  • Received: 2023-11-27
  • Revised: 2024-03-03
  • Available online: 2024-05-27
  • Published: 2024-04-14
