Detecting and locating banana targets using improved YOLOv7 and rotational bounding box positioning

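    • Abstract: Conventional object detection algorithms adapt poorly to tilted targets, and their large parameter counts make deployment on embedded devices difficult. To address these problems, this study proposes an improved YOLOv7 method with rotated bounding boxes for banana detection and localization. The GSConv module is introduced to reduce computational complexity and parameter count while improving detection accuracy. Banana targets are localized with rotated boxes defined by a five-parameter representation. A Kullback-Leibler divergence loss maps each rotated box to a two-dimensional Gaussian distribution, computes the difference between the two probability distributions, and optimizes that difference as the loss, which measures the discrepancy between predicted and ground-truth boxes more accurately. In addition, the Criminisi algorithm is used to repair localization errors caused by local holes in the depth camera data. Experimental results show that the improved rotated detection model gains in both detection speed and accuracy for banana targets: average precision reaches 96.15%, 17.04 percentage points higher than the YOLOv7 model, and the detection frame rate rises by about 40 frames/s. Moreover, with rotated bounding boxes the improved model predicts the banana stalk position more accurately: the mean localization error falls to 7.02 mm and the mean relative error to 0.65%, reductions of 24.3 mm and 1.96 percentage points, respectively, relative to YOLOv7. The method therefore offers an effective solution for fast, accurate identification and localization of banana bunches and their stalks in complex orchard environments.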
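    The rotated-box loss summarized in the abstract can be illustrated with a minimal sketch. It assumes the commonly used mapping in which a five-parameter box (cx, cy, w, h, theta) becomes a Gaussian with mean (cx, cy) and covariance R diag(w^2/4, h^2/4) R^T, and it evaluates the closed-form KL divergence between two 2-D Gaussians. The function names, the radian angle convention, and the direction of the divergence (predicted relative to ground truth) are assumptions made for illustration; the abstract does not specify them, nor how the divergence is normalized into the final loss.

```python
import math


def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a five-parameter rotated box to a 2-D Gaussian.

    theta is in radians (assumed convention). Returns the mean and the
    three distinct entries (sxx, sxy, syy) of the symmetric covariance
    R @ diag(w^2/4, h^2/4) @ R.T, where R is the rotation by theta.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def kl_divergence(pred, target):
    """Closed-form KL(N_pred || N_target) for two 2-D Gaussians."""
    (mpx, mpy), (pxx, pxy, pyy) = pred
    (mtx, mty), (txx, txy, tyy) = target
    det_p = pxx * pyy - pxy * pxy
    det_t = txx * tyy - txy * txy
    # Inverse of the 2x2 target covariance.
    ixx, ixy, iyy = tyy / det_t, -txy / det_t, txx / det_t
    dx, dy = mtx - mpx, mty - mpy
    # Mahalanobis term (mu_t - mu_p)^T Sigma_t^-1 (mu_t - mu_p).
    maha = dx * dx * ixx + 2.0 * dx * dy * ixy + dy * dy * iyy
    # Trace term tr(Sigma_t^-1 Sigma_p).
    trace = ixx * pxx + 2.0 * ixy * pxy + iyy * pyy
    return 0.5 * (maha + trace + math.log(det_t / det_p) - 2.0)


# Hypothetical boxes: same centre and size, different tilt.
truth = rbox_to_gaussian(100.0, 80.0, 60.0, 20.0, math.radians(30))
tilted = rbox_to_gaussian(100.0, 80.0, 60.0, 20.0, math.radians(45))
divergence = kl_divergence(tilted, truth)
```

    Because the angle enters only through the covariance, the divergence varies smoothly with tilt and vanishes when the two boxes describe the same Gaussian, which is the property the abstract relies on when it treats the distribution difference as the loss.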

       

      Abstract: Current object detection algorithms adapt poorly to tilted targets, that is, objects not aligned with the horizontal or vertical image axes, such as bananas in orchards, which typically grow at varying angles. In addition, the large parameter counts and high computational requirements of advanced models hinder their deployment on embedded devices with limited computing resources, which are commonly used in agricultural automation. In this study, an improved YOLOv7 algorithm was proposed for banana target detection and localization with rotated bounding boxes. The GSConv module was incorporated to reduce the computational complexity and the number of parameters while improving detection accuracy. As a lightweight convolutional structure, GSConv keeps the model efficient and therefore more feasible for real-time applications on embedded platforms with constrained hardware. Banana targets were localized with rotated bounding boxes defined by a five-parameter representation, which handles tilted objects better because the box can follow the angle at which each banana grows. Furthermore, a Kullback-Leibler (KL) divergence loss function was employed, in which each rotated box is mapped to a two-dimensional Gaussian distribution. The difference between the two probability distributions (predicted box and ground-truth box) was then calculated and optimized as the loss. This KL divergence-based approach measures the difference between the predicted and ground-truth boxes more precisely, thus improving localization accuracy. In addition, the Criminisi algorithm was integrated to handle local holes in the depth camera data, where depth information is missing because of factors such as light interference or occlusion. Image inpainting filled in the missing information and corrected the resulting positioning errors in the depth data, further improving the accuracy of three-dimensional localization in real-world orchard environments. Experimental results show that the improved model achieved significant gains in both detection speed and accuracy for banana targets. Specifically, the average precision reached 96.15%, 17.04 percentage points higher than the standard YOLOv7 model. The detection frame rate increased by approximately 40 frames per second, making the model suitable for real-time applications in agricultural settings. Moreover, the rotated bounding box notably improved the prediction of banana stalk positions. The mean positioning error was reduced to 7.02 mm and the mean relative error to 0.65%, reductions of 24.3 mm and 1.96 percentage points, respectively, compared with the original YOLOv7. In conclusion, the improved model offers an effective solution for fast and accurate identification and localization of banana bunches and fruit stalks in complex orchard environments. It achieves higher detection accuracy while significantly reducing computational requirements, which matters particularly for real-time agricultural applications on embedded devices.
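      To show why a depth hole at the stalk pixel leads to a positioning error, the sketch below back-projects a pixel and its depth reading into camera coordinates with the standard pinhole model. The abstract does not describe the localization pipeline, so the function names, the millimetre units, the intrinsics (fx, fy, cx, cy), and the definition of relative error as Euclidean error divided by the true range are all assumptions made for illustration. The Criminisi inpainting step itself is not implemented here.

```python
import math


def pixel_to_camera(u, v, depth_mm, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z (mm) into camera coordinates.

    Uses the pinhole model X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Returns None when the depth reading is a hole (missing, non-positive,
    or NaN), since no 3-D point can be recovered from it.
    """
    if depth_mm is None or depth_mm <= 0 or math.isnan(depth_mm):
        return None
    x = (u - cx) * depth_mm / fx
    y = (v - cy) * depth_mm / fy
    return (x, y, float(depth_mm))


def positioning_error(pred, truth):
    """Euclidean error (mm) and error relative to the true range.

    The relative-error definition is an assumption, not taken from the paper.
    """
    err = math.dist(pred, truth)
    rng = math.hypot(*truth)
    return err, err / rng


# Hypothetical intrinsics and readings, for illustration only.
FX = FY = 600.0
CX, CY = 320.0, 240.0
valid_point = pixel_to_camera(350, 260, 1000.0, FX, FY, CX, CY)
hole_point = pixel_to_camera(350, 260, 0.0, FX, FY, CX, CY)  # depth hole
```

      A hole yields no usable point at all, which is why the missing depth values have to be filled in before the stalk position can be measured.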

       
