Counting tomatoes with different maturities using ultra-depth masking and improved YOLOv8

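    • Summary: Under the constraints of greenhouse production conditions and crop planting patterns, tomato fruit objects are difficult to track continuously and stably, which makes it hard to meet the accuracy requirements of statistical counting. To address this, a method for counting fruits of different maturity levels based on ultra-depth masking and an improved YOLOv8 is proposed. Building on YOLOv8, a spatially heterogeneous convolution kernel that fuses global features is constructed, the convolution operator and the object detection network are optimized, and a loss function that is more robust to fruit object annotations is introduced. Depth information predicted by a depth estimation model is used to generate a depth threshold dynamically, and distant fruit objects are masked based on this threshold, which addresses the low counting accuracy caused by unstable object tracking. The results show that, compared with YOLOv8n, the improved model raised average detection precision by 3.2 percentage points and recall by 3.7 percentage points. Using the designed convolution operator in the object detection model raised fruit detection precision by 2.7 percentage points compared with the model without it, and introducing the robust loss function raised detection precision by 1.1 percentage points compared with the model before its introduction. Compared with no ultra-depth masking, applying it raised tomato fruit counting accuracy by 12.63 percentage points. The method achieved a tomato fruit counting accuracy of 93.80%, a counting accuracy of no less than 91.00% for fruits of different maturity levels, and a computation speed of 23 frames/s. The improvements to YOLOv8 are effective, and ultra-depth masking plays an important role in improving tomato counting accuracy. The study can provide a technical reference for vision-based fruit and vegetable yield statistics.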

       

      Abstract: Under greenhouse production conditions and the constraints of crop planting patterns, tomato fruit objects are difficult to track continuously and stably, so the accuracy required for statistical counting is hard to achieve. In this study, a counting method was proposed for tomatoes of different maturity levels using ultra-depth masking and an improved YOLOv8 model. Multi-head self-attention (MHSA) was combined with the lightweight convolution operator Involution to construct a spatially heterogeneous convolution kernel that integrates global features. The resulting operator, termed Global Attention-based Involution (GAInvolution), formed the backbone network of the tomato detector YOLOT, an improved YOLOv8 model. The model also incorporated the WIoU (wise intersection over union) loss, which is more robust to the labeling of fruit objects. In addition, depth information was predicted by the monocular depth estimation model Depth Anything, and distant fruit objects were dynamically filtered out to avoid tracking loss or duplicate tracking. This processing was referred to as ultra-depth masking. A tomato counter was also optimized using the BoT-SORT algorithm. The tomato detector, depth estimation model, ultra-depth masking, and object counter were combined into a single framework to identify and count tomatoes of different maturity levels. The experimental results showed that, compared with the original YOLOv8n, the improved tomato detector increased the mean average precision at an IoU threshold of 0.5 (mAP50) by 3.2 percentage points and the recall rate by 3.7 percentage points, indicating that the improvements to YOLOv8 were effective. With GAInvolution as the main operator, the detector's mAP50 and recall rate increased by 2.7 and 2.8 percentage points, respectively. Introducing the WIoU loss function further increased the mAP50 by 1.1 percentage points compared with the detector without it. Ultra-depth masking contributed greatly to the accuracy of tomato fruit counting. The depth threshold was calculated dynamically from the depth map predicted by the Depth Anything model. The highest counting accuracy was achieved when the threshold was set to the mean depth value of the depth map minus 0.5 times its standard deviation. With ultra-depth masking, the average counting precision (ACP) of tomato fruits increased by 12.63 percentage points. The method achieved an ACP of 93.80% for tomato fruits at a calculation speed of 23 frames per second. The accuracy of fruit counting was closely related to the performance of the object detector. Moreover, counting accuracy varied with the inspection viewport: the vertical viewport achieved 2.59 percentage points higher accuracy than the parallel one. Tomatoes of different maturity levels were counted with an ACP of at least 91%. These findings can provide a technical reference for predicting fruit and vegetable yields using visual technology.
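The threshold rule stated in the abstract (mean depth minus 0.5 times the depth standard deviation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of the population standard deviation, sampling depth at the box center, and the `larger_is_nearer` flag are all assumptions made for illustration. In particular, the abstract does not say whether larger depth-map values mean nearer or farther objects, so the direction of the comparison is left configurable.

```python
from statistics import fmean, pstdev


def depth_threshold(depth_map, k=0.5):
    """Return mean(depth) - k * std(depth) over all pixels of one frame.

    k = 0.5 is the setting the abstract reports as giving the best counting
    accuracy. Using the population standard deviation is an assumption.
    """
    values = [v for row in depth_map for v in row]
    return fmean(values) - k * pstdev(values)


def box_depth(depth_map, box):
    """Sample depth at the center pixel of an (x1, y1, x2, y2) box.

    Hypothetical choice: the abstract does not say how a per-fruit depth
    is derived from the depth map.
    """
    x1, y1, x2, y2 = box
    cx = (x1 + x2) // 2
    cy = (y1 + y2) // 2
    return depth_map[cy][cx]


def mask_far_detections(boxes, depth_map, k=0.5, larger_is_nearer=True):
    """Keep only boxes judged to be in the foreground for this frame.

    larger_is_nearer is an assumption about the depth map convention
    (relative inverse depth); set it to False for maps where larger
    values mean farther away.
    """
    t = depth_threshold(depth_map, k)
    kept = []
    for box in boxes:
        d = box_depth(depth_map, box)
        is_near = d >= t if larger_is_nearer else d <= t
        if is_near:
            kept.append(box)
    return kept


# Toy 4x4 depth map: left half near (0.9), right half far (0.1).
toy_depth = [[0.9, 0.9, 0.1, 0.1] for _ in range(4)]
toy_boxes = [(0, 0, 2, 2), (2, 0, 4, 2)]
near_boxes = mask_far_detections(toy_boxes, toy_depth)
```

In a pipeline like the one described, only the boxes that survive this filter would be passed on to the tracker, so distant fruits never receive track IDs that could later be lost or duplicated.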

       
