Counting method for tomatoes of different maturity based on ultra-depth masking and improved YOLOv8
Abstract
Continuously and stably tracking fruit objects is difficult under greenhouse production conditions, and crop planting patterns further limit the accuracy of statistical counting. To address these challenges, a counting method for tomatoes of different maturity based on ultra-depth masking and an improved YOLOv8 model is proposed. By introducing multi-head self-attention (MHSA) into the lightweight convolution operator Involution, a spatially heterogeneous convolution kernel that integrates global features was constructed. On this basis, a new convolution operator, termed Global Attention-based Involution (GAInvolution), was designed. This operator formed the backbone network of our tomato detector, YOLOT, an improved YOLOv8 model that also incorporated the Wise Intersection over Union (WIoU) loss to enhance the robustness of object labeling. In addition, the depth information predicted by the monocular depth estimation model Depth Anything was used to dynamically filter out distant fruit objects, thereby addressing the loss of tracked objects and duplicate tracking. We refer to this processing as ultra-depth masking. We also designed a tomato counter based on the BoT-SORT algorithm. By combining the object detector, the depth estimation model, ultra-depth masking, and the object counter, a comprehensive framework for counting tomatoes of different maturity levels was constructed. The experimental results showed that, compared with YOLOv8, the mean average precision (mAP) of the improved tomato detector increased by 3.2 percentage points and the recall increased by 3.7 percentage points, highlighting the effectiveness of our modifications to the YOLOv8 model. The GAInvolution operator significantly improved the capability of the tomato detector.
Compared with the YOLOv8n model, the detector with GAInvolution as its main operator achieved a 2.7 percentage point increase in mAP and a 2.8 percentage point increase in recall. The introduction of the WIoU loss function further improved detection accuracy, increasing mAP by 1.1 percentage points over the original model. Ultra-depth masking played a crucial role in improving the accuracy of tomato fruit counting. The depth threshold was calculated dynamically from the depth map predicted by the Depth Anything model. The highest counting accuracy was achieved when the threshold was set to the mean depth value of the depth map minus 0.5 times the depth standard deviation. Compared with the method without ultra-depth masking, the average counting precision (ACP) of tomato fruits increased by 12.63 percentage points. Our method achieved a tomato fruit ACP of 93.80% at a processing speed of 23 frames per second, demonstrating its effectiveness for tomato fruit counting. Enhancing the accuracy of the object detector directly improved the accuracy of fruit counting. Our method can count tomato fruits of different maturity levels with an ACP of no less than 91%. The inspection viewport also affected counting accuracy: the vertical viewport yielded an accuracy 2.59 percentage points higher than the parallel viewport and was therefore more conducive to fruit counting. This study can provide a valuable technical reference for vision-based fruit and vegetable yield statistics.
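
The dynamic threshold described above (mean depth minus 0.5 times the depth standard deviation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the depth map is a 2D grid of relative values in which larger values mean nearer to the camera (if the convention is reversed, the comparison must be flipped), and that a detection is judged by the depth value at its box centre. The function names, the `box` dictionary key, and the use of the population standard deviation are hypothetical choices made for illustration only.

```python
from statistics import fmean, pstdev


def depth_threshold(depth_map, k=0.5):
    """Return mean(depth) - k * std(depth), computed over all pixels."""
    values = [v for row in depth_map for v in row]
    return fmean(values) - k * pstdev(values)


def ultra_depth_mask(depth_map, k=0.5):
    """Return a boolean grid: True where a pixel is kept (near), False where masked.

    Assumption: larger depth-map values mean nearer to the camera.
    """
    thr = depth_threshold(depth_map, k)
    return [[v >= thr for v in row] for row in depth_map]


def keep_near_detections(detections, depth_map, k=0.5):
    """Keep detections whose box centre falls on an unmasked (near) pixel.

    Each detection is a dict with a hypothetical "box" key: (x1, y1, x2, y2).
    """
    mask = ultra_depth_mask(depth_map, k)
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        if mask[cy][cx]:
            kept.append(det)
    return kept


# Toy example: the left half of the frame is near, the right half is distant.
toy_depth = [
    [10, 10, 2, 2],
    [10, 10, 2, 2],
]
toy_detections = [
    {"label": "ripe", "box": (0, 0, 1, 1)},    # centre on a near pixel
    {"label": "unripe", "box": (2, 0, 3, 1)},  # centre on a distant pixel
]
near_only = keep_near_detections(toy_detections, toy_depth)
```

In this toy case the mean is 6 and the standard deviation is 4, so the threshold is 4 and only the first detection is passed on to the tracker. Because the threshold is recomputed from each frame's depth map, it follows changes in scene depth rather than relying on a fixed cut-off.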