Method for estimating bagged grape yield using self-correcting NMS-ByteTrack

    • Abstract: To address the problems that bagged grapes are prone to overlapping occlusion, owing to their increased volume after bagging and the large surface area of grape leaves, and that the unstable speed of manually shot video can cause bagged grape targets to be lost, this study proposes a bagged grape yield estimation method based on self-correcting NMS (non-maximum suppression)-ByteTrack. The method first detects bagged grapes in the video with the YOLOv5s object detector, postponing the NMS operation from the detection stage to the tracking stage so as to retain fruit detection boxes that would otherwise be filtered out due to occlusion; it then adds camera motion compensation and an improved Kalman filter algorithm on top of ByteTrack to automatically correct the positions of fruit prediction boxes during tracking; finally, a line-crossing counting strategy is proposed to count bagged grapes automatically. Experimental results show that the method achieves a multi-object tracking accuracy of 64.6%, a multi-object tracking precision of 82.4%, and an IDF1 of 80.8%, improvements of 1.7, 1.0, and 4.1 percentage points over ByteTrack, respectively, with an average counting accuracy of 82.8%. The yield estimation method based on self-correcting NMS-ByteTrack can therefore effectively solve the tracking and counting problem for bagged grapes and enable more accurate yield estimation of bagged grapes.

       

      Abstract: Overlapping occlusion has seriously limited the yield estimation of bagged grapes in recent years, owing to the increased grape volume after bagging and the large surface area of grape leaves. The unstable speed of manual video shooting can also cause bagged grape targets to be lost. In this study, a yield estimation method was proposed for bagged grapes using self-correcting Non-Maximum Suppression (NMS)-ByteTrack. Firstly, the bagged grapes in the video were detected with the YOLOv5s object detector, and the NMS operation was postponed from the detection stage to the tracking stage, so as to retain the fruit detection boxes that would otherwise be filtered out due to occlusion. Specifically, the detection boxes of bagged grapes were produced by the object detector (YOLOv5s), while the prediction boxes of bagged grapes were predicted by the Kalman filter, and the intersection over union (IoU) between them was calculated. If the IoU was less than a given threshold, the detection box was filtered out, thereby reserving the detection boxes closer to the predicted positions. The final detection boxes were then obtained by the NMS operation. Secondly, camera motion compensation and an improved Kalman filter algorithm were added to ByteTrack to automatically correct the positions of the fruit prediction boxes during tracking. Specifically, the camera motion compensation first extracted the background key points, excluding the tracking targets, from the bagged grape images of the previous and current frames. Sparse optical flow was used to match the extracted background key points between the two frames. The affine transformation matrix of the background motion was then calculated by the RANSAC algorithm, while the Kalman filter predicted the bagged grape prediction boxes of the current frame.
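The delayed-NMS step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the box format (x1, y1, x2, y2), the function names, and the IoU thresholds of 0.3 and 0.5 are assumptions for the sketch.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def gate_by_prediction(detections, predicted, iou_thresh=0.3):
    """Pre-NMS gate (assumed helper): keep only detections whose IoU
    with at least one Kalman-predicted box reaches iou_thresh, so boxes
    near predicted positions survive even when heavily overlapped."""
    return [d for d in detections
            if any(iou(d, p) >= iou_thresh for p in predicted)]

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS, applied after gating at the tracking stage."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

Gating before NMS is what distinguishes this from the standard pipeline: a heavily overlapped detection that agrees with a track's predicted position is kept, instead of being suppressed at detection time.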
Finally, the obtained affine transformation matrix was used to convert the prediction boxes from the coordinate system of the previous frame to that of the current frame, thereby correcting the position information of the current-frame prediction boxes. In addition, the improved Kalman filter algorithm replaced the aspect ratio in the state vector with the width, so that the position of the bagged grape tracking boxes was expressed in the Kalman filter directly in terms of width, allowing the position of the tracking boxes to be estimated more accurately. A line-crossing counting strategy was then proposed to automatically count the bagged grapes. The strategy set the counting line in the middle of the video and retained the idle frames without bagged grapes in the first few seconds of the video. Furthermore, each bagged grape ID was counted only once, to avoid repeated line-collision counting. A bagged grape was counted automatically when the center point of its tracking box collided with the counting line. The dataset was collected from the Paidengte Agricultural Science and Technology Demonstration Park, Bishan District, Chongqing, China. The mobile phone cameras of a Redmi K40 and an OPPO Reno6pro+ were used to capture images of the same bagged grapes at 8:00, 12:00, and 18:00 from different angles, including frontal, sideways, and overhead shots. The total shooting time was about 6 h, the shooting height was about 1.5 m above the ground, and the shooting route proceeded row by row. A total of 500 images and six videos of bagged grapes were obtained. The 500 images were expanded to 2000 images through saturation adjustment, brightness enhancement, brightness reduction, and mirroring operations. The 2000 images were then randomly divided into training and validation sets at a ratio of 8:2, and the six videos were used as the test set.
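The prediction-box correction in the camera motion compensation step can be sketched as below. The paper matches background key points with sparse optical flow and fits the affine matrix with RANSAC; this sketch substitutes a plain least-squares fit for brevity, and the function names are illustrative, not the authors' API.

```python
import numpy as np

def fit_affine(prev_pts, cur_pts):
    """Fit a 2x3 affine matrix M mapping previous-frame points to
    current-frame points (a least-squares stand-in for the RANSAC fit
    on background key points matched by sparse optical flow)."""
    prev_pts = np.asarray(prev_pts, dtype=float)
    cur_pts = np.asarray(cur_pts, dtype=float)
    A = np.hstack([prev_pts, np.ones((len(prev_pts), 1))])  # N x 3
    M, *_ = np.linalg.lstsq(A, cur_pts, rcond=None)         # 3 x 2
    return M.T                                              # 2 x 3

def warp_boxes(boxes, M):
    """Convert (x1, y1, x2, y2) prediction boxes from the previous
    frame's coordinate system into the current frame's."""
    boxes = np.asarray(boxes, dtype=float)
    p1 = boxes[:, :2] @ M[:, :2].T + M[:, 2]   # warp top-left corners
    p2 = boxes[:, 2:] @ M[:, :2].T + M[:, 2]   # warp bottom-right corners
    return np.concatenate([np.minimum(p1, p2), np.maximum(p1, p2)], axis=1)
```

In practice the affine fit would come from OpenCV's RANSAC-based estimator on optical-flow matches; the warp step then corrects the Kalman predictions for camera motion before association.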
Relevant experiments were conducted on this dataset. The experimental results showed that the self-correcting NMS-ByteTrack achieved a significant improvement in the tracking performance of bagged grapes. The multi-object tracking accuracy, multi-object tracking precision, and identification F1-score were 64.6%, 82.4%, and 80.8%, respectively, which were 1.7, 1.0, and 4.1 percentage points higher than those of ByteTrack, and the number of ID switches was reduced by 3. In terms of counting performance, the average counting accuracy reached 82.8% compared with manual counting. In addition, the proposed method was compared with five other tracking methods and achieved the best tracking performance among them, verifying its applicability to tracking and counting bagged grapes. Therefore, the yield estimation method based on self-correcting NMS-ByteTrack can be expected to effectively support the tracking and counting of bagged grapes in real scenarios, enabling more accurate yield estimation of bagged grapes in the field.
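The line-crossing counting strategy can be sketched as the following minimal class. It assumes a track interface of ID-to-box mappings per frame; the class and parameter names are illustrative, not from the released code.

```python
class LineCounter:
    """Counts bagged-grape tracks whose box center crosses a vertical
    counting line placed at x = line_x (e.g. the middle of the frame).
    Each track ID is counted at most once, to avoid repeated
    line-collision counting for the same grape."""

    def __init__(self, line_x):
        self.line_x = line_x
        self.last_cx = {}      # track_id -> center x in the previous frame
        self.counted = set()   # track IDs already counted
        self.total = 0

    def update(self, tracks):
        """tracks: dict mapping track_id -> (x1, y1, x2, y2) box.
        Returns the running total after this frame."""
        for tid, (x1, y1, x2, y2) in tracks.items():
            cx = (x1 + x2) / 2.0
            prev = self.last_cx.get(tid)
            # A collision: the center moved from one side of the line
            # to the other (or onto it) between consecutive frames.
            if (prev is not None and tid not in self.counted
                    and (prev - self.line_x) * (cx - self.line_x) <= 0):
                self.counted.add(tid)
                self.total += 1
            self.last_cx[tid] = cx
        return self.total
```

Checking the sign change between consecutive frames, rather than exact coincidence with the line, keeps the count robust when a fast-moving box skips over the line in one frame step.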

       
