LYU Jia, ZHANG Cuiping, LIU Qin, et al. Method for estimation of bagged grape yield using a self-correcting NMS-ByteTrack[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(13): 182-190. DOI: 10.11975/j.issn.1002-6819.202304116

    Method for estimation of bagged grape yield using a self-correcting NMS-ByteTrack

Overlapping occlusion severely limits yield estimation for bagged grapes: the grape clusters grow continuously after bagging, and the large surface area of the vine leaves obscures the fruit. The unstable speed of manual video shooting can also cause bagged-grape targets to be lost. In this study, a yield-estimation method for bagged grapes was proposed using a self-correcting Non-Maximum Suppression (NMS)-ByteTrack. Firstly, bagged grapes in the video were detected with the YOLOv5s object detector, and the NMS operation was postponed to the tracking stage in order to retain fruit detection boxes that would otherwise be filtered out under occlusion. Specifically, the detection boxes of bagged grapes were produced by YOLOv5s, while the prediction boxes were generated by the Kalman filter, and the intersection over union (IoU) between each detection box and its corresponding prediction box was calculated. If the IoU was below a given threshold, the detection box was filtered out, so that the detection boxes closer to the predicted positions were retained; the final detection boxes were then obtained through the NMS operation. Secondly, camera motion compensation and an improved Kalman filter algorithm were added to ByteTrack to automatically correct the positions of the fruit prediction boxes during tracking. Specifically, camera motion compensation first extracted the background key points, excluding the tracking targets, from the previous and current frames of the bagged-grape video. Sparse optical flow was used to match the extracted background key points between the two frames, the affine transformation matrix of the background motion was then computed with the RANSAC algorithm, and the Kalman filter was used to predict the prediction boxes of the bagged grapes in the current frame.
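The postponed-NMS step described above can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: the `iou_gate` and `nms_thresh` thresholds are placeholder values, and the function and argument names are assumptions.

```python
def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def postponed_nms(detections, predictions, iou_gate=0.3, nms_thresh=0.5):
    """NMS moved into the tracking stage: keep only detections that overlap
    some Kalman-predicted box by at least `iou_gate` (i.e. boxes closer to
    the predicted positions), then apply standard greedy NMS to survivors.
    `detections`: (x1, y1, x2, y2, score) tuples; `predictions`: boxes."""
    kept = [d for d in detections
            if any(iou(d[:4], p) >= iou_gate for p in predictions)]
    kept.sort(key=lambda d: d[4], reverse=True)  # highest score first
    final = []
    for d in kept:
        if all(iou(d[:4], f[:4]) < nms_thresh for f in final):
            final.append(d)
    return final
```

Filtering against the predicted positions before NMS is what lets an occluded fruit's box survive: it is kept because it agrees with the tracker's prediction, even when a higher-scoring overlapping box would suppress it in detector-side NMS.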
Finally, the affine transformation matrix was applied to convert the prediction boxes from the coordinate system of the previous frame to that of the current frame, thereby correcting the position information of the prediction boxes in the current frame. In addition, the improved Kalman filter algorithm replaced the aspect ratio in the state vector with the width, so that the position of a bagged-grape tracking box was expressed directly in terms of its width, allowing the position of the tracking boxes to be estimated more accurately. On this basis, a line-counting strategy was proposed to count the bagged grapes automatically. The counting line was set in the middle of the video frame, and the idle frames without bagged grapes in the first few seconds of the video were retained. Bagged grapes with the same ID were counted only once, to avoid repeated counts from multiple line collisions; a bagged grape was counted automatically when the center point of its tracking box collided with the counting line. The dataset was collected in the Paidengte Agricultural Science and Technology Demonstration Park, Bishan District, Chongqing, China. Redmi K40 and OPPO Reno6pro+ mobile-phone cameras were used to capture images of the same bagged grapes at 8:00, 12:00, and 18:00 from different angles, including frontal, side, and overhead shots. The total shooting time was about 6 h, the shooting height was about 1.5 m above the ground, and the shooting route proceeded row by row. In total, 500 images and six videos of bagged grapes were obtained. The 500 images were expanded to 2000 through saturation adjustment, brightness enhancement, brightness reduction, and mirroring; the 2000 images were then randomly divided into training and validation sets at a ratio of 8:2, and the six videos served as the test set.
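Applying the estimated affine matrix to the prediction boxes might look like the sketch below. The 2×3 matrix itself would come from RANSAC fitting of the matched background key points (e.g. OpenCV's `estimateAffinePartial2D`); `warp_boxes` is an illustrative name, and the sketch assumes the affine map preserves corner ordering (no large rotation).

```python
import numpy as np

def warp_boxes(boxes, affine):
    """Transform (x1, y1, x2, y2) prediction boxes from the previous frame's
    coordinate system into the current frame's, using a 2x3 affine matrix
    estimated from background motion (camera motion compensation)."""
    boxes = np.asarray(boxes, dtype=float)
    # corners: (N, 2, 2) -> top-left and bottom-right of each box
    corners = np.stack([boxes[:, [0, 1]], boxes[:, [2, 3]]], axis=1)
    ones = np.ones((*corners.shape[:2], 1))
    # homogeneous coordinates (N, 2, 3) times affine^T (3, 2) -> (N, 2, 2)
    warped = np.concatenate([corners, ones], axis=-1) @ affine.T
    return np.concatenate([warped[:, 0], warped[:, 1]], axis=1)
```

Correcting the boxes this way keeps the Kalman filter's predictions aligned with the fruit even when the hand-held camera itself moves between frames.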
Experiments were conducted on this dataset. The results showed that the proposed self-correcting NMS-ByteTrack significantly improved the tracking performance for bagged grapes. The multi-object tracking accuracy, multi-object tracking precision, and identification F1-score reached 64.6%, 82.4%, and 80.8%, respectively, which were 1.7, 1.0, and 4.1 percentage points higher than those of ByteTrack, and the number of ID switches was reduced by 3. In terms of counting performance, the average counting accuracy reached 82.8% relative to manual counting. In addition, the proposed method was compared with five other tracking methods and achieved better tracking performance than all of them, verifying its applicability to the tracking and counting of bagged grapes. Therefore, the self-correcting NMS-ByteTrack can be expected to effectively support the tracking and counting of bagged grapes in real scenarios and to provide more accurate yield estimation of bagged grapes in the field.
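The line-counting strategy described above can be sketched as a small bookkeeping class. The class and parameter names below are illustrative assumptions, not identifiers from the paper.

```python
class LineCounter:
    """Count tracked fruits whose tracking-box center crosses a horizontal
    counting line placed mid-frame; each track ID is counted at most once,
    so repeated collisions with the line do not inflate the count."""

    def __init__(self, line_y):
        self.line_y = line_y
        self.last_y = {}      # track ID -> center y in the previous frame
        self.counted = set()  # IDs that have already collided with the line
        self.count = 0

    def update(self, tracks):
        """`tracks`: iterable of (track_id, x1, y1, x2, y2) for one frame."""
        for tid, x1, y1, x2, y2 in tracks:
            cy = (y1 + y2) / 2.0
            prev = self.last_y.get(tid)
            # collision: the center moved from one side of the line to the other
            if (prev is not None and tid not in self.counted
                    and (prev - self.line_y) * (cy - self.line_y) <= 0):
                self.counted.add(tid)
                self.count += 1
            self.last_y[tid] = cy
        return self.count
```

Counting per ID rather than per collision is what makes the idle frames at the start of the video harmless: a grape that oscillates across the line, or re-enters the frame with the same ID, still contributes exactly one count.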