Multiple object tracking of group-housed pigs based on JDE model

    • Abstract: To achieve accurate detection and real-time tracking of individual pigs in group-housed settings under different scenes (day and night, sparse and dense), this study proposes a Joint Detection and Embedding (JDE) model. A feature extraction module first extracts multi-scale image features from the input video sequence and produces three prediction heads; through multi-task collaborative learning, each head outputs three branches: classification information, bounding-box regression information, and appearance information. The three kinds of information are processed in the data association module: the classification and bounding-box regression information yield the positions of the detection boxes, which are combined with the appearance information and passed through a data association algorithm based on the Kalman filter and the Hungarian algorithm to output the tracked video sequence. Experimental results show that, on the public and self-built datasets combined, the proposed JDE model achieves an overall mean Average Precision (mAP) of 92.9%, a Multiple Object Tracking Accuracy (MOTA) of 83.9%, an IDF1 score of 79.6%, and a frame rate of 73.9 frames per second (FPS). On the public dataset, compared with the Separate Detection and Embedding (SDE) model, in which the detection and tracking modules are separated, the proposed JDE model improves MOTA by 0.5 percentage points and FPS by 340%, overcoming the insufficient real-time performance of multi-object tracking with the SDE model. Compared with the TransTrack model, the MOTA and IDF1 of the proposed JDE model increase by 10.4 and 6.6 percentage points, respectively, and FPS increases by 324%. The model achieves real-time multi-object tracking of group-housed pigs in farming environments and can provide technical support for the precision management of large-scale pig farming.
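    To make the three-branch prediction head concrete, the following is a minimal PyTorch sketch of a head that outputs classification, box regression, and appearance embedding maps for one feature-pyramid scale. The class name JDEPredictionHead, the channel sizes, and the anchor/embedding dimensions are illustrative assumptions, not the exact architecture used in the paper.

        import torch
        import torch.nn as nn

        class JDEPredictionHead(nn.Module):
            """One prediction head over a single FPN scale, with three branches
            (classification, box regression, appearance embedding). Layer sizes
            are illustrative assumptions."""

            def __init__(self, in_channels=256, num_anchors=4, embed_dim=512):
                super().__init__()
                # Classification branch: foreground/background score per anchor.
                self.cls_branch = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=3, padding=1)
                # Box regression branch: 4 offsets (x, y, w, h) per anchor.
                self.box_branch = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=3, padding=1)
                # Appearance branch: one embedding vector per spatial location.
                self.emb_branch = nn.Conv2d(in_channels, embed_dim, kernel_size=3, padding=1)

            def forward(self, feature_map):
                cls_out = self.cls_branch(feature_map)   # (B, A*2, H, W)
                box_out = self.box_branch(feature_map)   # (B, A*4, H, W)
                emb_out = self.emb_branch(feature_map)   # (B, E, H, W)
                return cls_out, box_out, emb_out

        if __name__ == "__main__":
            head = JDEPredictionHead(in_channels=256)
            x = torch.randn(1, 256, 34, 60)              # one FPN scale of an assumed input size
            cls_out, box_out, emb_out = head(x)
            print(cls_out.shape, box_out.shape, emb_out.shape)

    In a full JDE-style model, one such head would be attached to each of the three feature-pyramid scales, and the three per-task losses (classification, box regression, embedding) would be combined, for example as a weighted sum, for multi-task training.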

       

      Abstract: Pig production has long been a pillar of the livestock industry in China, and the pig industry is closely related to food safety, social stability, and the coordinated development of the national economy. Intelligent video surveillance can greatly contribute to large-scale livestock production under the current labor shortage, so it is necessary to accurately track individual pigs and identify the abnormal behavior of group-housed pigs in breeding scenes. Much effort has been devoted to Multiple Object Tracking (MOT) for pig detection and tracking, most of it following the Tracking-By-Detection (TBD) paradigm, e.g., the Separate Detection and Embedding (SDE) model, which consists of two parts: a detector is first developed to detect pig objects, and a tracking model (such as SORT or DeepSORT) then associates the detections using a Kalman filter and the Hungarian algorithm. The separate detection and association steps increase the running and training time of this dominant MOT strategy, so real-time tracking cannot fully meet the requirements of group-housed pig monitoring. In this study, a Joint Detection and Embedding (JDE) model was proposed to automatically detect pig objects and track each individual in complex scenes (day or night, sparse or dense). The core of the JDE model is to integrate the detector and the embedding model into a single network for a real-time MOT system. Specifically, the JDE model incorporates the appearance model into a single-shot detector, so that detections and the corresponding appearance embeddings are output simultaneously, improving the runtime and operational efficiency of the model. An overall multi-task learning loss was used in the JDE model, comprising three terms: classification, box regression, and appearance losses. This design brings three merits. First, the multi-task learning loss allows object detection and appearance embedding to be learned in a shared model, reducing the amount of occupied memory. Second, the outputs of all tasks are computed in a single forward pass, reducing the overall inference time and improving the efficiency of the MOT system. Third, the prediction heads share the same set of low-level features and the feature pyramid network architecture, promoting the performance of each head. Finally, a data association module processes the outputs of the detection and appearance heads of the JDE model to produce the position prediction and ID tracking of multiple objects. The JDE model was validated on a purpose-built dataset covering a variety of settings: 21 video segments and 4 300 images in total, annotated with the DarkLabel video annotation software, consisting of a public subset (11 video sequences, 3 300 images) and a private subset (10 video segments, 1 000 images). The experimental results show that the mean Average Precision (mAP), Multiple Object Tracking Accuracy (MOTA), IDF1 score, and Frames Per Second (FPS) of the JDE model on all test videos were 92.9%, 83.9%, 79.6%, and 73.9 frames/s, respectively. A comparison was also made with the SDE model and the TransTrack method on the public dataset. Compared with the SDE model on the same test dataset, the JDE model improved FPS by 340% and MOTA by 0.5 percentage points, indicating sufficient real-time performance of MOT with the JDE model. Compared with the TransTrack model, the MOTA, IDF1, and FPS of the JDE model improved by 10.4 percentage points, 6.6 percentage points, and 324%, respectively. Visual tracking results demonstrated that, compared with the SDE and TransTrack models, the JDE model achieved the best detection and tracking ability under the four scenarios of dense day, sparse day, dense night, and sparse night. These findings can provide effective and accurate detection for the rapid tracking of group-housed pigs in complex farming scenes.
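      As a rough illustration of the data association step described above, the sketch below matches new detections to existing tracks with the Hungarian algorithm over an appearance (cosine-distance) cost. The function names, the max_cost threshold, and the omission of the Kalman-filter motion gate are simplifying assumptions, not the paper's exact procedure.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def cosine_cost(track_embs, det_embs):
            """Pairwise cosine distance between track and detection embeddings."""
            t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
            d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
            return 1.0 - t @ d.T                      # shape (num_tracks, num_detections)

        def associate(track_embs, det_embs, max_cost=0.6):
            """Match detections to tracks with the Hungarian algorithm.

            Returns (matches, unmatched_track_ids, unmatched_detection_ids).
            In a full tracker, each track's box would first be predicted forward
            by a Kalman filter and used to gate or supplement this appearance
            cost; only the appearance term is sketched here.
            """
            if len(track_embs) == 0 or len(det_embs) == 0:
                return [], list(range(len(track_embs))), list(range(len(det_embs)))
            cost = cosine_cost(np.asarray(track_embs, dtype=float),
                               np.asarray(det_embs, dtype=float))
            rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
            matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
            matched_t = {r for r, _ in matches}
            matched_d = {c for _, c in matches}
            unmatched_t = [i for i in range(cost.shape[0]) if i not in matched_t]
            unmatched_d = [j for j in range(cost.shape[1]) if j not in matched_d]
            return matches, unmatched_t, unmatched_d

      In the usual SORT/DeepSORT-style bookkeeping, unmatched detections would start new tracks and unmatched tracks would be kept alive for a few frames before being removed.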

       
