A pig behavior recognition method based on posture and temporal features

    Recognizing pig behavior based on posture and temporal features using computer vision

    • Abstract: Monitoring pig behavior is an important part of pig farming management. This study proposes a pig behavior recognition method based on posture and temporal features. First, images of pigs in pens were collected and annotated to build three data sets: a pig object detection data set, a pig keypoint data set, and a pig behavior recognition data set. Using these data sets, a pig detection model based on YOLOv5s, a pig pose estimation model based on the lightweight OpenPose algorithm, and a pig behavior recognition model based on the ST-GCN algorithm were trained, and a pig behavior recognition system was built. In testing, the trained YOLOv5s pig detection model reached a mAP (mean Average Precision) of up to 0.995, the pose estimation model achieved average precision and average recall above 93%, and the ST-GCN-based pig behavior recognition model reached an average accuracy of 86.67%. In the behavior recognition system, single-frame inference with the pig detection model and the pig pose estimation model via LibTorch took about 14 ms and 65 ms respectively, and behavior recognition inference for a single pig took about 8 ms; one behavior recognition inference was run for every 200 consecutive pose frames, so the behavior recognition result was updated on average every 17 s. The results demonstrate the feasibility of the proposed pig behavior recognition method based on posture and temporal features, and provide a reference for pig behavior recognition in group-housing scenarios.

       

      Abstract: Monitoring pig behavior is one of the most important steps in pig breeding management. In recent years, computer vision technology has been used to detect the activity of pigs quickly and conveniently for welfare purposes. In this study, a pig behavior monitoring system was proposed for automatic recognition using computer vision. The five steps were as follows. 1) Monitoring video was collected in a pig house using a network camera; 2) The bounding boxes and poses of the pigs were extracted by the YOLOv5s and lightweight OpenPose algorithms, respectively; 3) A Simple Online and Realtime Tracking (SORT) approach was used to match the same pig target across multiple video frames; 4) A spatiotemporal map of the pig skeleton was generated from 200 frames of continuous poses; and 5) A Spatial Temporal Graph Convolutional Network (ST-GCN) model was used to process the spatiotemporal skeleton map and predict the behavior category. Video data was first collected on a herd of pigs. The LabelImg software was used to annotate the object detection data set, while the Labelme tool was used to annotate the keypoint data set. More than 700 pig behavior video clips were extracted for the data sets. A pose extraction script automatically extracted pig pose information from the behavior video clips, generating a behavior data set in kinetics-skeleton format. The object detection data set was used to train the pig object detection model with the YOLOv5s algorithm; the keypoint data set was used to train the pig pose estimation model with the Lightweight OpenPose algorithm; and the kinetics-skeleton behavior data set was used to train the pig behavior recognition model with the ST-GCN algorithm. All models were trained using the PyTorch deep learning framework on an NVIDIA Tesla T4 GPU.
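The windowing step above (collecting 200 consecutive pose frames per tracked pig and handing them to ST-GCN) can be sketched as follows. This is a minimal illustration, not the authors' code: the keypoint count of 16 is an assumption (the abstract does not state it), and the (N, C, T, V, M) tensor layout follows the kinetics-skeleton convention used by the reference ST-GCN implementation.

```python
import numpy as np
from collections import deque

# Assumed dimensions: T = 200-frame window (from the paper),
# V = 16 keypoints (hypothetical), C = 3 channels (x, y, confidence),
# M = 1 pig per skeleton sequence.
T, V, C = 200, 16, 3

class PoseBuffer:
    """Collects per-frame keypoints for one tracked pig (e.g. one SORT id)."""
    def __init__(self, maxlen=T):
        self.frames = deque(maxlen=maxlen)

    def push(self, keypoints):
        """keypoints: (V, 3) array of (x, y, score) for one frame."""
        self.frames.append(np.asarray(keypoints, dtype=np.float32))

    def ready(self):
        return len(self.frames) == self.frames.maxlen

    def to_stgcn_input(self):
        """Return a (1, C, T, V, M) tensor in kinetics-skeleton layout."""
        data = np.stack(self.frames)              # (T, V, C)
        data = data.transpose(2, 0, 1)            # (C, T, V)
        return data[np.newaxis, ..., np.newaxis]  # (1, C, T, V, M=1)

# Fill the buffer with dummy poses to show the resulting shape.
buf = PoseBuffer()
for _ in range(T):
    buf.push(np.random.rand(V, C))
x = buf.to_stgcn_input()
print(x.shape)  # (1, 3, 200, 16, 1)
```

Once `buf.ready()` is true, `x` would be passed to the trained ST-GCN model for one behavior classification, after which the window starts refilling.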
Several indicators were used to evaluate the trained models. The precision of the object detection model was more than 97%, while the average precision and recall of the pig pose estimation model were both more than 93%. The recognition accuracies of the pig behavior recognition model for standing, walking, and lying behavior were 93.3%, 73.3%, and 93.3%, respectively, for an average accuracy of 86.67%. The results show that the ST-GCN approach is feasible for pig behavior recognition. A graphical interface for the pig behavior recognition system was then developed with the QT framework, and the LibTorch library was used for model inference within the system. Running on an NVIDIA GTX 1080Ti GPU, YOLOv5s model inference took 14 ms per frame, SORT tracking took 0.05 ms, and pose estimation model inference took 65 ms; the average processing frame rate was about 12 FPS. Every 200 frames of continuous pig posture were extracted by the program, and one behavior recognition inference was then carried out. Behavior recognition inference for a single pig took about 8 ms, and the behavior recognition result was updated on average every 17 s. These findings can provide a new idea for the behavior recognition of pigs.
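The reported timings are internally consistent, which a quick budget check makes visible: the per-frame pipeline cost (detection + tracking + pose estimation) determines the frame rate, and the 200-frame window then determines the update cadence. The small gap to the reported ~17 s plausibly reflects the 8 ms behavior inference and other per-window overhead not counted here.

```python
# Sanity check of the timing figures quoted in the abstract
# (a back-of-the-envelope sketch, not a benchmark).
det_ms, sort_ms, pose_ms = 14.0, 0.05, 65.0   # per-frame costs from the paper
frame_ms = det_ms + sort_ms + pose_ms          # total per-frame pipeline cost
fps = 1000.0 / frame_ms                        # implied frame rate
window = 200                                   # frames per ST-GCN inference
update_s = window * frame_ms / 1000.0          # implied update interval
print(round(fps, 1), round(update_s, 1))       # 12.7 15.8
```

The implied ~12.7 FPS and ~15.8 s per update agree closely with the reported ~12 FPS and ~17 s average update interval.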

       
