    Xue Yueju, Li Shimei, Zheng Chan, Gan Haiming, Li Chengpeng, Liu Hongshan. Posture change recognition of lactating sow by using 2D-3D convolution feature fusion[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(9): 230-237. DOI: 10.11975/j.issn.1002-6819.2021.09.026


    Posture change recognition of lactating sow by using 2D-3D convolution feature fusion

    • Abstract: Sow posture changes affect the survival rate of piglets, and the movements differ in range and duration, which makes accurate recognition difficult. This study proposed a convolutional network fusing 2D and 3D convolution features (2D+3D-CNet, 2D+3D Convolutional Network) to recognize sow posture changes in depth images. Taking video clips as input, an SE attention module and 3D dilated convolution were introduced to enhance the spatiotemporal feature extraction of the 3D convolutional network for posture changes, while 2D convolution was used to extract the spatial features of the sow. After feature fusion, the action recognition branch output the probability of a posture change and the posture classification branch output the probabilities of four postures; by combining these two outputs, eight types of posture change were recognized, which reduced the workload of manually annotating the dataset. Finally, an action score was designed to refine the temporal localization of sow posture changes. On the test set, the posture change recognition accuracy of 2D+3D-CNet was 97.95%, the recall rate was 91.67%, and the test speed was 14.39 frames/s; its accuracy, recall, and temporal localization accuracy were all higher than those of YOWO, FRCNN-HMM, and MOC-D. The results achieve high-accuracy recognition of sow posture changes.
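The SE attention module and 3D dilated convolution mentioned above can be illustrated with a minimal PyTorch sketch. The channel width, reduction ratio, cardinality, and dilation rate below are illustrative assumptions, not the exact configuration of the paper's 3D SE-ResNeXt-50.

```python
# Sketch of an SE module for 3D feature maps and a ResNeXt-style 3D bottleneck
# with a dilated, grouped convolution (assumed sizes, for illustration only).
import torch
import torch.nn as nn


class SE3D(nn.Module):
    """Squeeze-and-excitation over (T, H, W) for a tensor of shape N x C x T x H x W."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c = x.shape[:2]
        w = x.mean(dim=(2, 3, 4))           # squeeze: global average over T, H, W
        w = self.fc(w).view(n, c, 1, 1, 1)  # excitation: per-channel weights
        return x * w                        # recalibrate channel responses


class SEResNeXtBlock3D(nn.Module):
    """ResNeXt-style 3D bottleneck with a dilated grouped conv and an SE module."""
    def __init__(self, channels: int, cardinality: int = 32, dilation: int = 2):
        super().__init__()
        mid = channels // 2
        self.body = nn.Sequential(
            nn.Conv3d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm3d(mid), nn.ReLU(inplace=True),
            # dilated 3x3x3 grouped convolution enlarges the spatiotemporal
            # receptive field without adding parameters
            nn.Conv3d(mid, mid, kernel_size=3, padding=dilation,
                      dilation=dilation, groups=cardinality, bias=False),
            nn.BatchNorm3d(mid), nn.ReLU(inplace=True),
            nn.Conv3d(mid, channels, kernel_size=1, bias=False),
            nn.BatchNorm3d(channels),
        )
        self.se = SE3D(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.se(self.body(x)))  # residual connection


if __name__ == "__main__":
    clip_features = torch.randn(1, 256, 16, 28, 28)  # N x C x T x H x W
    print(SEResNeXtBlock3D(256)(clip_features).shape)
```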

       

      Abstract: Posture change of a lactating sow directly affects the preweaning survival rate of piglets. Automated recognition of sow posture changes can enable early warning to improve the survival rate of piglets. The frequency, type, and duration of sow posture changes can also be used to select sows with high maternal quality as breeding stock. However, it is difficult to accurately recognize posture change actions, owing to the variety of posture changes and the differences in the range and duration of the movements. In this study, a convolutional network with 2D-3D convolution feature fusion (2D+3D-CNet, 2D+3D Convolutional Network) was proposed to recognize sow posture change actions in depth images. Experimental data were collected from a commercial pig farm in Foshan City, Guangdong Province of South China. A Kinect 2.0 camera was fixed directly above the pen to record the daily activities of sows from a top view at 5 frames per second. RGB-D video was collected with a depth image resolution of 512×424 pixels. Median filtering and histogram equalization were used to preprocess the dataset. The video clips were then fed into 2D+3D-CNet for training and testing. 2D+3D-CNet consisted of spatiotemporal and spatial feature extraction, feature fusion, action recognition, and posture classification, fully integrating video-level action recognition with frame-level posture classification. Firstly, 16-frame video clips were fed into the network, and 3D ResNeXt-50 and Darknet-53 were used to extract the spatiotemporal and spatial features of sow movement. An SE module was added to the residual structure of 3D ResNeXt-50, named 3D SE-ResNeXt-50, to boost the representation power of the network. The sow bounding box and the probability of a posture change were generated by the action recognition branch after feature fusion. The sow bounding box was then mapped onto the feature maps of the 13th convolutional layer of Darknet-53 to obtain the sow regional feature maps. Next, the sow regional feature maps were fed into the posture classification branch to obtain the probabilities of the four postures. Considering the spatiotemporal motion and the inter-frame posture variation during a posture change, an action score was designed to indicate the possibility of a posture change, and a threshold was set to determine the start and end time of a posture change action. Once the start and end times were determined, the specific posture change was classified by combining the postures of the sow one second before the start time and one second after the end time. In this way, a specific posture change action can be recognized directly without collecting and annotating a large number of additional samples. The 2D+3D-CNet model was trained using the PyTorch deep learning framework on an NVIDIA RTX 2080Ti GPU (graphics processing unit), and the algorithm was developed on the Ubuntu 16.04 platform. The performance was evaluated on the test set. The classification accuracies of lateral lying, standing, sitting, and ventral lying were 100%, 98.69%, 98.24%, and 98.19%, respectively. The total recognition accuracy of sow posture change actions was 97.95%, the total recall rate was 91.67%, and the inference speed was 14.39 frames/s. Compared with YOWO and MOC-D, the accuracy increased by 5.06 and 5.53 percentage points, and the recall rate increased by 3.65 and 5.90 percentage points, respectively. Although the model size of 2D+3D-CNet was larger than that of FRCNN-HMM, it had advantages in accuracy, recall, and test speed. The presented method removes the need for hand-crafted features and achieves real-time inference with more accurate action localization.
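As a reading aid for the temporal localization step described above, the following is a minimal Python sketch of the thresholding-and-pairing idea: an action score is thresholded to locate the start and end of a change, and the change is named by pairing the posture one second before the start with the posture one second after the end. The 0.5 threshold and the helper names (locate_changes, classify_change) are assumptions for illustration, not the paper's implementation; only the 5 fps frame rate comes from the abstract.

```python
# Sketch of action-score thresholding and posture-change labeling (illustrative).
from typing import List, Tuple

FPS = 5  # the abstract reports recording at 5 frames per second
POSTURES = ("lateral lying", "ventral lying", "sitting", "standing")


def locate_changes(action_scores: List[float], threshold: float = 0.5
                   ) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans where the score stays above the threshold."""
    spans, start = [], None
    for i, score in enumerate(action_scores):
        if score >= threshold and start is None:
            start = i                      # a posture change begins
        elif score < threshold and start is not None:
            spans.append((start, i - 1))   # the posture change ends
            start = None
    if start is not None:
        spans.append((start, len(action_scores) - 1))
    return spans


def classify_change(frame_postures: List[str], start: int, end: int) -> str:
    """Name the change from the posture 1 s before the start and 1 s after the end."""
    before = frame_postures[max(start - FPS, 0)]
    after = frame_postures[min(end + FPS, len(frame_postures) - 1)]
    return f"{before} -> {after}"


if __name__ == "__main__":
    scores = [0.1] * 10 + [0.9] * 8 + [0.1] * 10
    postures = ["standing"] * 12 + ["sitting"] * 4 + ["lateral lying"] * 12
    for s, e in locate_changes(scores):
        print(s, e, classify_change(postures, s, e))  # e.g. standing -> lateral lying
```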

       
