Xue Yueju, Li Shimei, Zheng Chan, Gan Haiming, Li Chengpeng, Liu Hongshan. Posture change recognition of lactating sow by using 2D-3D convolution feature fusion[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(9): 230-237. DOI: 10.11975/j.issn.1002-6819.2021.09.026

    Posture change recognition of lactating sow by using 2D-3D convolution feature fusion

Posture changes of lactating sows directly affect the preweaning survival rate of piglets. Automated recognition of sow posture changes makes early warning possible and can improve piglet survival, and the frequency, type, and duration of posture changes can be used to select sows with high maternal quality as breeding pigs. However, it is difficult to accurately recognize sow posture-change actions, owing to the variety of posture changes and the differences in the range and duration of movement. In this study, a convolutional network with 2D-3D convolution feature fusion (2D+3D-CNet, 2D+3D Convolutional Network) was proposed to recognize sow posture-change actions in depth images. Experimental data were collected from a commercial pig farm in Foshan City, Guangdong Province, South China. A Kinect 2.0 camera was fixed directly above the pen to record the daily activities of sows from a top view at 5 frames per second; RGB-D video was collected with a depth image resolution of 512×424 pixels. Median filtering and histogram equalization were used to preprocess the dataset, and the video clips were then fed into 2D+3D-CNet for training and testing. 2D+3D-CNet comprised spatiotemporal and spatial feature extraction, feature fusion, action recognition, and posture classification, fully integrating video-level action recognition with frame-level posture classification. First, 16-frame video clips were fed into the network, and 3D ResNeXt-50 and Darknet-53 were used to extract the spatiotemporal and spatial features of sow movement. An SE module was added to the residual structure of 3D ResNeXt-50, named 3D SE-ResNeXt-50, to boost the representation power of the network. The sow bounding box and the probability of posture change were generated by the action recognition branch after feature fusion. The sow bounding box was then mapped onto the feature map of the 13th convolutional layer of Darknet-53 to obtain the sow regional feature maps, which were fed into the posture classification branch to obtain the probabilities of the four postures. Considering the spatiotemporal motion and the inter-frame posture variation during a posture change, an action score was designed to indicate the possibility of posture change, and a threshold was set to determine the start and end times of a posture-change action. Once the start and end times were determined, the specific posture change was classified by combining the sow's posture one second before the start time with its posture one second after the end time. This method can directly recognize a specific posture-change action without collecting and annotating a large number of datasets. The 2D+3D-CNet model was trained with the PyTorch deep learning framework on an NVIDIA RTX 2080Ti GPU (graphics processing unit), and the algorithm was developed on the Ubuntu 16.04 platform. The performance of the algorithm was evaluated on the test set. The classification accuracies of lateral lying, standing, sitting, and ventral lying were 100%, 98.69%, 98.24%, and 98.19%, respectively. The total recognition accuracy of sow posture-change actions was 97.95%, the total recall rate was 91.67%, and the inference speed was 14.39 frames/s. Compared with YOWO and MOC-D, the accuracy increased by 5.06 and 5.53 percentage points, and the recall rate increased by 3.65 and 5.90 percentage points, respectively. Although the model size of 2D+3D-CNet was larger than that of FRCNN-HMM, it had advantages in accuracy, recall, and test speed. The presented method removes hand-crafted features and achieves real-time inference and more accurate action localization.
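A minimal PyTorch sketch of a squeeze-and-excitation (SE) module inserted into a 3D ResNeXt-style residual unit, as in the 3D SE-ResNeXt-50 described above. Channel sizes, the reduction ratio, cardinality, and class names are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class SE3D(nn.Module):
    """Squeeze-and-excitation for 3D (T x H x W) feature maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)      # squeeze: global spatiotemporal pooling
        self.fc = nn.Sequential(                 # excitation: channel-wise gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _, _ = x.shape
        w = self.pool(x).view(n, c)              # (N, C)
        w = self.fc(w).view(n, c, 1, 1, 1)       # per-channel weights
        return x * w                             # recalibrate the residual branch


class SEResNeXtBlock3D(nn.Module):
    """3D ResNeXt bottleneck (grouped 3x3x3 convolution) with an SE module."""

    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, cardinality: int = 32):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.BatchNorm3d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv3d(mid_ch, mid_ch, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm3d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv3d(mid_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm3d(out_ch),
            SE3D(out_ch),
        )
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.branch(x) + self.shortcut(x))


if __name__ == "__main__":
    clip = torch.randn(2, 64, 16, 56, 56)        # a batch of 16-frame clips: (N, C, T, H, W)
    block = SEResNeXtBlock3D(in_ch=64, mid_ch=128, out_ch=64)
    print(block(clip).shape)                     # torch.Size([2, 64, 16, 56, 56])
```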
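A schematic sketch of the action-score thresholding described above: a per-frame posture-change score is compared against a threshold to localize the start and end of a posture-change event, and the event is labelled from the frame-level postures one second before the start and one second after the end. The threshold value, the exact definition of the score, and the function names are assumptions; the paper defines the actual score and rule.

```python
from typing import List, Tuple

FPS = 5            # frame rate reported above (5 frames/s)
THRESHOLD = 0.5    # assumed score threshold, not specified in the abstract


def localize_changes(scores: List[float]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of segments where the action score stays above the threshold."""
    segments, start = [], None
    for i, s in enumerate(scores):
        if s >= THRESHOLD and start is None:
            start = i
        elif s < THRESHOLD and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))
    return segments


def classify_change(postures: List[str], start: int, end: int) -> str:
    """Label the change from the posture one second before the start and one second after the end."""
    before = postures[max(start - FPS, 0)]
    after = postures[min(end + FPS, len(postures) - 1)]
    return f"{before} -> {after}"


if __name__ == "__main__":
    # Toy per-frame action scores and frame-level posture labels.
    scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.2, 0.1]
    postures = ["lateral lying"] * 2 + ["unknown"] * 3 + ["standing"] * 2
    for s, e in localize_changes(scores):
        print((s, e), classify_change(postures, s, e))   # (2, 4) lateral lying -> standing
```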