Lactating sow postures recognition from depth image of videos based on improved Faster R-CNN
Abstract
Maternal behaviors reflect the health and welfare of sows and directly affect the economic returns of a pig farm. Computer vision offers an effective, low-cost and non-contact way to monitor animal behavior in precision farming. In a piggery, however, 24-hour automatic recognition of lactating sow postures is challenging because of daily illumination variations, the influence of heat lamps, and adhesion between piglets and sows. This paper proposes an algorithm for automatic recognition of lactating sow postures from depth video images, based on an improved Faster R-CNN (convolutional neural network). To improve recognition accuracy while meeting the real-time requirement, we designed the ZF-D2R network (ZF with deeper layers and 2 residual learning frameworks) by introducing residual learning frameworks into the ZF network. First, 3 convolutional layers were added to the ZF network to form ZF-D (ZF with deeper layers). Then, shortcut connections were added within ZF-D to form 2 residual learning frameworks, which together make up the ZF-D2R network. In addition, Center Loss was introduced into the Fast R-CNN detector to construct a joint classification loss function. Under the joint supervision of F-SoftmaxLoss and Center Loss in the Fast R-CNN detector, a robust model was trained to learn deep feature representations with 2 key objectives: intra-class compactness and inter-class dispersion, each as strong as possible. This joint supervision can therefore reduce recognition errors caused by similar features shared by different postures. The improved Faster R-CNN was built by taking ZF-D2R as the base network and adding Center Loss to the Fast R-CNN detector. A data set of real lactating sow postures was built from depth videos of sows recorded in 28 pens.
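The joint classification loss described above can be illustrated with a minimal sketch. It assumes the commonly used form of Center Loss, half the squared Euclidean distance between a deep feature and the center of its class, added to the softmax (cross-entropy) loss with a weight. The weight `lam`, the function names and the batch averaging are illustrative assumptions; the abstract does not specify them, and the class centers, which are normally updated during training, are treated here as fixed inputs.

```python
import math

def softmax_loss(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))

def joint_loss(features, logits, labels, centers, lam=0.01):
    """Batch mean of softmax loss + lam * center loss.

    lam is a hypothetical weighting factor, not a value from the paper.
    centers[k] is the (assumed fixed) feature center of class k.
    """
    total = 0.0
    for x, z, y in zip(features, logits, labels):
        total += softmax_loss(z, y) + lam * center_loss(x, centers[y])
    return total / len(labels)

# Toy batch: 2-D features, 2 classes (real posture features are far larger).
features = [[1.0, 0.0], [0.0, 2.0]]
logits = [[2.0, 0.5], [0.1, 1.5]]
labels = [0, 1]
centers = [[0.8, 0.1], [0.2, 1.7]]
print(joint_loss(features, logits, labels, centers))
```

The softmax term pushes features of different classes apart, while the center term pulls each feature toward its own class center, which is the intra-class compactness and inter-class dispersion the abstract refers to.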
The data set comprised 2 451 standing images, 2 461 sitting images, 2 488 sternal recumbency images, 2 519 ventral recumbency images and 2 658 lateral recumbency images. Of these, 5 000 images were randomly chosen as the testing set, and the remaining images were used as the training set. To increase the diversity of the training data, data set augmentation by rotating and mirroring was applied. The improved Faster R-CNN was trained end to end with the approximate joint training method in the Caffe deep learning framework. Adding 2 residual learning frameworks to ZF-D improved the MAP (mean of average precision) of the ZF-D2R model by 1.28 percentage points. Introducing the Center Loss supervision signal increased the MAP by 1.3 percentage points, and the optimal model of the proposed method reached a MAP of 93.25%. The APs (average precisions) of the 5 posture classes, i.e. standing, sitting, sternal recumbency, ventral recumbency and lateral recumbency, were 96.73%, 94.62%, 86.28%, 89.57% and 99.04%, respectively. The MAP of our approach was 3.86 and 1.24 percentage points higher than that of Faster R-CNN with the ZF base network and Faster R-CNN with the deeper VGG16 base network, respectively. Our method processed images at 0.058 s per frame, 0.034 s per frame faster than Faster R-CNN based on VGG16. The proposed method can thus improve recognition accuracy while preserving real-time performance. Compared with a DPM (deformable part model) detector followed by a CNN posture classifier, the end-to-end recognition method proposed in this paper increased the MAP by 37.87 percentage points and reduced the processing time by 0.855 s per frame. Our method can be used for 24-hour recognition of sow behaviors and lays a foundation for video-based analysis of dynamic sow behavior.
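As a quick consistency check on the figures reported above, the following sketch recomputes the data set size, the training-set size before augmentation, the MAP as the unweighted mean of the 5 class APs, and the frame rate implied by the per-frame time. All numbers are copied from the abstract; the variable names are illustrative, and the unweighted mean is assumed from the definition of MAP given in the text.

```python
# Image counts per posture class, as reported in the abstract.
image_counts = {
    "standing": 2451,
    "sitting": 2461,
    "sternal recumbency": 2488,
    "ventral recumbency": 2519,
    "lateral recumbency": 2658,
}

# Average precision (%) per posture class, as reported in the abstract.
average_precisions = {
    "standing": 96.73,
    "sitting": 94.62,
    "sternal recumbency": 86.28,
    "ventral recumbency": 89.57,
    "lateral recumbency": 99.04,
}

total_images = sum(image_counts.values())   # 12577
test_images = 5000
train_images = total_images - test_images   # 7577, before augmentation

# MAP as the unweighted mean of the class APs.
mean_ap = sum(average_precisions.values()) / len(average_precisions)

seconds_per_frame = 0.058
frames_per_second = 1.0 / seconds_per_frame

print(total_images, train_images)
print(round(mean_ap, 2))             # 93.25
print(round(frames_per_second, 1))   # 17.2
```

The mean of the 5 reported APs reproduces the stated MAP of 93.25%, and 0.058 s per frame corresponds to roughly 17 frames per second.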