Abstract:
Accurate and rapid estimation of spatial posture is crucial for monitoring the behavior of group-housed livestock. Compared with traditional 2D methods, 3D posture estimation provides precise spatial information even under occlusion. Current 3D posture estimation techniques are applied primarily to humans and autonomous driving, and therefore depend mainly on expensive measurement equipment and large datasets; applying them to animal behavior monitoring and management remains challenging. A low-cost and efficient approach to measuring animal posture is therefore urgently needed. In this study, a general approach was proposed to estimate the 3D posture of livestock using binocular stereo matching. First, a deep-learning stereo matching model was employed to obtain depth information. Then, a top-down 2D posture model was used to extract target bounding boxes and detect the key points within them. Finally, the key-point locations were mapped back to image space and fused with the depth predicted by the stereo matching model to recover the 3D posture. Because the accuracy of the 3D posture depends on precise depth information, the main challenges lay in matching thin structures and weakly textured regions. To address these challenges, a stereo matching model named ACLNet was constructed using an attention mechanism and ConvGRU-based iterative refinement. Relative depth layers of image textures were encoded to restrict the model's attention to regions near the true disparity, and high-precision depth was then recovered gradually in a residual manner. Ablation experiments and generalization tests were carried out on the Scene Flow and Middlebury datasets, respectively, to validate the effectiveness of the ACLNet model. The results show that ACLNet achieved an end-point error (EPE) of 0.45 on the Scene Flow dataset, a reduction of 0.38 compared with the baseline model without the attention and ConvGRU mechanisms, and it also generalized well to real-world datasets such as Middlebury. On the goat depth dataset, the EPE was 0.56, and the mean per-joint position error (MPJPE) of the improved model on the goat 3D posture test set reached 45.7 mm, a decrease of 21.1 mm compared with the baseline. The approach showed strong generalization and versatility, accurately estimating livestock 3D posture without additional training. 3D posture estimation experiments were also conducted with goats as test subjects, in which only binocular images were required to obtain accurate 3D posture. These experiments validated the feasibility of high-precision 3D posture estimation for livestock with a simple binocular system, providing a viable solution for accurate 3D posture estimation using low-cost stereo cameras.
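
The fusion step described above reduces to standard rectified stereo geometry: the disparity d at a pixel gives depth Z = f·B/d, and each 2D key point (u, v) is back-projected with the camera intrinsics. The sketch below illustrates this step only; the intrinsics, baseline, and disparity map are hypothetical placeholders, not the paper's calibration or implementation.

```python
import numpy as np

def keypoints_to_3d(keypoints_2d, disparity, fx, fy, cx, cy, baseline):
    """Back-project 2D key points into camera space using a disparity map.

    keypoints_2d : (J, 2) array of (u, v) pixel coordinates
    disparity    : (H, W) disparity map from the stereo matching model
    fx, fy, cx, cy : intrinsics of the rectified left camera (assumed values)
    baseline     : stereo baseline in metres
    Returns a (J, 3) array of (X, Y, Z) points in metres.
    """
    points_3d = np.zeros((len(keypoints_2d), 3))
    for j, (u, v) in enumerate(keypoints_2d):
        d = disparity[int(round(v)), int(round(u))]
        z = fx * baseline / max(d, 1e-6)   # depth from disparity: Z = f*B/d
        x = (u - cx) * z / fx              # back-project along x
        y = (v - cy) * z / fy              # back-project along y
        points_3d[j] = (x, y, z)
    return points_3d

# Hypothetical example: four key points, synthetic intrinsics and disparity.
disparity = np.full((480, 640), 32.0)
kpts = np.array([[320, 240], [300, 260], [340, 250], [310, 230]], float)
print(keypoints_to_3d(kpts, disparity, fx=700, fy=700,
                      cx=320, cy=240, baseline=0.12))
```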
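The abstract states that ACLNet recovers depth "in a residual manner" via ConvGRU iterative refinement, but it does not give the architecture. The following is only a minimal PyTorch sketch of that general idea, in the spirit of RAFT-style updaters: a ConvGRU cell carries a hidden state and, at each iteration, predicts a residual that is added to the current disparity estimate. All layer sizes and iteration counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Standard convolutional GRU cell (channel sizes are illustrative)."""
    def __init__(self, hidden=64, inp=64):
        super().__init__()
        self.convz = nn.Conv2d(hidden + inp, hidden, 3, padding=1)
        self.convr = nn.Conv2d(hidden + inp, hidden, 3, padding=1)
        self.convq = nn.Conv2d(hidden + inp, hidden, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))   # update gate
        r = torch.sigmoid(self.convr(hx))   # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q

class ResidualDisparityRefiner(nn.Module):
    """Iteratively refine disparity: disp <- disp + residual at each step."""
    def __init__(self, feat=64, hidden=64, iters=8):
        super().__init__()
        self.iters = iters
        self.hidden = hidden
        self.encode = nn.Conv2d(feat + 1, hidden, 3, padding=1)  # features + disparity
        self.gru = ConvGRUCell(hidden, hidden)
        self.head = nn.Conv2d(hidden, 1, 3, padding=1)           # residual prediction

    def forward(self, features, disp):
        b, _, h_, w_ = features.shape
        h = torch.zeros(b, self.hidden, h_, w_, device=features.device)
        for _ in range(self.iters):
            x = torch.relu(self.encode(torch.cat([features, disp], dim=1)))
            h = self.gru(h, x)
            disp = disp + self.head(h)       # residual update
        return disp

# Hypothetical usage with random tensors standing in for matching features.
feats = torch.randn(1, 64, 60, 80)
disp0 = torch.zeros(1, 1, 60, 80)
print(ResidualDisparityRefiner()(feats, disp0).shape)  # torch.Size([1, 1, 60, 80])
```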
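MPJPE, the metric quoted above, is the mean Euclidean distance between predicted and ground-truth joints. A short sketch for completeness, with made-up joint coordinates:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error, in the units of the inputs (e.g. mm).

    pred, gt : (J, 3) arrays of predicted and ground-truth 3D joints.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Hypothetical two-joint example in millimetres.
pred = np.array([[0.0, 0.0, 1000.0], [100.0, 0.0, 1020.0]])
gt   = np.array([[5.0, 0.0, 1010.0], [95.0, 5.0, 1000.0]])
print(mpjpe(pred, gt))  # average joint error in mm
```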