Abstract: The survival rate of piglets strongly influences the productivity and breeding efficiency of pig farms, and an appropriate ambient temperature is a key factor in piglet survival. Pigs adopt different lying postures under different climatic conditions. Specifically, they lie laterally with the limbs extended at high temperatures, whereas they normally adopt a sternal (ventral) lying posture at low temperatures. A piglet that keeps lying down or sitting for a long time may be in an abnormal state, such as illness. Rapid and accurate recognition of piglet postures is therefore in high demand. However, recognition remains challenging in images with severe occlusion and adhesion, because the piglets are small, show low contrast against the pigsty background, and tend to gather together. In this study, a new Transformer + Anchor-Free (TransFree) model was proposed for piglet detection and posture recognition. A Swin Transformer was used as the backbone to extract the local and global features of piglet images. The feature enhancement stage consisted of a feature pyramid and an upsampling enhancement module. Multi-scale feature fusion was then performed to obtain a high-resolution feature map. Finally, the fused feature maps were fed into the Anchor-Free detection head for piglet localization and posture classification. The data were collected at a commercial pig farm in Foshan City, Guangdong Province, China, on four occasions between May 2016 and September 2018, and each pig pen was recorded for 0.5 to 12 h. Each pen measured about 3.8 m × 2.0 m, and the piglets were 6 to 30 days old. The camera was mounted above the pig pen to capture video vertically downward, at a height of 1.8 to 2.2 m, so that as much of the pen as possible was covered. 
Videos of 12 pens were used to build the data set. Nine pens (1 877 video images) were selected as the training set, and the remaining three pens (460 video images) were used as the test set. One frame was extracted from the video every 15 s. The piglet targets and posture categories were then annotated with a labeling tool. The final training set contained 6 935 prone, 7 134 lateral, and 5 860 standing postures. The test set contained 1 763 prone, 1 653 lateral, and 1 734 standing postures. Random data augmentation was then applied to the training set, including vertical and horizontal flipping, Gaussian blur, motion blur, and brightness adjustment. The experiments were carried out on Ubuntu 18.04 with an Intel Core i7-10700 CPU and an NVIDIA GeForce RTX 3090 GPU (graphics processing unit) with 24 GB of memory. On the test set, the TransFree model achieved an accuracy of 95.68%, a recall of 91.18%, and an F1-score of 93.38% in piglet posture recognition. To verify its performance, the TransFree model was compared with an anchor-based detector (Faster R-CNN), an anchor-free detector (CenterNet), and the latest anchor-free detector (YOLOX-L, the large variant of YOLOX). Compared with Faster R-CNN, the detection accuracy and F1-score of the TransFree model were higher by 6.75 and 4.07 percentage points, respectively. Compared with CenterNet, they were higher by 4.07 and 2.32 percentage points, respectively. Compared with YOLOX-L, they were higher by 7.25 and 2.26 percentage points, respectively. In terms of mAP@50, TransFree also performed best, at 1.8, 0.71, and 0.64 percentage points higher than Faster R-CNN, CenterNet, and YOLOX-L, respectively. 
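The reported F1-score can be checked against the reported accuracy and recall, and the baseline scores follow from the stated percentage-point gains. The sketch below assumes that the abstract's "accuracy" plays the role of precision in the F1 formula (an assumption, although it reproduces the reported 93.38%); the function and variable names are illustrative and are not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported TransFree values, read here as precision and recall (assumption).
precision, recall = 0.9568, 0.9118
f1_percent = round(100 * f1_score(precision, recall), 2)
print(f"F1 = {f1_percent:.2f}%")

# Reported TransFree scores and its gains over each baseline (percentage points).
transfree = {"accuracy": 95.68, "f1": 93.38}
gains = {
    "Faster R-CNN": {"accuracy": 6.75, "f1": 4.07},
    "CenterNet": {"accuracy": 4.07, "f1": 2.32},
    "YOLOX-L": {"accuracy": 7.25, "f1": 2.26},
}

# Baseline score implied by each gain: TransFree score minus the gain.
implied = {
    name: {metric: round(transfree[metric] - gain, 2) for metric, gain in g.items()}
    for name, g in gains.items()
}
for name, scores in implied.items():
    print(f"{name}: accuracy {scores['accuracy']:.2f}%, F1 {scores['f1']:.2f}%")
```

Under that reading, the implied baseline accuracies are 88.93% for Faster R-CNN, 91.61% for CenterNet, and 88.43% for YOLOX-L. These values are derived from the stated differences rather than quoted from the study.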
The inference speed of TransFree reached 43.48 frames/s, which was 7.44 and 1.97 frames/s slower than that of YOLOX-L and CenterNet, respectively, but 22.59 frames/s faster than that of Faster R-CNN. The TransFree model was 111 and 84.6 MB smaller than Faster R-CNN and YOLOX-L, respectively, and 53.8 MB larger than CenterNet. In summary, the TransFree model achieved the best balance of recognition accuracy, inference speed, and model size in piglet posture recognition. All-weather tracking of piglet posture was also explored. The findings can provide promising ideas for piglet behavior recognition and the subsequent assessment of piglet welfare.
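The speed comparison gives only differences, so the baseline speeds have to be recovered by arithmetic. The following minimal sketch does that and converts frames/s into per-frame latency; the names are illustrative, and the derived numbers are implied by the abstract rather than quoted from it.

```python
TRANSFREE_FPS = 43.48

# TransFree speed minus baseline speed, in frames/s (negative: TransFree is slower).
fps_gap = {"YOLOX-L": -7.44, "CenterNet": -1.97, "Faster R-CNN": 22.59}

# Baseline speed implied by each gap.
implied_fps = {name: round(TRANSFREE_FPS - gap, 2) for name, gap in fps_gap.items()}


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


for name, fps in implied_fps.items():
    print(f"{name}: {fps:.2f} frames/s, {latency_ms(fps):.1f} ms per frame")
print(f"TransFree: {TRANSFREE_FPS:.2f} frames/s, {latency_ms(TRANSFREE_FPS):.1f} ms per frame")
```

The implied speeds are 50.92 frames/s for YOLOX-L, 45.45 frames/s for CenterNet, and 20.89 frames/s for Faster R-CNN, while TransFree needs about 23 ms per frame.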