Abstract:
Agricultural production efficiency has risen steadily in recent years, particularly with the continuous development of intelligent breeding technology, and the production efficiency and welfare of animals have improved significantly as well. Accurate identification and management of important livestock, such as pigs, is therefore crucial. However, traditional individual identification by ear tags, ear notches, and color markings is labor-intensive and time-consuming, and can easily cause injuries and infections in pigs. In contrast, non-invasive individual identification methods can obtain pig information more conveniently, quickly, and accurately, thereby improving breeding efficiency and pig welfare. Among these methods, facial alignment is one of the most essential steps in pig face recognition, and its prerequisite is the accurate localization of facial key points. However, the pig's movement and varying facial poses often make key point extraction inaccurate, so an accurate and efficient key point detector for pig faces is in high demand. In this study, a precise detection model for pig facial key points (YOLO-MOB-DFC) was proposed by adapting YOLOv5Face, a human face key point detection model, to pig faces. Firstly, the re-parameterized MobileOne was used as the backbone network to substantially reduce the number of model parameters. Then, the decoupled fully connected (DFC) attention module was integrated to capture dependencies among pixels at distant spatial positions, so that the model focuses more on the pig's facial region and achieves higher detection performance. Finally, the lightweight upsampling operator CARAFE was employed to fully perceive and aggregate contextual information within the neighborhood. As a result, pig facial key points were extracted more accurately.
A pig face dataset was constructed from video data of 100 sows and 220 images with complex backgrounds featuring multiple pigs. The structural similarity (SSIM) algorithm was used to filter out highly similar images in order to reduce the risk of overfitting. Labelme was used to annotate the pig's face, the eyes, the two sides of the nose, and the nose tip. Six offline data augmentation operations were applied to enhance the model's generalization capability. The improved model was then tested on this custom-built pig face dataset. The results showed that the average accuracy of pig face detection reached 99.0%, the detection speed was 153 FPS, and the normalized mean error (NME) of the key points was 2.344%. Compared with the RetinaFace model, the average accuracy increased by 5.43%, the number of model parameters was reduced by 78.59%, the frame rate increased by 91.25%, and the NME was reduced by 2.774%. Compared with the YOLOv5s-Face model, the average accuracy was improved by 2.48%, the number of model parameters was reduced by 18.29%, and the NME was reduced by 0.567%. In summary, the YOLO-MOB-DFC model had fewer parameters, its NME fluctuated less between consecutive frames, and varying pig face poses had less impact on the accuracy of its key point detection. The improved model can therefore provide higher detection accuracy and efficiency, so that pig face key point data can be obtained quickly and accurately. The findings can lay a foundation for constructing high-quality open-set pig face recognition datasets and for the non-invasive intelligent identification of individual pigs. Such identification is a likely trend in more intelligent and sustainable animal husbandry, as it can greatly improve the welfare and production efficiency of pigs while reducing human labor and time consumption.
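
For readers unfamiliar with the key point metric reported above, the following is a minimal sketch of how a normalized mean error can be computed for a single pig face. The abstract does not state which normalizing length was used, so the square root of the face box area is an assumption made here for illustration only; the function names and the coordinates are likewise hypothetical and are not taken from the study.

```python
import math


def normalized_mean_error(pred, truth, norm):
    """Mean Euclidean key point error divided by a normalizing length."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    if norm <= 0:
        raise ValueError("norm must be positive")
    total = sum(math.dist(p, t) for p, t in zip(pred, truth))
    return total / (len(pred) * norm)


def box_norm(box):
    """Assumed normalizer: square root of the face box area (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return math.sqrt((x2 - x1) * (y2 - y1))


# Hypothetical annotation: two eyes, two sides of the nose, and the nose tip.
truth = [(30.0, 40.0), (70.0, 40.0), (40.0, 80.0), (60.0, 80.0), (50.0, 90.0)]
# Hypothetical prediction: only the first key point is off, by 5 pixels.
pred = [(33.0, 44.0), (70.0, 40.0), (40.0, 80.0), (60.0, 80.0), (50.0, 90.0)]
box = (0.0, 0.0, 100.0, 100.0)  # hypothetical 100 x 100 pixel face box

nme = normalized_mean_error(pred, truth, box_norm(box))
print(f"NME = {nme:.2%}")  # prints "NME = 1.00%"
```

Averaging this per-face value over all test images gives a dataset-level figure. With a different normalizer, such as the distance between the eyes, the same predictions would yield a different percentage, so NME values are only comparable when the normalizer is the same.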