Key point detection method for pig face fusing reparameterization and attention mechanisms

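    • Abstract: Face alignment is a crucial step in pig face recognition, and accurate detection of facial key points is a prerequisite for face alignment. Pigs move frequently and their facial poses vary widely, which makes key point extraction inaccurate, and no accurate and fast method for detecting pig face key points is currently available. To address these problems, this study proposes YOLO-MOB-DFC, a model for precise detection of pig facial key points, which adapts the human face key point detection model YOLOv5Face to pig faces. First, the re-parameterized MobileOne is used as the backbone network to reduce the number of model parameters. Then, a decoupled fully connected (DFC) attention module is integrated to capture dependencies between pixels at distant spatial positions, so that the model focuses more on the pig's facial region and detection performance improves. Finally, the lightweight upsampling operator CARAFE is adopted to make full use of the contextual information aggregated within a neighborhood, which makes key point extraction more accurate. The model was tested on a self-built pig face dataset. The results show that YOLO-MOB-DFC reaches an average precision of 99.0% for pig face detection, a detection speed of 153 frames/s, and a normalized mean error (NME) of 2.344% for key points. Compared with the RetinaFace model, the average precision increases by 5.43%, the number of model parameters decreases by 78.59%, the frame rate increases by 91.25%, and the NME decreases by 2.774%. Compared with the YOLOv5s-Face model, the average precision increases by 2.48%, the number of model parameters decreases by 18.29%, and the NME decreases by 0.567%. The proposed YOLO-MOB-DFC model has fewer parameters and a more stable NME across consecutive frames, which weakens the effect of varying pig face poses on key point detection accuracy. It also offers high detection accuracy and efficiency, meets the need for accurate and fast collection of pig face data, and lays a foundation for building high-quality open-set pig face recognition datasets and for non-invasive intelligent identification of individual pigs.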
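The normalized mean error reported above can be illustrated with a minimal sketch. It assumes the common definition, the mean Euclidean distance between predicted and ground-truth key points divided by a normalizing length. The abstract does not state which normalizer the authors used, so the face-box diagonal here is an assumption, as are the function name and the example coordinates.

```python
import math


def normalized_mean_error(pred, truth, norm_length):
    """Mean Euclidean key point error divided by a normalizing length.

    pred, truth: equal-length lists of (x, y) tuples.
    norm_length: assumed normalizer (here a face-box diagonal); the
    abstract does not specify which one was used.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equal length")
    if norm_length <= 0:
        raise ValueError("norm_length must be positive")
    total = sum(math.dist(p, t) for p, t in zip(pred, truth))
    return total / (len(pred) * norm_length)


# Hypothetical example: five key points inside a 100 x 100 pixel face box.
truth = [(30, 40), (70, 40), (40, 70), (60, 70), (50, 80)]
pred = [(33, 44), (70, 40), (40, 70), (60, 70), (50, 80)]
box_diagonal = math.hypot(100, 100)
nme = normalized_mean_error(pred, truth, box_diagonal)
print(f"NME = {100 * nme:.3f}%")
```

Only the first predicted point is off, by 5 pixels, so the value is small. A different normalizer, such as the distance between the eyes, would change the scale of the result but not the ranking of models evaluated the same way.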

       

      Abstract: Agricultural production efficiency has risen steadily in recent years, driven in part by the continuous development of intelligent breeding technology, and the production efficiency and welfare of animals have improved significantly. Accurate identification and management of important livestock such as pigs is crucial. However, traditional individual identification methods such as ear tags, ear notches, and color markings are labor-intensive and time-consuming, and they can easily cause injury and infection in pigs. In contrast, non-invasive individual identification methods are expected to obtain pig information more conveniently, quickly, and accurately, thereby improving breeding efficiency and pig welfare. Facial alignment is one of the most essential steps in pig face recognition, and its prerequisite is the accurate location of facial key points. However, pig movement and varying facial poses lead to inaccurate extraction of pig face key points, so an accurate and efficient detection method for pig face key points is in high demand. In this study, a precise detection model (YOLO-MOB-DFC) was proposed for pig facial key points by adapting the human face key point detection model YOLOv5Face. Firstly, the re-parameterized MobileOne was used as the backbone network to reduce the number of model parameters. Then, the decoupled fully connected attention module was integrated to capture the dependency among pixels at distant spatial positions, enabling the model to focus more on the pig's facial region for higher detection performance. Finally, the lightweight upsampling operator CARAFE was employed to fully perceive the aggregated contextual information within the neighborhood, making the extraction of pig facial key points more accurate.
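Structural re-parameterization, the idea behind the MobileOne backbone mentioned above, relies on the linearity of convolution: parallel branches used during training can be folded into a single kernel for inference. The sketch below shows this for one channel and without batch normalization, which is a deliberate simplification; real re-parameterized blocks also fold batch normalization and handle many channels. All function names and values are illustrative, not taken from the paper.

```python
def conv2d_same(image, kernel):
    """3x3 cross-correlation with zero padding on a single-channel image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out


def merge_branches(k3, k1_weight, use_identity=True):
    """Fold a 3x3 branch, a 1x1 branch, and an identity branch into one 3x3 kernel."""
    merged = [row[:] for row in k3]
    # A 1x1 convolution equals a 3x3 kernel that is zero except at the centre.
    merged[1][1] += k1_weight
    # The identity branch is a 1x1 convolution with weight 1.
    if use_identity:
        merged[1][1] += 1.0
    return merged


image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0]]
k3 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.1],
      [0.2, 0.1, -0.3]]
k1_weight = 0.7

# Training-time form: three parallel branches whose outputs are summed.
branch3 = conv2d_same(image, k3)
multi_branch = [[branch3[i][j] + k1_weight * image[i][j] + image[i][j]
                 for j in range(len(image[0]))]
                for i in range(len(image))]

# Inference-time form: one merged 3x3 convolution.
single_branch = conv2d_same(image, merge_branches(k3, k1_weight))

max_diff = max(abs(a - b)
               for row_a, row_b in zip(multi_branch, single_branch)
               for a, b in zip(row_a, row_b))
print(f"max difference between the two forms: {max_diff:.2e}")
```

The two forms agree up to floating-point rounding, which is why the extra training-time branches add no cost at inference.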
A pig face dataset was constructed from 100 sow videos and 220 images with complex backgrounds featuring multiple pigs. The structural similarity (SSIM) algorithm was used to filter out highly similar images and so reduce the risk of overfitting. Labelme was used to annotate the pig's face, eyes, both sides of the nose, and nose tip. Six offline data augmentation operations were applied to enhance the model's generalization capability. The improved model was tested on this self-built pig face dataset. The results showed that the average precision of pig face detection reached 99.0%, the detection speed was 153 FPS, and the normalized mean error (NME) of key points was 2.344%. Compared with the RetinaFace model, the average precision increased by 5.43%, the number of model parameters was reduced by 78.59%, the frame rate increased by 91.25%, and the NME was reduced by 2.774%. Compared with the YOLOv5s-Face model, the average precision improved by 2.48%, the number of model parameters was reduced by 18.29%, and the NME was reduced by 0.567%. The YOLO-MOB-DFC model has fewer parameters and a more stable NME across consecutive frames, which reduces the impact of varying pig face poses on the accuracy of key point detection. The improved model provides high detection accuracy and efficiency and can obtain pig face key point data quickly and accurately. The findings lay a foundation for constructing high-quality open-set pig face recognition datasets and for non-invasive intelligent identification of individual pigs, a trend in more intelligent and sustainable animal husbandry that can improve pig welfare and production efficiency while reducing human labor and time consumption.
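The SSIM-based filtering of near-duplicate frames described in the dataset construction can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it computes one global SSIM value per image pair instead of the usual sliding-window average, treats each image as a flat list of 8-bit grayscale values, and uses a hypothetical threshold of 0.9 that the abstract does not specify.

```python
def global_ssim(a, b, dynamic_range=255.0):
    """Global (single-window) SSIM between two equal-length grayscale pixel lists."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("images must be non-empty and the same size")
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def filter_similar_frames(frames, threshold=0.9):
    """Keep a frame only if its SSIM with the last kept frame is below threshold."""
    kept = []
    for frame in frames:
        if not kept or global_ssim(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept


# Hypothetical 8-pixel "frames" standing in for consecutive video frames.
frame_a = [10, 50, 90, 130, 170, 210, 250, 30]
frame_b = [11, 50, 91, 130, 169, 210, 250, 31]   # nearly identical to frame_a
frame_c = [250, 210, 170, 130, 90, 50, 10, 200]  # very different content
kept = filter_similar_frames([frame_a, frame_b, frame_c])
print(f"kept {len(kept)} of 3 frames")
```

Comparing each frame with the last kept frame, rather than with every kept frame, keeps the pass linear in the number of frames, which suits video where near-duplicates are usually adjacent.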

       
