VanillaFaceNet: A high-precision, fast-inference method for bovine face recognition

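    • Summary: Rapid and accurate identification of individual cattle matters for loans secured on live cattle and for curbing cattle insurance fraud. Face-based identification is difficult because facial differences between cattle are small, FaceNet is deep and therefore slow at inference, and its classification accuracy is insufficient. This study proposes VanillaFaceNet, a cattle face recognition method based on FaceNet. First, the backbone feature extraction network of FaceNet is replaced with the minimalist network VanillaNet-13, and two techniques, dynamic activation and an activation function with enhanced linear transformation, are proposed to increase the network's non-linearity. Second, a new DBCA (dual-branch coordinate attention) module is proposed to better capture differences between the facial features of different cattle, improving recognition accuracy. Finally, because triplet loss acts only on inter-class differences, joint supervision with a center-triplet loss is adopted to reduce intra-class variation, which improves the accuracy of matching images of the same animal. The model was trained and tested on a self-built cattle face dataset. VanillaFaceNet reached a recognition accuracy of 88.21% at 26.23 frames per second. Compared with FaceNet, MobileFaceNet, CenterFace, CosFace, and ArcFace, accuracy improved by 2.99, 9.58, 6.26, 3.85, and 4.49 percentage points, and inference speed by 2.67, 0.77, 0.10, 1.28, and 0.94 frames/s, respectively. The model recognizes cattle well, is suitable for deployment on embedded devices, and balances recognition accuracy against inference speed.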
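As an illustration of the joint supervision described above, the following minimal sketch combines a triplet term with a center term on plain Python lists. The squared Euclidean distance, the margin of 0.2, the weight `lam`, and all function names are assumptions made for illustration; the text above does not give the exact formulation used in VanillaFaceNet.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive, so it acts on separation between identities.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def center_loss(embedding, center):
    """Half the squared distance from an embedding to its identity's center."""
    return 0.5 * sq_dist(embedding, center)


def center_triplet_loss(anchor, positive, negative, center, margin=0.2, lam=0.01):
    """Triplet term plus a weighted center term (weight `lam` is an assumption).

    `center` is the class center of the identity shared by anchor and positive.
    """
    pull = center_loss(anchor, center) + center_loss(positive, center)
    return triplet_loss(anchor, positive, negative, margin) + lam * pull


if __name__ == "__main__":
    # Toy 2-D embeddings: anchor and positive belong to the same animal.
    a, p, n = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
    c = [0.95, 0.05]  # hypothetical class center for that animal
    print("triplet term:", triplet_loss(a, p, n))
    print("joint loss:  ", center_triplet_loss(a, p, n, c))
```

In this toy case the triplet term is already zero because the negative is far from the anchor, yet the center term remains positive, so the joint loss keeps pulling embeddings of the same animal toward their center.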

       

      Abstract: Intelligent farming is a growing trend in agricultural production, driven by advances in artificial intelligence (AI) and the Internet of Things (IoT). Rapid and accurate identification of individual cattle is important for loans secured on live cattle and for preventing insurance fraud in the cattle industry. Computer vision-based cattle face recognition can support the modernization of the livestock industry, and smart devices and systems can be integrated with it for intelligent cattle management, feeding, and disease prevention. However, traditional identification methods such as ear tags and collars limit large-scale production, while face-based recognition is hindered by the small differences in facial features among cattle, the depth of the FaceNet network, slow inference, and insufficient classification accuracy. In this study, a cattle face recognition method based on FaceNet, called VanillaFaceNet, was proposed. Firstly, the backbone feature extraction network of FaceNet was replaced with the minimalist network VanillaNet-13, and two techniques, dynamic activation and activation functions with enhanced linear transformation, were proposed to improve the non-linearity of the network. Dynamic activation adjusts the activation functions during training, exploiting their expressive power and adapting to changes in data distribution at different training stages. During inference, it allows convolutional layers to be merged, which reduces the computational load and improves inference speed. The activation function with enhanced linear transformation increases non-linearity by stacking multiple activation functions in parallel, so that each layer can capture more complex features. Spatial context information is also embedded in the activation function, so that spatial relationships among features are better used to fit complex feature distributions. Secondly, a DBCA (dual-branch coordinate attention) module was proposed, which combines global max pooling with global average pooling to aggregate the salient features of cattle faces and better represent the differences among them, improving recognition accuracy. Finally, center loss was introduced and the network was trained under the joint supervision of a center-triplet loss, because triplet loss alone acts mainly on inter-class differences. Intra-class compactness was improved, so that embeddings of the same animal cluster tightly, which improved the accuracy of matching images of the same animal. Cattle face videos were collected at the Otai Ranch in Hohhot, Inner Mongolia Autonomous Region, and an image dataset was built from them to train and test the model. The experimental results show that VanillaFaceNet achieved a recognition accuracy of 88.21% at 26.23 frames per second (FPS). Compared with FaceNet, MobileFaceNet, CenterFace, CosFace, and ArcFace, recognition accuracy improved by 2.99, 9.58, 6.26, 3.85, and 4.49 percentage points, and inference speed by 2.67, 0.77, 0.10, 1.28, and 0.94 frames/s, respectively. These results meet the ranch's requirements for accuracy and real-time performance in cattle recognition. The model is suitable for deployment on embedded devices such as the Jetson AGX Xavier and achieves a better balance between accuracy and inference speed in cattle face recognition.
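The abstract states that dynamic activation allows convolutional layers to be merged at inference. The sketch below shows one way this can work, under assumptions the abstract does not spell out: the activation is blended with the identity by a weight `lam` that rises from 0 to 1 during training, and the two layers are 1x1 convolutions, which act at each pixel as a matrix multiplication plus a bias. Once the activation has become the identity, the two affine maps compose into one. All function names are hypothetical.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def affine(w, b, x):
    """Per-pixel view of a 1x1 convolution: w @ x + b."""
    return [y + bi for y, bi in zip(matvec(w, x), b)]


def blended_relu(x, lam):
    """(1 - lam) * ReLU(x) + lam * x; equals ReLU at lam=0, identity at lam=1."""
    return [(1.0 - lam) * max(0.0, v) + lam * v for v in x]


def two_layer(w1, b1, w2, b2, x, lam):
    """Two 1x1 conv layers with the blended activation between them."""
    return affine(w2, b2, blended_relu(affine(w1, b1, x), lam))


def merge_layers(w1, b1, w2, b2):
    """Fold both layers into one: W = w2 @ w1, b = w2 @ b1 + b2.

    Only valid once the activation between the layers is the identity.
    """
    cols = list(zip(*w1))
    w = [[sum(a * c for a, c in zip(row, col)) for col in cols] for row in w2]
    b = affine(w2, b2, b1)
    return w, b


if __name__ == "__main__":
    w1, b1 = [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5]
    w2, b2 = [[0.0, 1.0], [1.0, -1.0]], [1.0, 2.0]
    x = [1.0, -2.0]
    w, b = merge_layers(w1, b1, w2, b2)
    print("two layers, lam=1:", two_layer(w1, b1, w2, b2, x, lam=1.0))
    print("merged layer:     ", affine(w, b, x))
    print("two layers, lam=0:", two_layer(w1, b1, w2, b2, x, lam=0.0))
```

With `lam=1.0` the two-layer output matches the merged layer, so inference needs a single layer instead of two; with `lam=0.0` the ReLU is still active and the outputs differ, which is why the merge is applied only after training has finished.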

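The reported gains also fix the baseline figures implicitly: subtracting each gain from the VanillaFaceNet result gives the accuracy and speed implied for each comparison model. The sketch below performs that subtraction. The derived values are not quoted from the paper and inherit any rounding in the published numbers; the function and variable names are illustrative.

```python
# Reported VanillaFaceNet results from the abstract.
ACCURACY = 88.21  # percent
FPS = 26.23       # frames per second

# Reported gains over each baseline:
# (accuracy gain in percentage points, speed gain in frames/s).
GAINS = {
    "FaceNet": (2.99, 2.67),
    "MobileFaceNet": (9.58, 0.77),
    "CenterFace": (6.26, 0.10),
    "CosFace": (3.85, 1.28),
    "ArcFace": (4.49, 0.94),
}


def implied_baselines(accuracy, fps, gains):
    """Subtract each reported gain to recover the baseline's implied figures."""
    return {
        name: (round(accuracy - d_acc, 2), round(fps - d_fps, 2))
        for name, (d_acc, d_fps) in gains.items()
    }


if __name__ == "__main__":
    for name, (acc, speed) in implied_baselines(ACCURACY, FPS, GAINS).items():
        print(f"{name:14s} accuracy {acc:.2f}%  speed {speed:.2f} FPS")
```

For example, the figures implied for FaceNet are 85.22% accuracy and 23.56 FPS.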
       
