Abstract:
With the development of artificial intelligence (AI) and the Internet of Things (IoT), intelligent farming has become an ever-increasing trend in agricultural production. Rapid and accurate identification of individual cattle is of great significance for preventing insurance fraud in live-cattle loans in the cattle industry. Among the available technologies, computer vision is a promising approach to cattle face recognition in the modernization of the livestock industry, and smart devices and systems can be integrated with it to achieve intelligent cattle management, feeding, and disease prevention. However, traditional identification methods (such as ear tags and collars) have limited large-scale production in recent years, while face recognition based on FaceNet has been hampered by the small differences in facial features among cattle, the deep layers of the network, slow inference, and insufficient classification accuracy. In this study, a cattle face recognition model based on FaceNet, called VanillaFaceNet, was proposed. Firstly, the backbone feature extraction network of FaceNet was replaced with the latest simplified network, VanillaNet-13. Dynamic activation and an enhanced linear transformation of the activation functions were proposed to improve the non-linearity of the network. Specifically, dynamic activation fully exploited the expressive power of the activation functions by adjusting them dynamically during training, so as to flexibly adapt to variations in the data distribution at different training stages; during inference, the activation was merged with the adjacent convolutional layers, reducing the computational load and improving the inference speed. The performance and efficiency of the model were thus enhanced in both training and inference. The enhanced linear transformation strengthened the non-linearity by stacking multiple activation functions in parallel, enabling each layer to capture more complex features. Additionally, spatial context information was embedded within the activation functions, so that the spatial relationships among features were better utilized to fit complex feature distributions. The stronger non-linearity and the integration of spatial context yielded a more accurate and efficient model for complex data. Secondly, a dual-branch coordinate attention (DBCA) module was added, in which global maximum pooling and global average pooling were combined to aggregate the salient features of cattle faces and better represent the differences among their facial features, thereby improving the recognition accuracy of the network. Finally, a center loss was introduced and the network was trained under joint center-triplet loss supervision, because the triplet loss alone mainly addressed the inter-class differences among cattle. The center loss improved intra-class compactness, so that embeddings of the same cattle were aggregated more tightly and comparisons among images of the same identity became more accurate. (Minimal illustrative sketches of the series-style activation, the DBCA module, and the joint loss are given after the abstract.) Cattle face videos were collected at the Otai Ranch in Hohhot, Inner Mongolia Autonomous Region, and an image dataset was constructed to train and test the model for cattle face recognition. The experimental results show that VanillaFaceNet achieved an accuracy of 88.21% in cattle recognition, with a frame rate of 26.23 frames per second (FPS).
Compared with FaceNet, MobileFaceNet, CenterFace, CosFace, and ArcFace, the proposed model improved the recognition accuracy by 2.99, 9.58, 6.26, 3.85, and 4.49 percentage points, respectively, and the inference speed by 2.67, 0.77, 0.10, 1.28, and 0.94 frames per second, respectively. These gains in accuracy and speed fully meet the requirements of ranches for accurate, real-time cattle recognition. The model performed well in cattle recognition and is suitable for deployment on embedded devices, such as the Jetson AGX Xavier, achieving a better balance between the accuracy and inference speed of cattle face recognition.
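A minimal PyTorch sketch of the series-style activation described in the abstract is given below. It is an assumption rather than the authors' implementation: the class name SeriesActivation, the choice of ReLU as the base activation, and the hyperparameters (the neighbourhood size n and the 0.02 weight-initialization scale) are illustrative; only the idea of combining parallel activation responses with learnable weights and embedded spatial context follows the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SeriesActivation(nn.Module):
    # Base non-linearity followed by a depthwise convolution that combines
    # neighbouring activation responses per channel with learnable weights,
    # i.e. parallel stacking of activations with embedded spatial context.
    def __init__(self, channels, n=3):
        super().__init__()
        k = 2 * n + 1  # each output sees a (2n+1) x (2n+1) neighbourhood
        self.weight = nn.Parameter(torch.randn(channels, 1, k, k) * 0.02)
        self.bias = nn.Parameter(torch.zeros(channels))
        self.bn = nn.BatchNorm2d(channels)
        self.pad = n
        self.channels = channels

    def forward(self, x):
        out = F.conv2d(F.relu(x), self.weight, self.bias,
                       padding=self.pad, groups=self.channels)
        return self.bn(out)

# Example: act = SeriesActivation(256); y = act(torch.randn(1, 256, 7, 7))

At inference time the batch normalization can be folded into the depthwise weights, which is consistent with the merging of layers that the abstract attributes to dynamic activation.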
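The DBCA module is described only at a high level, so the following sketch assumes one plausible design: a coordinate-attention-style block whose two branches aggregate features with global average pooling and global maximum pooling along each spatial direction. The class name DualBranchCoordAttention and the reduction ratio are hypothetical.

import torch
import torch.nn as nn

class DualBranchCoordAttention(nn.Module):
    # Two pooling branches (average and maximum) along each spatial direction,
    # fused and re-encoded into per-direction attention maps.
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Branch 1: global average pooling along W and along H.
        avg_h = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        avg_w = x.mean(dim=2, keepdim=True)                      # (b, c, 1, w)
        # Branch 2: global maximum pooling along the same directions.
        max_h = x.max(dim=3, keepdim=True).values                # (b, c, h, 1)
        max_w = x.max(dim=2, keepdim=True).values                # (b, c, 1, w)
        # Fuse the two branches, then encode direction-aware attention.
        feat_h = avg_h + max_h                                   # (b, c, h, 1)
        feat_w = (avg_w + max_w).permute(0, 1, 3, 2)             # (b, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([feat_h, feat_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        attn_h = torch.sigmoid(self.conv_h(y_h))                 # (b, c, h, 1)
        attn_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * attn_h * attn_w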
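The joint center-triplet supervision can be sketched in the same spirit. The weighting factor lambda_center, the margin value, and the class name CenterTripletLoss are assumptions; the sketch only illustrates adding an intra-class center term to the triplet loss, as described in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterTripletLoss(nn.Module):
    # The triplet term keeps different cattle apart; the center term pulls
    # embeddings of the same cattle toward a learnable per-identity center.
    def __init__(self, num_classes, embed_dim, margin=0.2, lambda_center=0.01):
        super().__init__()
        self.margin = margin
        self.lambda_center = lambda_center
        self.centers = nn.Parameter(torch.randn(num_classes, embed_dim))

    def forward(self, anchor, positive, negative, labels):
        triplet = F.triplet_margin_loss(anchor, positive, negative,
                                        margin=self.margin)
        center = ((anchor - self.centers[labels]) ** 2).sum(dim=1).mean()
        return triplet + self.lambda_center * center

# Example with hypothetical sizes: loss_fn = CenterTripletLoss(num_classes=100,
# embed_dim=128); loss = loss_fn(a, p, n, labels) for embeddings of shape
# (batch, 128) and integer identity labels of shape (batch,).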