Re-identifying beef cattle using improved AlignedReID++

    • Abstract: Accurate and continuous identification of individual cattle is of great significance to precision livestock farming. Identifying individual cattle at medium to long distances and across cameras is the basis for monitoring individual feed intake and feeding time. Beef cattle frequently move and change position while feeding, their orientation changes often, and the biological similarity among individuals, combined with complex environmental variation (lighting, occlusion, and background), makes cross-camera individual identification difficult. This study builds on the AlignedReID++ model, which matches images efficiently by exploiting both global and local information, and improves it to achieve better individual cattle re-identification. In the improved model, triplet attention modules are applied to the BottleNeck structures of the ResNet50 backbone, strengthening feature extraction from individual images through cross-dimensional interaction while introducing only a small number of parameters; the cross-entropy loss in the global branch of the baseline model is replaced with the CosFace loss, which is trained jointly with the hard triplet loss to improve the model's ability to distinguish similar individuals. The improved model achieves a rank-1 accuracy of 94.42% and a mean average precision (mAP) of 63.90%, exceeding the baseline model by 1.84 and 6.42 percentage points, respectively. Compared with PCB (part-based convolutional baseline), it is higher by 4.72 and 5.86 percentage points in rank-1 accuracy and mAP, respectively; compared with MGN (multiple granularity network), higher by 0.76 and 4.30 percentage points; compared with TransReID, 0.98 percentage points lower in rank-1 accuracy but 3.90 percentage points higher in mAP; and compared with RGA (relation-aware global attention), higher by 5.36 and 7.38 percentage points. In addition, the improved model requires 5.45 G floating point operations, only 0.05 G more than the baseline model and 0.68, 6.51, 25.4, and 16.55 G fewer than PCB, MGN, RGA, and TransReID, respectively. Its model size of 23.78 M is the smallest among the compared models. Its inference speed on a CPU is 5.64 frames/s, lower than those of PCB, MGN, and the baseline model, but higher than those of TransReID and RGA. t-SNE feature-embedding visualization shows that the global and local features extracted by the improved model achieve good intra-class compactness and inter-class separability. The results indicate that the proposed method can effectively re-identify individual beef cattle in natural farming scenarios and provides useful guidance for monitoring individual feed intake and feeding time.

       

      Abstract: Accurate and continuous identification of individual cattle is crucial to precision farming. It is also the prerequisite for monitoring the individual feed intake and feeding time of beef cattle at medium to long distances over different cameras. However, beef cattle tend to move frequently and change their feeding position during feeding. Furthermore, large variations in head direction and complex environments (light, occlusion, and background), together with the biological similarity among individuals, make recognition difficult. The AlignedReID++ model is characterized by the use of both global and local information for image matching; in particular, the dynamically matching local information (DMLI) algorithm is introduced into the local branch to automatically align horizontal local information. In this research, the AlignedReID++ model was adopted and improved to achieve better performance in cattle re-identification (ReID). Initially, triplet attention (TA) modules were integrated into the BottleNecks of the ResNet50 backbone, enhancing feature extraction through cross-dimensional interactions with minimal computational overhead. The TA modules increased the model size and floating point operations (FLOPs) of the AlignedReID++ baseline model by only 0.005 M and 0.05 G, while improving the rank-1 accuracy and mean average precision (mAP) by 1.0 and 2.94 percentage points, respectively. Specifically, TA outperformed the convolutional block attention module (CBAM) and the efficient channel attention (ECA) module in rank-1 accuracy by 0.86 and 0.12 percentage points, respectively, although it was 0.94 percentage points lower than the squeeze-and-excitation (SE) module. In mAP, TA exceeded the SE, CBAM, and ECA modules by 0.22, 0.86, and 0.12 percentage points, respectively.
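The cross-dimensional interaction of the TA module can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned k×k convolution of each branch is replaced here by a plain mean over the two pooled maps, so only the branch structure (Z-pool over each of the three dimension pairs, sigmoid gating, averaging) is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def z_pool(x, axis):
    # "Z-pool": stack max- and mean-pooling results along the given axis
    return np.stack([x.max(axis=axis), x.mean(axis=axis)])

def branch_gate(x, axis):
    # One triplet-attention branch: Z-pool, then a 2->1 reduction standing in
    # for the learned k x k convolution (here a plain mean), then sigmoid.
    pooled = z_pool(x, axis)
    return sigmoid(pooled.mean(axis=0))

def triplet_attention(x):
    """x: feature map of shape (C, H, W). Averages three gated copies of x,
    one per pair of interacting dimensions: (H, W), (C, W), and (C, H)."""
    o1 = x * branch_gate(x, axis=0)[None, :, :]   # spatial gate over (H, W)
    o2 = x * branch_gate(x, axis=1)[:, None, :]   # gate over (C, W)
    o3 = x * branch_gate(x, axis=2)[:, :, None]   # gate over (C, H)
    return (o1 + o2 + o3) / 3.0

x = np.random.rand(8, 4, 4)   # a toy C x H x W feature map
y = triplet_attention(x)      # gated output, same shape as x
```

Because each branch only adds the small reduction kernel, the parameter overhead stays negligible, consistent with the 0.005 M increase reported above.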
Additionally, the cross-entropy loss function in the global branch of the baseline model was replaced with the CosFace loss function. The CosFace loss and the hard triplet loss were jointly employed to train the model for better discrimination of similar individuals. AlignedReID++ with the CosFace loss outperformed the baseline model by 0.24 and 0.92 percentage points in rank-1 accuracy and mAP, respectively, and exceeded AlignedReID++ with the ArcFace loss by 0.36 and 0.56 percentage points. The improved model with TA modules and the CosFace loss achieved a rank-1 accuracy of 94.42%, a rank-5 accuracy of 98.78%, a rank-10 accuracy of 99.34%, an mAP of 63.90%, 5.45 G FLOPs, 5.64 frames per second (FPS), and a model size of 23.78 M. Its rank-1 accuracy exceeded those of the baseline model, the part-based convolutional baseline (PCB), the multiple granularity network (MGN), and relation-aware global attention (RGA) by 1.84, 4.72, 0.76, and 5.36 percentage points, respectively, while its mAP surpassed theirs by 6.42, 5.86, 4.30, and 7.38 percentage points, respectively. Meanwhile, its rank-1 accuracy was 0.98 percentage points lower than that of TransReID, but its mAP was 3.90 percentage points higher. Moreover, the FLOPs of the improved model were only 0.05 G larger than those of the baseline model, and smaller than those of PCB, MGN, RGA, and TransReID by 0.68, 6.51, 25.4, and 16.55 G, respectively. The model size of 23.78 M was smaller than those of the baseline model, PCB, MGN, RGA, and TransReID by 0.03, 2.33, 45.06, 14.53, and 62.85 M, respectively. The inference speed of the improved model on a CPU was lower than those of PCB, MGN, and the baseline model, but higher than those of TransReID and RGA. The t-SNE feature-embedding visualization demonstrated that the global and local features achieved good intra-class compactness and inter-class variability.
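The joint training objective can be sketched as follows. This is a NumPy illustration of the standard CosFace and batch-hard triplet formulations, not the paper's code; the scale s, margin m, and triplet margin values are illustrative assumptions, and the toy batch at the bottom is synthetic.

```python
import numpy as np

def cosface_loss(features, weights, labels, s=30.0, m=0.35):
    """Large-margin cosine (CosFace) loss for a mini-batch.
    features: (N, D) embeddings; weights: (K, D) class weights; labels: (N,).
    s (scale) and m (margin) are illustrative defaults, not the paper's values."""
    n = np.arange(len(labels))
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (N, K) cosine similarities
    logits = s * cos
    logits[n, labels] = s * (cos[n, labels] - m)   # subtract margin on target class
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[n, labels].mean()

def hard_triplet_loss(features, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor use the farthest positive
    and the closest negative within the batch."""
    n = len(labels)
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(n):
        d_ap = d[i][same[i] & (np.arange(n) != i)].max()  # hardest positive
        d_an = d[i][~same[i]].min()                       # hardest negative
        losses.append(max(0.0, d_ap - d_an + margin))
    return float(np.mean(losses))

# toy batch: 8 embeddings over 4 identities (2 images each)
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
weights = rng.normal(size=(4, 16))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
total = cosface_loss(feats, weights, labels) + hard_triplet_loss(feats, labels)
```

The two terms are complementary: CosFace enforces an angular margin between identity classes in the global branch, while the batch-hard triplet term directly pulls the hardest same-identity pair together and pushes the hardest cross-identity pair apart.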
Therefore, the improved model can be expected to effectively re-identify beef cattle in the natural environment of a breeding farm, in order to monitor individual feed intake and feeding time.
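The DMLI alignment used in the local branch can also be sketched briefly. This follows the shortest-path alignment of the published AlignedReID formulation (element distances squashed to (0, 1), then dynamic programming over the stripe-distance matrix); the stripe count, feature dimension, and random data below are toy assumptions.

```python
import numpy as np

def dmli_distance(stripes_a, stripes_b):
    """Dynamically matching local information (DMLI) distance between two
    images represented as (H, D) arrays of horizontal stripe features.
    Element-wise distances are normalized to (0, 1) and then aligned by the
    cheapest right/down path through the H x H distance matrix, so stripes
    can still match under vertical misalignment."""
    H = stripes_a.shape[0]
    d = np.linalg.norm(stripes_a[:, None, :] - stripes_b[None, :, :], axis=2)
    d = (np.exp(d) - 1.0) / (np.exp(d) + 1.0)   # squash to (0, 1)
    S = np.zeros_like(d)                         # S[i, j]: best path cost to (i, j)
    for i in range(H):
        for j in range(H):
            if i == 0 and j == 0:
                S[i, j] = d[i, j]
            elif i == 0:
                S[i, j] = S[i, j - 1] + d[i, j]
            elif j == 0:
                S[i, j] = S[i - 1, j] + d[i, j]
            else:
                S[i, j] = min(S[i - 1, j], S[i, j - 1]) + d[i, j]
    return S[-1, -1]   # total cost of the best alignment path

# toy comparison: self-match should cost less than matching a different image
rng = np.random.default_rng(1)
a = rng.normal(size=(8, 128))
b = rng.normal(size=(8, 128))
d_same = dmli_distance(a, a)
d_diff = dmli_distance(a, b)
```

Note that the path must still traverse 2H-1 cells, so even a self-match has nonzero cost; what matters for retrieval is that aligned pairs accumulate much less cost than mismatched ones.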

       
