Individual pig identification and counting in complex cross-domain scenes based on improved YOLOv5

    Detecting and counting pig number using improved YOLOv5 in complex scenes

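    • Summary: To address the low accuracy of individual pig detection and counting in complex cross-domain scenes, this study proposes a pig detection and counting model for such scenes based on an improved YOLOv5 (You Only Look Once version 5). CBAM (Convolutional Block Attention Module), a module that fuses channel and spatial attention, and a Transformer self-attention module were each integrated into the backbone network. The CIoU (Complete Intersection over Union) Loss was replaced with the EIoU (Efficient Intersection over Union) Loss, and the SAM (Sharpness-Aware Minimization) optimizer was introduced together with three training strategies: multi-scale training, pseudo-label semi-supervised learning, and test set augmentation. The experimental results show that these improvements enable the model to focus better on the important regions of the image, overcome the limitation that conventional convolution can only extract adjacent information within the convolution kernel, strengthen feature extraction, and improve both localization accuracy and adaptability to different object sizes and pig house environments, thereby improving performance in cross-domain scenes. With all improvements applied, the mAP@0.5 of the model rose from 87.67% to 98.76%, the mAP@0.5:0.95 rose from 58.35% to 68.70%, and the mean squared error fell from 13.26 to 1.44. The proposed improvements substantially raise the detection performance of the existing model in complex cross-domain scenes and increase the accuracy of detection and counting, providing technical support for raising production efficiency and reducing production costs in large-scale pig farming.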
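
The replacement of CIoU Loss with EIoU Loss mentioned above can be illustrated with a small sketch. It assumes the commonly cited EIoU definition (1 - IoU, plus a normalized center-distance term and separate normalized width and height terms). The summary does not spell out the formula or the implementation, so the function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` constant are illustrative assumptions rather than the authors' code.

```python
def eiou_loss(box_pred, box_gt, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; not taken from the paper.
    """
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (gx1 + gx2) / 2.0
    dy = (py1 + py2) / 2.0 - (gy1 + gy2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, each normalized by the matching
    # side of the enclosing box (this is what distinguishes EIoU
    # from CIoU's single aspect-ratio term).
    width_term = (pw - gw) ** 2 / (cw * cw + eps)
    height_term = (ph - gh) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

For identical boxes the loss is approximately zero, and it grows as the boxes drift apart or differ in width or height. A training implementation would normally operate on batched tensors instead of Python tuples.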

       

      Abstract: In large-scale breeding, the number of pigs in a shed changes continually because of culling, sale, and death, so the pigs must be counted during breeding. The health status of pigs is also closely related to their behavior, and timely identification of abnormal behavior against the pigs' normal behavior brings better economic benefits. Object detection can detect and count pigs at the same time, and the detections can serve as the basis for behavioral analysis. However, current detection and counting performance is limited by the cross-domain differences caused by varying shooting angles and distances in the complex environments of different pig houses. In this study, a model for individual pig detection and counting in complex cross-domain scenes was proposed using an improved YOLOv5 (You Only Look Once version 5). CBAM (Convolutional Block Attention Module), a module that combines channel and spatial attention, and the Transformer, a self-attention module, were both integrated into the backbone network. CIoU (Complete IoU) Loss was replaced by EIoU (Efficient IoU) Loss, and the SAM (Sharpness-Aware Minimization) optimizer was introduced together with three training strategies: multi-scale training, pseudo-label semi-supervised learning, and test set augmentation. The experimental results showed that these improvements enabled the model to focus better on the important areas of the image, overcame the limitation that traditional convolution can only extract adjacent information within the convolution kernel, enhanced feature extraction, and improved both the localization accuracy of the model and its adaptability to different object sizes and pig house environments, thus improving performance in cross-domain scenes. To verify the effectiveness of these improvements, datasets from real scenes were used. The datasets differed from one another not only in background environment but also in object size and object aspect ratio. Ablation experiments showed that every improvement was effective: integrating CBAM, integrating the Transformer, using EIoU Loss, using the SAM optimizer, using multi-scale training, and combining pseudo-label semi-supervised learning with test set augmentation each improved the mAP (mean Average Precision) @0.5, the mAP@0.5:0.95, and the MSE (Mean Square Error) of the model to varying degrees. With all improvements integrated, the mAP@0.5 of the model increased from 87.67% to 98.76%, the mAP@0.5:0.95 increased from 58.35% to 68.70%, and the MSE decreased from 13.26 to 1.44. Compared with the classic Faster RCNN model, the VarifocalNet model for dense object detection, and the anchor-free YOLOX model, the improved model performed better in both detection and counting on every evaluation metric while maintaining a relatively fast speed. The improved model exhibited strong feature extraction and generalization ability and could still accurately identify most objects even in cross-domain scenes. These results demonstrate that the improved method can significantly improve the object detection performance of the existing model in complex cross-domain scenes and increase the accuracy of detection and counting, providing technical support for raising production efficiency and reducing production costs in large-scale pig breeding.
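
As a rough illustration of how counting by detection and the reported MSE might be computed, the sketch below counts the detections above a confidence threshold in each image and takes the mean squared difference from the ground-truth counts. The abstract does not state the exact procedure, so the per-image averaging, the 0.5 threshold, and the names `count_pigs` and `counting_mse` are assumptions.

```python
def count_pigs(confidences, conf_threshold=0.5):
    """Count detections whose confidence reaches the threshold.

    `confidences` is a list of per-detection scores for one image.
    The 0.5 default is an assumed value, not one given in the paper.
    """
    return sum(1 for c in confidences if c >= conf_threshold)


def counting_mse(pred_counts, true_counts):
    """Mean squared error between predicted and true per-image counts."""
    if len(pred_counts) != len(true_counts):
        raise ValueError("pred_counts and true_counts must have equal length")
    if not true_counts:
        raise ValueError("at least one image is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return sum(squared_errors) / len(squared_errors)


# Made-up detector scores for three images, for illustration only.
scores_per_image = [
    [0.91, 0.88, 0.42],        # two detections pass the threshold
    [0.97, 0.65, 0.55, 0.30],  # three detections pass
    [0.20],                    # none pass
]
predicted = [count_pigs(scores) for scores in scores_per_image]
ground_truth = [2, 4, 0]
mse = counting_mse(predicted, ground_truth)
```

Under this reading, an MSE of 1.44 would correspond to a typical per-image counting error of a little over one pig, since the square root of 1.44 is 1.2.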

       
