Lightweight FABF-YOLOv8s method for beef cattle behavior recognition under automatic scene distinction

付辰伏; 任力生; 王芳

doi:10.11975/j.issn.1002-6819.202404073

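Abstract (translated from the Chinese original): Existing object detection models struggle to recognize beef cattle behavior under natural weather conditions: they are easily disturbed by complex backgrounds, and their parameter counts, computational cost, and weight-file sizes are large. To address these problems, this study proposes a lightweight multi-behavior recognition method for beef cattle based on automatic scene distinction. First, a FasterNet model automatically distinguishes the weather scene. Second, the YOLOv8s network is redesigned to be lightweight: the Backbone is replaced with the lightweight FasterNet model and combined with intra-scale feature interaction (AIFI) to capture important feature information; a weighted bidirectional feature pyramid network (BiFPN) serves as the Neck, with the C2f-Faster feature extraction module as its nodes, which reduces the parameters and computation of the convolutions while improving accuracy, making the model better suited to beef cattle behavior recognition and later deployment. Third, the MPDIoU function is used to address limitations such as mutual occlusion among cattle. Finally, a visualization interface is designed that accepts images and videos as model input and displays the recognition results. Experimental results show that, on the beef cattle behavior dataset, the mAP@0.5 of the FABF-YOLOv8s model (FasterNet, AIFI, BiFPN, C2f-Faster; FABF) is 1.1, 4.7, and 0.4 percentage points higher than that of YOLOv5s, YOLOv7, and the original YOLOv8s, respectively, while the parameter count and floating-point computation are reduced by 59.48% and 43.66%, to 4.51 M and 16.0 GFLOPs. The FasterNet-FABF-YOLOv8s model, which incorporates natural scene factors, reaches an mAP@0.5 of 94.6%. The study shows that a lightweight beef cattle behavior recognition system built on automatic scene distinction can provide technical support for farmers monitoring cattle health and for automated smart farming.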
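The reported reductions (59.48% and 43.66%) and the resulting values (4.51 M parameters, 16.0 GFLOPs) together imply the size of the baseline YOLOv8s that was compared against. The short Python calculation below back-computes those baseline figures from the numbers in the abstract; the function name is illustrative, and the results are implied values, not figures stated in the abstract.

```python
def baseline_from_reduction(reduced_value: float, reduction_percent: float) -> float:
    """Return the original value, given the reduced value and the percent reduction."""
    return reduced_value / (1.0 - reduction_percent / 100.0)


# Figures taken from the abstract: 4.51 M parameters after a 59.48% reduction,
# 16.0 GFLOPs after a 43.66% reduction.
baseline_params_m = baseline_from_reduction(4.51, 59.48)
baseline_gflops = baseline_from_reduction(16.0, 43.66)

print(f"implied baseline parameters: {baseline_params_m:.2f} M")        # 11.13 M
print(f"implied baseline computation: {baseline_gflops:.2f} GFLOPs")    # 28.40 GFLOPs
```

The implied baseline is therefore roughly 11.13 M parameters and 28.40 GFLOPs, which is the scale the two percentage reductions are measured against.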

Abstract: Object detection has been widely applied in recent years to tasks such as pedestrian detection, face recognition, and item counting, and deep learning has driven much of its progress. It is therefore a natural candidate for the livestock industry, for example on cattle farms. Existing object detection models, however, still face challenges there. In particular, beef cattle behavior is hard to recognize under natural weather conditions, because the models are susceptible to complex background interference and have large parameter counts, high computational cost, and large weight files. This study proposes a lightweight model, FABF-YOLOv8s, that recognizes multiple beef cattle behaviors with automatic scene distinction. Video data were collected at real beef cattle breeding sites, and an all-weather natural scene dataset covering varying weather conditions was constructed. Beef cattle behaviors in the scene dataset were labeled with LabelImg and semi-automatic annotation tools to build a beef cattle behavior dataset. A FasterNet model was trained on the natural scene dataset, and its output category was used to identify the scene and thus divide the weather scenes automatically, so that the scene features were determined before the YOLOv8 model detected beef cattle behavior. The YOLOv8s network was then made lightweight. The lightweight FasterNet model was used as the Backbone of YOLOv8s, eliminating the computational redundancy of a complex model on a simple task. Intra-Scale Feature Interaction (AIFI) was combined with the FasterNet backbone to strengthen the network's focus on features within the same scale and to capture important features. A weighted bidirectional feature pyramid network (BiFPN) was used as the Neck.
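A BiFPN-style neck combines feature maps of different origin using learnable, non-negative weights. The sketch below shows the "fast normalized fusion" rule from the original BiFPN formulation, applied to flat feature vectors in plain Python. Whether the FABF-YOLOv8s Neck uses exactly this variant is an assumption; the function name, the example values, and `eps` are illustrative only.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors as sum(w_i * f_i) / (eps + sum(w_j))."""
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all input features must have the same length")

    # ReLU keeps every learnable weight non-negative before normalization.
    weights = [max(w, 0.0) for w in raw_weights]
    total = sum(weights) + eps  # eps avoids division by zero

    return [
        sum(w * f[i] for w, f in zip(weights, features)) / total
        for i in range(length)
    ]


# Hypothetical inputs: a top-down feature and a lateral feature at one scale.
p_top_down = [1.0, 2.0, 3.0]
p_lateral = [3.0, 2.0, 1.0]

fused_equal = fast_normalized_fusion([p_top_down, p_lateral], [1.0, 1.0])
fused_gated = fast_normalized_fusion([p_top_down, p_lateral], [-1.0, 2.0])
```

With equal weights the result is close to the element-wise mean of the two inputs; with a negative raw weight the first input is clipped to zero and the output follows the second input almost exactly.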
Learnable weights were introduced to weigh the different input features, and top-down and bottom-up bidirectional paths were applied repeatedly for multi-scale feature fusion. The C2f-Faster feature extraction network was selected as the node, which reduced the parameter count and computation of the convolutions while improving accuracy, making the model better suited to beef cattle behavior recognition and subsequent deployment. The MPDIoU function was selected to overcome limitations such as mutual occlusion among beef cattle; it compares the similarity of the predicted and ground-truth bounding boxes whether or not the boxes overlap. The model structure, parameter count, and memory footprint were compressed while accuracy was preserved, yielding a detection model with high accuracy, small size, and strong generalization. The model accurately identified beef cattle behaviors such as standing, lying, feeding, drinking, and licking. Finally, a visualization interface was designed that accepts images and videos as input, so that the recognition of beef cattle behavior can be demonstrated more intuitively. The experimental results show that, on the self-built beef cattle behavior dataset, the FABF-YOLOv8s model (FasterNet, AIFI, BiFPN, and C2f-Faster; FABF) achieved a mean average precision (mAP@0.5) of 93.6% with 4.51 M parameters and 16.0 GFLOPs of floating-point computation. Its mAP@0.5 was 1.1, 4.7, and 0.4 percentage points higher than that of the YOLOv5s, YOLOv7, and original YOLOv8s models, respectively. The parameter count and floating-point computation were 59.48% and 43.66% lower, respectively, than those of YOLOv8s. Detection accuracy thus improved while computational complexity was reduced.
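The role of MPDIoU for overlapping and non-overlapping boxes can be illustrated with a small Python sketch. It follows the commonly cited definition, in which the IoU is penalized by the squared distances between the two top-left corners and between the two bottom-right corners, each normalized by the squared diagonal of the input image. The box format `(x1, y1, x2, y2)`, the function names, and the example coordinates are assumptions made for illustration, not details taken from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), top-left and bottom-right


def mpdiou(pred: Box, target: Box, img_w: float, img_h: float) -> float:
    """IoU minus normalized squared corner distances (assumed MPDIoU definition)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU of the two axis-aligned boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners, normalized by the image diagonal.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(pred: Box, target: Box, img_w: float, img_h: float) -> float:
    """Bounding-box regression loss: 1 - MPDIoU."""
    return 1.0 - mpdiou(pred, target, img_w, img_h)


# Hypothetical boxes in a 100 x 100 image; neither prediction overlaps the target.
target_box = (0.0, 0.0, 10.0, 10.0)
loss_perfect = mpdiou_loss(target_box, target_box, 100.0, 100.0)
loss_near = mpdiou_loss((20.0, 20.0, 30.0, 30.0), target_box, 100.0, 100.0)
loss_far = mpdiou_loss((40.0, 40.0, 50.0, 50.0), target_box, 100.0, 100.0)
```

In this example both non-overlapping predictions have an IoU of zero, yet the farther one receives the larger loss, so the loss still distinguishes predictions when the boxes do not overlap.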
The automated scene distinction by FasterNet achieved an accuracy, precision, and F1-score of 99.1%, 98.3%, and 96.7%, respectively. The FasterNet-FABF-YOLOv8s model, which incorporates natural scene factors, reached an mAP@0.5 of 94.6%. Compared with the Faster-RCNN, SSD, FCOS, and DETR models, the proposed model had fewer parameters and less floating-point computation but higher precision. A lightweight beef cattle behavior recognition system was thus constructed under automated scene distinction. The findings can provide technical support for monitoring the health status of beef cattle and for automated intelligent breeding.
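The two-stage flow described in the abstract, scene classification first and behavior detection second, can be sketched as a simple routing function. This is a minimal illustration under stated assumptions: the abstract does not say how the detector is selected per scene, so the one-detector-per-scene dictionary, the scene labels, and the stub models below are all hypothetical stand-ins for the trained FasterNet classifier and FABF-YOLOv8s detector.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Detection:
    behavior: str                              # e.g. standing, lying, feeding
    confidence: float
    box: Tuple[float, float, float, float]     # (x1, y1, x2, y2)


SceneClassifier = Callable[[Any], str]
Detector = Callable[[Any], List[Detection]]


def recognize_with_scene_routing(
    image: Any,
    classify_scene: SceneClassifier,
    detectors: Dict[str, Detector],
    fallback_scene: str,
) -> Tuple[str, List[Detection]]:
    """Classify the weather scene, then run the detector registered for it."""
    scene = classify_scene(image)
    # Fall back to a default detector if the scene label has no dedicated one.
    detector = detectors.get(scene, detectors[fallback_scene])
    return scene, detector(image)


# Stub models for demonstration only; real models would take pixel data.
def stub_scene_classifier(image: Any) -> str:
    return image["weather"]


def stub_sunny_detector(image: Any) -> List[Detection]:
    return [Detection("standing", 0.91, (10.0, 20.0, 110.0, 220.0))]


def stub_rainy_detector(image: Any) -> List[Detection]:
    return [Detection("lying", 0.87, (30.0, 40.0, 200.0, 160.0))]


detectors = {"sunny": stub_sunny_detector, "rainy": stub_rainy_detector}
scene, detections = recognize_with_scene_routing(
    {"weather": "rainy"}, stub_scene_classifier, detectors, fallback_scene="sunny"
)
print(scene, [d.behavior for d in detections])
```

Keeping the classifier and the detectors behind plain callables means either stage can be replaced without touching the routing logic.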

Recognizing beef cattle behavior under automatic scene distinction using lightweight FABF-YOLOv8s