Abstract
Object detection has been widely applied in daily life in recent years, for example in pedestrian detection, face recognition, and item counting, and deep learning has driven several breakthroughs in the field. Object detection is therefore a promising tool for the livestock industry, for instance on cattle farms. However, existing object detection models still face challenges in this setting. In particular, beef cattle behavior is difficult to recognize under natural weather conditions, mainly because the models are susceptible to complex background interference and have large parameter counts, high computational complexity, and weight files that occupy substantial memory. In this study, a lightweight model (FABF-YOLOv8s) was proposed to recognize multiple behaviors of beef cattle with automatic scene distinction. Video data were collected at real beef cattle breeding sites, and an all-weather natural scene dataset covering varying weather conditions was constructed. Beef cattle behavior in the scene dataset was labeled with LabelImg and semi-automatic annotation tools to build a beef cattle behavior dataset. A FasterNet model was trained on the natural scene dataset, and its output categories were used to identify the scene and automatically separate the weather scenes, so that the scene was determined before the YOLOv8 model detected beef cattle behavior. The lightweight YOLOv8s network was selected as the base detector. The lightweight FasterNet model was used as the Backbone of YOLOv8s, which removed the computational redundancy of applying a complex model to a simple task. The Attention-based Intra-scale Feature Interaction (AIFI) module was combined with the FasterNet backbone to strengthen the interaction among features at the same scale and to capture the important features. The Weighted Bidirectional Feature Pyramid Network (BiFPN) was used as the Neck.
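As a rough illustration of what "weighted" means in BiFPN, the sketch below follows the fast normalized fusion described in the EfficientDet paper: each input feature map receives a non-negative learnable scalar w_i, and the fused output is sum(w_i * I_i) / (eps + sum(w_j)). It is not taken from this study. The function name, the eps default, and the use of plain nested lists instead of tensors are assumptions made for illustration.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped 2-D feature maps with non-negative, sum-normalized weights.

    features: list of 2-D maps (nested lists of floats), all with the same shape.
    weights:  one scalar per map; in a real network these would be learnable.
    eps:      small constant that keeps the division stable (assumed value).
    """
    if len(features) != len(weights):
        raise ValueError("one weight per input feature map is required")
    # ReLU keeps every weight non-negative before normalization.
    clipped = [max(0.0, float(w)) for w in weights]
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for feature, weight in zip(features, normalized):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * feature[r][c]
    return fused


if __name__ == "__main__":
    shallow = [[1.0, 2.0]]  # toy feature map from one scale
    deep = [[3.0, 4.0]]     # toy feature map from another scale, already resized
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0], eps=0.0))
```

With equal weights the result is the element-wise mean of the inputs; a weight that training drives to zero or below removes that input from the fusion entirely.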
In the BiFPN, learnable weights were introduced to learn the importance of the different input features, and top-down and bottom-up bidirectional paths were applied repeatedly to fuse multi-scale features. The C2f-Faster feature extraction module was used as the fusion node, which reduced the parameter count and computational cost of convolution while improving recognition accuracy and easing subsequent deployment. The MPDIoU loss function was selected to address limitations such as mutual occlusion among beef cattle and the difficulty of comparing predicted and ground-truth bounding boxes, whether or not the boxes overlap. The model structure, parameter count, and memory footprint were compressed while accuracy was preserved, yielding a detection model with high accuracy, small size, and strong generalization. Beef cattle behaviors, including standing, lying, feeding, drinking, and licking, were accurately identified. Finally, a visualization interface was designed that accepts images and videos as input and presents the recognized beef cattle behavior more intuitively. The experimental results show that the FABF-YOLOv8s model (FABF: FasterNet, AIFI, BiFPN, and C2f-Faster) achieved an average accuracy of 93.6% with 4.51M parameters and 16.0 GFLOPs of floating-point computation on the self-built beef cattle behavior dataset. The average accuracy was 1.1, 4.7, and 0.4 percentage points higher than that of the YOLOv5s, YOLOv7, and original YOLOv8s models, respectively. Compared with YOLOv8s, the parameter count and floating-point computation decreased by 59.48% and 43.66%, respectively. Detection accuracy was thus improved while computational complexity was reduced.
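To make the role of MPDIoU more concrete, the following sketch computes it as it is commonly defined in the MPDIoU paper: the IoU of the two boxes minus the squared distances between their top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. This is an illustrative reading of that definition, not the training code of this study. The (x1, y1, x2, y2) box format and the function names are assumptions.

```python
def mpdiou(pred, target, img_w, img_h):
    """MPDIoU between two boxes given as (x1, y1, x2, y2) in pixels (assumed format)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    diag_sq = img_w ** 2 + img_h ** 2
    return iou - d1_sq / diag_sq - d2_sq / diag_sq


def mpdiou_loss(pred, target, img_w, img_h):
    """Loss form: 0 for a perfect match, larger as the boxes diverge."""
    return 1.0 - mpdiou(pred, target, img_w, img_h)


if __name__ == "__main__":
    box = (0.0, 0.0, 10.0, 10.0)
    near = (20.0, 20.0, 30.0, 30.0)  # no overlap with box
    far = (50.0, 50.0, 60.0, 60.0)   # no overlap, farther away
    print(mpdiou_loss(box, near, 100, 100), mpdiou_loss(box, far, 100, 100))
```

Because the corner-distance terms stay informative when the IoU is zero, two non-overlapping predictions at different distances from the target still receive different loss values, which is the property the text refers to for boxes with and without overlap.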
The automated scene distinction by FasterNet achieved an Accuracy, Precision, and F1-Score of 99.1%, 98.3%, and 96.7%, respectively. The FasterNet-FABF-YOLOv8s model, which incorporates natural scene factors, reached an average accuracy of 94.6%. Compared with the Faster-RCNN, SSD, FCOS, and DETR models, it had a smaller parameter count and less floating-point computation while achieving higher precision. A lightweight beef cattle behavior recognition model with automated scene distinction was thus constructed. The findings can provide technical support for monitoring the health status of beef cattle and for automated, intelligent breeding.
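For readers less familiar with the classification metrics quoted for the scene classifier, the sketch below shows one common way to obtain Accuracy, Precision, and F1-Score from a multi-class confusion matrix. Macro averaging over classes is an assumption here, since the abstract does not state which averaging was used, and the toy matrix is made up for illustration and is not data from this study.

```python
def classification_metrics(confusion):
    """Accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical two-scene example: rows are true scenes, columns are predictions.
    toy = [[8, 2],
           [0, 10]]
    print(classification_metrics(toy))
```

Reporting F1 alongside Precision is useful because F1 also reflects recall: a classifier can have high precision for a scene and still miss many images of that scene.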