Abstract:
Driven by artificial intelligence (AI), animal husbandry is becoming increasingly large-scale, information-based, and refined. Collective cattle farms can be expected to gradually replace small-scale modes such as individual farming. This shift poses an important challenge for automated management in animal husbandry. In large-scale grazing and breeding, livestock must be identified accurately in order to count the animals and update the population information in real time. However, large-scale, automated livestock detection is difficult in the complex environment of pastoral areas, where livestock are often obscured by trees, strong sunlight, and sandstorms. Current object detection algorithms often produce missed and false detections there, which makes accurate monitoring difficult. In this article, a livestock detection model called LDHorNet (livestock detection HorNet) was proposed, based on the YOLOv5s object detection network. Firstly, a HorNB module was designed to detect small targets at long distances and in complex environments. The recursive gated convolution in the HorNB module was used to extend the second-order interactions of self-attention to an arbitrary order. The HorNB module then replaced the C3 module in the Backbone and Neck sections in order to improve the performance of the network model. High-order spatial information interaction was realized without introducing significant additional computation. The improved model better understood the relationships between different targets in an image and better captured the spatial structure and contextual information between them. The accuracy of locating and detecting small targets was also enhanced. Secondly, gradient accumulation was used to decrease the variance of the gradient, in order to enhance both the stability of the model and the efficiency of feature extraction.
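The gradient accumulation step mentioned above can be illustrated with a minimal sketch. It is not the LDHorNet training code: the abstract does not state the batch size or the number of accumulation steps, so the one-parameter least-squares model, the data, and all function names here are hypothetical. The sketch only shows the general idea that averaging the gradients of several equal-sized micro-batches before a single update reproduces the gradient of the larger batch, which is less noisy than any single micro-batch gradient.

```python
# Illustrative sketch only: a one-parameter least-squares model,
# not the LDHorNet network. All names and data are hypothetical.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of consecutive micro-batches before one update."""
    chunks = [
        (xs[i:i + micro_batch_size], ys[i:i + micro_batch_size])
        for i in range(0, len(xs), micro_batch_size)
    ]
    total = 0.0
    for cx, cy in chunks:
        # Scale each micro-batch gradient so that the sum is the mean over chunks.
        total += grad_mse(w, cx, cy) / len(chunks)
    return total


def train(xs, ys, micro_batch_size, lr=0.01, epochs=200):
    """Plain gradient descent with one update per accumulated gradient."""
    w = 0.0
    for _ in range(epochs):
        w -= lr * accumulated_grad(w, xs, ys, micro_batch_size)
    return w


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # hypothetical data with true slope 2

full = grad_mse(0.0, xs, ys)                               # gradient of the full batch
accum = accumulated_grad(0.0, xs, ys, micro_batch_size=2)  # four micro-batches of two
w_fit = train(xs, ys, micro_batch_size=2)
```

With equal-sized micro-batches, `accum` matches `full`, so the memory cost is that of a small batch while the update direction is that of the large one.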
The CBAM (convolutional block attention module) attention mechanism was also embedded in the Neck section of the network to better capture the key information in the image. This raised the attention weight and detection accuracy for small targets, and effectively mitigated the interference of light, wind-blown sand, noise, and livestock motion blur on livestock detection in complex pastoral environments. At the same time, the repulsion loss function was used to handle the overlapping occlusion of livestock in pastoral areas, which reduced missed detections and improved the recall and accuracy of the model. Finally, accurate detection was achieved in complex scenes, such as those with complex lighting and environmental interference. The experimental results showed that the LDHorNet model achieved high accuracy, with a Precision and Recall of 95.24% and 88.87%, respectively, and an mAP_0.5 and mAP_0.5:0.95 of 94.11% and 77.01%, respectively. Compared with YOLOv5s, YOLOv8s, and YOLOv7 Tiny, the Precision of the improved model increased by 2.83, 2.93, and 9.79 percentage points, respectively; the Recall increased by 6.66, 4.95, and 13.42 percentage points; the mAP_0.5 increased by 3.98, 3.4, and 10.78 percentage points; and the mAP_0.5:0.95 increased by 12.46, 5.26, and 20.97 percentage points. Among the compared networks, LDHorNet performed best at detecting livestock in small-target and occlusion scenes. Therefore, this model can provide a strong reference for livestock detection under large-scale grazing conditions.
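Because the improvements above are given in percentage points, the baseline scores they imply can be recovered by simple subtraction. The short sketch below does this arithmetic with the figures quoted in the abstract. The resulting baseline values are derived, not reported, and assume each gain is the plain difference between two rounded percentages; the function and variable names are illustrative.

```python
# LDHorNet metrics (percent) as quoted in the abstract.
ldhornet = {
    "Precision": 95.24,
    "Recall": 88.87,
    "mAP_0.5": 94.11,
    "mAP_0.5:0.95": 77.01,
}

# Quoted gains of LDHorNet over each baseline, in percentage points.
gains = {
    "YOLOv5s": {"Precision": 2.83, "Recall": 6.66, "mAP_0.5": 3.98, "mAP_0.5:0.95": 12.46},
    "YOLOv8s": {"Precision": 2.93, "Recall": 4.95, "mAP_0.5": 3.40, "mAP_0.5:0.95": 5.26},
    "YOLOv7 Tiny": {"Precision": 9.79, "Recall": 13.42, "mAP_0.5": 10.78, "mAP_0.5:0.95": 20.97},
}


def implied_baselines(model_metrics, gains_by_baseline):
    """Subtract each percentage-point gain from the model's score (derived values)."""
    return {
        name: {m: round(model_metrics[m] - g[m], 2) for m in model_metrics}
        for name, g in gains_by_baseline.items()
    }


baselines = implied_baselines(ldhornet, gains)
for name, metrics in baselines.items():
    print(name, metrics)
```

For example, the implied YOLOv5s Precision is 95.24 - 2.83 = 92.41%, which corresponds to a relative improvement of about 3.1%, a different quantity from the 2.83 percentage-point gain.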