Abstract: The daily behaviors of dairy cows (such as standing, drinking, walking, and lying down) are closely related to their physical health. Efficient and accurate recognition of these behaviors is therefore important for monitoring health status in a timely manner and for improving the economic returns of the farm. However, cow behavior data vary considerably in group-housing environments because of complex scenes, changing conditions, and diverse behaviors. Recognition is further limited by the wide range of target scales, since the cows are spread over a large area and are seen from different viewing angles. In this study, a multi-scale behavior recognition method for dairy cows was proposed based on an improved YOLOv5s network. First, a channel-based Transformer attention mechanism was introduced at the top layer of the backbone network. Learnable position parameters were added to every channel of the top-level feature map, which carries high-level semantics. This established a relationship between feature channels and regional information, with the magnitude of the position parameter corresponding to the region. Second, the multi-head self-attention mechanism of the Transformer was used to analyze the correlations among channel sequences at different levels, and the resulting importance scores were used to strengthen the expression of feature information between channels. In this way, long-range dependencies between regions and feature channels were built, so that the model focused on the cow target area during training. Third, the PAN Neck structure was used to pass feature information between levels through up-sampling and down-sampling. A PAN Neck branch was then added for the twice down-sampled feature map to handle the multi-scale behavior targets of dairy cows.
A target detector was also attached to the low-level features, into which the high-level semantic information of the top layer was merged through the PAN Neck. The resulting feature map combined fine detail with high-level semantic information. The SE attention mechanism was then introduced: global pooling extracted the global information of each feature map channel, and a single-layer MLP determined the importance of each channel. The weights of important features were increased, which suppressed the propagation of noise at the channel level. The four-scale detectors were optimized to construct the optimal SEPH for multi-scale targets and to improve multi-scale behavior recognition of dairy cows. The improved recognition model showed no sharp increase in weight size. In experimental verification, the mAP of multi-scale target recognition increased by 1.2 percentage points over the original model, and the AP for two similar behaviors, walking and standing, increased by 0.8 and 4.9 percentage points, respectively. However, drinking and eating could not be detected directly by the network. A second-level behavior judgment based on a greedy strategy was therefore proposed, in which the spatial information of auxiliary judgment labels was used jointly to determine the drinking and eating behavior of cows. The numbers of errors were two for drinking and five for eating, indicating the good performance of the improved YOLOv5s network.
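The SE-style channel reweighting described above (global pooling, a single-layer MLP, then scaling of each channel) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name se_reweight, the plain-list data layout, the sigmoid gate, and the weight values are all assumptions made for illustration. The commonly used SE block has two fully connected layers with a reduction ratio, whereas the sketch follows the abstract's wording and uses one layer.

```python
import math


def se_reweight(feature_map, weights, bias):
    """Squeeze-and-excitation style channel reweighting (illustrative only).

    feature_map: list of C channels, each an H x W list of floats.
    weights:     C x C matrix of one fully connected layer (hypothetical values).
    bias:        list of C floats.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # Excitation: a single linear layer followed by a sigmoid (an assumption)
    # yields one gate in (0, 1) per channel.
    gates = []
    for w_row, b in zip(weights, bias):
        z = sum(w * s for w, s in zip(w_row, squeezed)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Scale: every value in a channel is multiplied by that channel's gate,
    # so channels judged unimportant are attenuated.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: two 2 x 2 channels with made-up values and made-up layer weights.
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
weights = [[2.0, 0.0], [0.0, -2.0]]
bias = [0.0, 0.0]
scaled, gates = se_reweight(fmap, weights, bias)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the second channel is suppressed relative to the first. In a trained network the layer weights would be learned rather than set by hand.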