Multi-scale behavior recognition method for dairy cows based on an improved YOLOV5s network

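    • Summary: Daily behaviors of dairy cows, such as standing, drinking, walking, and lying, are closely related to their physiological health, so recognizing these behaviors efficiently and accurately is important for tracking herd health in a timely manner and for improving the economic returns of a farm. Behavior data collected in group-housing environments are affected by complex scenes, large variations in target scale, and diverse behaviors, all of which interfere with recognition. To address this, the study proposes a multi-scale behavior recognition method for dairy cows based on an improved YOLOV5s. The method introduces a channel-based Transformer attention mechanism at the top layer of the backbone network so that the model focuses on cow target regions. It also adds a branch to the path aggregation structure, together with an additional detector, to capture the low-level detail features of cow behavior images, and it uses the SE (Squeeze-and-Excitation Networks) attention mechanism to optimize the detectors, forming an SEPH (SE Prediction Head) that identifies important features and improves multi-scale behavior recognition. Experiments show that, without a sharp increase in weight size, the improved model raises the mean average precision of multi-scale target recognition by 1.2 percentage points over YOLOV5s, and in particular raises the average precision for walking by 4.9 percentage points. The results provide a reference for round-the-clock, real-time monitoring of cow behavior in group-housing environments.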
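The channel reweighting that the SE attention mechanism performs can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `squeeze`, `excite`, and `se_reweight` are hypothetical, the weights are arbitrary illustration values, and the single fully connected layer is an assumption about how the gate is computed (the original Squeeze-and-Excitation design uses two fully connected layers with a bottleneck). A real detector would use a tensor library rather than nested lists.

```python
import math


def squeeze(feature_map):
    """Global average pooling: reduce each channel (H x W) to one scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def excite(channel_means, weights, bias):
    """One fully connected layer plus a sigmoid, giving a gate in (0, 1) per channel.

    Assumption: a single layer is used here for brevity; `weights` has shape
    C x C and `bias` has length C.
    """
    gates = []
    for row, b in zip(weights, bias):
        activation = sum(w * z for w, z in zip(row, channel_means)) + b
        gates.append(1.0 / (1.0 + math.exp(-activation)))
    return gates


def se_reweight(feature_map, weights, bias):
    """Scale every value in a channel by that channel's gate."""
    gates = excite(squeeze(feature_map), weights, bias)
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with a small gate are attenuated, which is how channel-level noise is suppressed before the prediction head.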

       

      Abstract: Daily behaviors of dairy cows (such as standing, drinking, walking, and lying down) are closely related to their physical health. Efficient and accurate recognition of these behaviors is therefore important for tracking health status in a timely manner and for improving the economic returns of the farm. However, behavior data collected in a group-housing environment vary considerably because of complex scenes and conditions and the diversity of cow behaviors. In addition, recognition is hampered by the wide range of target scales at which cows appear under different viewing perspectives. This study proposes a multi-scale behavior recognition method for dairy cows based on an improved YOLOV5s network. First, a channel-based Transformer attention mechanism was introduced at the top layer of the backbone network. Learnable positional parameters were added to every channel of the top-level feature map, which carries high-level semantics, to establish a relationship between feature channels and regional information, with the size of the positional parameter corresponding to the region. Second, the multi-head self-attention mechanism of the Transformer was used to analyze correlations among the channel sequences at different levels, yielding importance scores that strengthen the expression of feature information across channels. The resulting long-range dependencies between regions and feature channels lead the model to focus on cow target areas during training. Third, the PAN Neck structure, which passes feature information between levels through up-sampling and down-sampling, was extended with a branch at the twice-down-sampled feature map to handle the multi-scale behavior targets of dairy cows. A detector was also attached to these low-level features, into which the PAN Neck integrates the high-level semantic information of the top layer, so that the resulting feature map combines fine detail with high-level semantics. Finally, the SE attention mechanism was introduced: global pooling extracts the global information of each feature-map channel, a single-layer MLP determines the importance of each channel, and the weights of important features are increased to suppress the propagation of noise at the channel level. The detectors at four scales were optimized in this way to form the SEPH (SE Prediction Head) for multi-scale targets, improving the multi-scale behavior recognition of dairy cows. Experiments showed that the improved model did not incur a sharp increase in weight size. Its mAP for multi-scale target recognition was 1.2 percentage points higher than that of the original network, and the AP for the two similar behaviors of walking and standing increased by 0.8 and 4.9 percentage points, respectively. However, drinking and eating could not be detected directly by the model. A second-level behavior evaluation based on a greedy strategy was therefore proposed, in which the spatial information of auxiliary judgment labels is used to jointly determine the drinking and eating behaviors of cows. This evaluation produced two errors for drinking and five for eating, indicating the better performance of the improved YOLOV5s network.
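The abstract does not spell out how the second-level evaluation works, so the following is only one plausible reading, sketched under stated assumptions. It assumes that the auxiliary judgment labels are bounding boxes for a water trough and a feed bunk, that only cows first detected as standing are re-evaluated, and that the greedy step picks the single zone covering the largest share of the cow's box. The names `Box`, `overlap_ratio`, `ZONE_TO_BEHAVIOR`, and `second_level_behavior`, the label strings, and the 0.2 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box with a class label (hypothetical format)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


# Assumed mapping from auxiliary label to the behavior it implies.
ZONE_TO_BEHAVIOR = {"water_trough": "drinking", "feed_bunk": "eating"}


def overlap_ratio(cow, zone):
    """Fraction of the cow box's area that lies inside the zone box."""
    width = min(cow.x2, zone.x2) - max(cow.x1, zone.x1)
    height = min(cow.y2, zone.y2) - max(cow.y1, zone.y1)
    cow_area = (cow.x2 - cow.x1) * (cow.y2 - cow.y1)
    if width <= 0 or height <= 0 or cow_area <= 0:
        return 0.0
    return (width * height) / cow_area


def second_level_behavior(cow, zones, threshold=0.2):
    """Greedy choice: keep the best-overlapping zone if it clears the threshold."""
    if cow.label != "standing":  # assumption: only standing cows are re-evaluated
        return cow.label
    best_label, best_ratio = None, 0.0
    for zone in zones:
        if zone.label not in ZONE_TO_BEHAVIOR:
            continue
        ratio = overlap_ratio(cow, zone)
        if ratio > best_ratio:
            best_label, best_ratio = zone.label, ratio
    if best_label is not None and best_ratio >= threshold:
        return ZONE_TO_BEHAVIOR[best_label]
    return cow.label
```

For example, a standing cow whose box lies half inside a `water_trough` box and one fifth inside a `feed_bunk` box would be relabeled as drinking, while a standing cow far from both zones keeps its first-level label.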

       
