    QI Yongsheng, JIAO Jie, BAO Tengfei, et al. Cattle face detection algorithm in complex scenes using adaptive attention mechanism[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(14): 173-183. DOI: 10.11975/j.issn.1002-6819.202304218

    Cattle face detection algorithm in complex scenes using adaptive attention mechanism

      Abstract: Precision breeding has become a research hotspot in the cattle industry in recent years, alongside the development of smart livestock farming and large-scale expansion. Facial detection and recognition of cattle are key to intelligent ranch farming. However, because livestock farming environments are complex, facial detection accuracy can be severely degraded by three common environmental factors: blur, backlighting, and occlusion. This study proposes a cattle face detection algorithm for complex scenes based on an adaptive attention mechanism. First, an evaluation indicator was designed for each of the three interfering factors (blur, backlighting, and occlusion), and the three indicators, which differ in type, were normalized with fuzzy membership functions so that together they give a comprehensive evaluation of the scene in the input image. Adaptive weighting coefficients were then determined from these indicators, so that the weights change with the indicators and reflect the complexity of the scene the target is in. Second, an attention mechanism named CDAA (Composite Dual-Branch Adaptive Attention) was introduced into the backbone feature extraction network of YOLOv7-tiny. CDAA places channel attention and spatial attention in parallel and fuses them with the adaptive weighting coefficients, which strengthens the weight of the corresponding attention branch. Channel and spatial attention are therefore weighted automatically for each scene, which improves the network's feature extraction in complex scenes and addresses its poor detection accuracy there. The channel attention branch fuses global average pooling and global max pooling to capture global information, selectively emphasizing informative features and suppressing redundant ones.
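The first step (three scene indicators normalized by fuzzy membership functions and turned into adaptive weights) can be sketched as below. The abstract does not define the indicators, the membership functions, or how the weights map onto the attention branches, so this is only a minimal sketch under stated assumptions: raw indicator values are taken as given, a linear ramp membership with hypothetical thresholds (`RANGES`) is used, and a hypothetical mapping lets blur and occlusion strengthen the channel branch while backlighting strengthens the spatial branch, following which branch the abstract says targets which factor. All names (`ramp_membership`, `SceneWeights`, `evaluate_scene`) are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


def ramp_membership(x: float, low: float, high: float) -> float:
    """Linear fuzzy membership: 0 at or below `low`, 1 at or above `high`, linear between."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


@dataclass
class SceneWeights:
    """Membership degrees in [0, 1] for the three interfering factors."""
    blur: float
    backlight: float
    occlusion: float

    @property
    def channel_weight(self) -> float:
        # Hypothetical mapping: blur and occlusion strengthen the channel branch.
        return 1.0 + max(self.blur, self.occlusion)

    @property
    def spatial_weight(self) -> float:
        # Hypothetical mapping: backlighting strengthens the spatial branch.
        return 1.0 + self.backlight

    @property
    def complexity(self) -> float:
        # Hypothetical overall scene-complexity score in [0, 1].
        return (self.blur + self.backlight + self.occlusion) / 3.0


def evaluate_scene(blur_raw: float, backlight_raw: float, occlusion_raw: float,
                   ranges: Dict[str, Tuple[float, float]]) -> SceneWeights:
    """Normalize three raw indicators with different scales onto a common [0, 1] range."""
    return SceneWeights(
        blur=ramp_membership(blur_raw, *ranges["blur"]),
        backlight=ramp_membership(backlight_raw, *ranges["backlight"]),
        occlusion=ramp_membership(occlusion_raw, *ranges["occlusion"]),
    )


# Hypothetical (low, high) thresholds for each raw indicator.
RANGES = {"blur": (0.2, 0.8), "backlight": (0.3, 0.9), "occlusion": (0.1, 0.5)}

scene = evaluate_scene(blur_raw=0.5, backlight_raw=0.95, occlusion_raw=0.05, ranges=RANGES)
print(scene.channel_weight, scene.spatial_weight, scene.complexity)
```

Because each raw indicator is mapped onto [0, 1] before the indicators are combined, quantities with very different units become comparable; a trapezoidal or Gaussian membership function could replace the ramp without changing the rest of the sketch.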
By emphasizing informative channels, the channel branch selectively highlights the edge features of the detection target, which helps counter image blur and occlusion. The spatial attention branch also uses a parallel structure, of channel-wise max pooling and channel-wise average pooling, to assign different importance to feature information at different positions. It highlights important spatial positions and suppresses redundant spatial information, effectively enhancing regional features under strong backlighting interference. Finally, the image scene evaluation indicators were introduced into the loss function to adaptively adjust the weight of the large-scale grid loss, so that during training the network focuses more on small targets, which are more numerous, thereby improving overall detection accuracy. Ablation experiments were conducted on a specific dataset, and the algorithm was compared with several classical detection algorithms to verify its effectiveness and real-time performance; it was also ported to the Jetson Xavier NX platform for testing. The results show that the improved model achieves a detection accuracy of 89.58%, 7.34 percentage points higher than the original YOLOv7-tiny network, at a detection speed of 62 frames per second. With almost no loss in detection speed, it outperforms both the original network and the comparison networks, particularly on images captured in complex real-world scenes. This indicates good robustness and practical value, and the results can serve as a reference for efficient cattle face detection in complex scenes.
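A minimal pure-Python sketch of a CDAA-style block follows, operating on a tiny C×H×W feature map stored as nested lists. It keeps only what the abstract states: a channel branch built from global average and global max pooling, a spatial branch built from channel-wise average and max pooling, the two run in parallel and fused with adaptive weights. Everything else is an assumption: the shared two-layer MLP in the channel branch (borrowed from CBAM), the 1×1 mix of the two pooled maps standing in for the larger convolution a real spatial branch would use, and the fusion rule y = w_channel·(x⊙Mc) + w_spatial·(x⊙Ms). The function names are hypothetical, and a practical implementation would use a deep-learning framework rather than lists.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def channel_attention(x: Tensor3, w1: Matrix, w2: Matrix) -> List[float]:
    """One weight per channel from global average and global max pooling."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(r) for r in ch) for ch in x]

    def shared_mlp(v: List[float]) -> List[float]:
        hidden = [max(0.0, h) for h in matvec(w1, v)]  # ReLU
        return matvec(w2, hidden)

    # Both pooled descriptors pass through the same MLP and are summed before the sigmoid.
    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg), shared_mlp(mx))]


def spatial_attention(x: Tensor3, k_avg: float, k_max: float) -> Matrix:
    """One weight per position from channel-wise average and max pooling.

    Assumed simplification: a 1x1 mix of the two pooled maps replaces the
    larger convolution a full module would apply.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    weights = []
    for i in range(height):
        row = []
        for j in range(width):
            column = [x[k][i][j] for k in range(channels)]
            row.append(sigmoid(k_avg * sum(column) / channels + k_max * max(column)))
        weights.append(row)
    return weights


def cdaa_like(x: Tensor3, w1: Matrix, w2: Matrix, k_avg: float, k_max: float,
              w_channel: float, w_spatial: float) -> Tensor3:
    """Run both branches in parallel on x and fuse them with adaptive weights.

    Assumed fusion rule: y = w_channel * (x * Mc) + w_spatial * (x * Ms).
    """
    mc = channel_attention(x, w1, w2)
    ms = spatial_attention(x, k_avg, k_max)
    return [
        [[w_channel * v * mc[k] + w_spatial * v * ms[i][j] for j, v in enumerate(row)]
         for i, row in enumerate(ch)]
        for k, ch in enumerate(x)
    ]


# Toy input: 2 channels on a 2x2 grid; reduction ratio r = 2 gives 1 hidden unit.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.5], [0.5, 0.0]]]
w1 = [[0.5, 0.5]]     # shape (C/r) x C
w2 = [[1.0], [-1.0]]  # shape C x (C/r)
y = cdaa_like(x, w1, w2, k_avg=1.0, k_max=1.0, w_channel=1.5, w_spatial=2.0)
print(y)
```

Here `w_channel` and `w_spatial` stand in for the scene-dependent coefficients: raising one of them amplifies that branch's contribution for the current image, which is the behaviour the abstract attributes to CDAA.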

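The loss adjustment described above can be sketched as a weighted sum of per-grid losses in which the weight of the large-scale grid grows with a scene-complexity score in [0, 1], assumed to come from the fuzzy scene evaluation. The abstract states only that the scene indicators adaptively adjust the large-scale grid loss weight, so the linear boost, the factor `gamma`, the illustrative base weights, and reading "large-scale grid" as index 0 (the highest-resolution output grid, which in YOLO-style detectors handles small targets) are all assumptions; `adaptive_grid_loss` is a hypothetical name.

```python
from typing import Sequence


def adaptive_grid_loss(layer_losses: Sequence[float],
                       base_weights: Sequence[float],
                       scene_score: float,
                       gamma: float = 0.5) -> float:
    """Weighted sum of per-grid losses; the large-scale grid's weight grows with scene_score.

    Assumptions (hypothetical, not the paper's formula):
      * index 0 is the large-scale (highest-resolution) grid for small targets,
      * scene_score in [0, 1] comes from the fuzzy scene evaluation,
      * the boost is linear: w0' = w0 * (1 + gamma * scene_score).
    """
    if len(layer_losses) != len(base_weights):
        raise ValueError("one weight per detection layer is required")
    if not 0.0 <= scene_score <= 1.0:
        raise ValueError("scene_score must lie in [0, 1]")
    weights = list(base_weights)  # copy so the caller's weights are not modified
    weights[0] *= 1.0 + gamma * scene_score
    return sum(w * l for w, l in zip(weights, layer_losses))


losses = [0.30, 0.20, 0.10]  # illustrative per-grid losses, large-scale grid first
base = [4.0, 1.0, 0.4]       # illustrative base weights
print(adaptive_grid_loss(losses, base, scene_score=0.0))
print(adaptive_grid_loss(losses, base, scene_score=1.0))
```

With a simple scene the total reduces to the fixed-weight sum; a complex scene raises only the large-scale term, shifting gradient emphasis toward the grid that detects small faces.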

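The reported figures imply a few derived numbers worth making explicit: the original YOLOv7-tiny accuracy (89.58% minus 7.34 percentage points), the corresponding relative gain, which differs from the percentage-point gain, and the per-frame time at 62 frames per second. A short check, using only the numbers stated in the abstract:

```python
improved_acc = 89.58  # reported detection accuracy of the improved model, in %
gain_pp = 7.34        # reported gain over YOLOv7-tiny, in percentage points
fps = 62              # reported detection speed, frames per second

baseline_acc = round(improved_acc - gain_pp, 2)  # implied YOLOv7-tiny accuracy, in %
relative_gain_pct = gain_pp / baseline_acc * 100  # gain relative to the baseline, in %
frame_time_ms = 1000 / fps                        # time per frame, in milliseconds

print(f"baseline {baseline_acc:.2f}%, relative gain {relative_gain_pct:.1f}%, "
      f"frame time {frame_time_ms:.1f} ms")
```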