Pig facial expression recognition using multi-attention cascaded LSTM model

    • Abstract: Facial expressions are an important carrier of emotional information and a comprehensive reflection of the physiology, psychology, and behavior of livestock; they can therefore be used to assess animal welfare. Because the facial musculature of livestock is structurally simple, the subtle changes in different facial regions that convey expression are difficult to recognize. This study proposes a Multi-Attention cascaded Long Short Term Memory (MA-LSTM) model to classify temporal facial expression sequences of domestic pigs. First, a simplified multi-task cascaded convolutional network rapidly detects and localizes the pig face in each frame, removing the influence of non-face regions on recognition performance. Second, a multi-attention module is proposed: exploiting the property that different feature channels respond to different visual information and thus have different peak-response regions, channels with nearby peak responses are clustered to capture the facial salient regions induced by expression changes, thereby focusing the model on subtle facial variations. Experiments on a self-annotated pig expression dataset show that the proposed MA-LSTM model achieves an average recognition accuracy of 91.826% over four expression classes, an average improvement of 6.3 percentage points over the same model with the multi-attention module disabled, together with a clear reduction in misclassification rate. Compared with the commonly used facial expression recognition algorithms LBP-TOP, HOG-TOP, ELRCN, and STC-NLSTM, the MA-LSTM model improves average recognition accuracy by about 32.6, 18.0, 5.9, and 4.4 percentage points, respectively. The results verify the effectiveness of the proposed multi-attention cascaded LSTM model for pig facial expression recognition.
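The multi-attention module described above groups feature channels by where their responses peak, then averages each group into one attention map over a facial salient region. A minimal NumPy sketch of that idea follows; the channel count, map size, number of parts, and the use of k-means over peak coordinates are all illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def multi_attention_maps(features, n_parts=4, seed=0):
    """Sketch of a multi-attention module: cluster feature channels by the
    location of their peak response, then average each cluster of channels
    into one spatial attention map (hypothetical shapes and clustering)."""
    C, H, W = features.shape
    # Peak-response coordinates of each channel, normalised to [0, 1]
    peaks = np.array([np.unravel_index(np.argmax(features[c]), (H, W))
                      for c in range(C)], dtype=float)
    peaks /= [H - 1, W - 1]
    # Simple k-means over the peak coordinates groups channels whose
    # responses peak in nearby facial regions
    rng = np.random.default_rng(seed)
    centers = peaks[rng.choice(C, n_parts, replace=False)]
    for _ in range(10):
        labels = np.argmin(((peaks[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_parts):
            if np.any(labels == k):
                centers[k] = peaks[labels == k].mean(axis=0)
    # One attention map per cluster: mean of member channels, rectified
    # and normalised to [0, 1]; an empty cluster yields an all-zero map
    maps = np.stack([features[labels == k].mean(axis=0) if np.any(labels == k)
                     else np.zeros((H, W)) for k in range(n_parts)])
    maps = np.maximum(maps, 0)
    maps /= maps.max(axis=(1, 2), keepdims=True) + 1e-8
    return maps  # shape (n_parts, H, W)
```

In the paper's pipeline, maps like these would weight the convolutional features of each frame before temporal classification.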

       

      Abstract: Facial expression recognition is widely used in everyday scenarios such as medicine, criminology, and education, and deep learning has made the technology efficient and accurate. Considerable effort has been made to migrate relatively mature human facial expression recognition to animals, because, according to zoologists, animals also express their emotions through facial expressions. If the emotions expressed by animals can be understood, injuries and illnesses can be monitored early, helping to keep animals free from hunger, thirst, and distress and able to express natural behavior. Facial expressions can therefore be expected to serve as an indicator of animal welfare, since they comprehensively reflect the physiology, psychology, and behavior of livestock. However, it is difficult to recognize the subtle changes in different facial regions that convey expression, particularly given the simple structure of the facial muscles of domestic animals. In this study, a Multi-Attention cascaded Long Short Term Memory (MA-LSTM) model was proposed for pig facial expression recognition. The procedure was as follows: first, a simplified multi-task cascaded convolutional neural network (SMTCNN) was used to detect and localize the pig face in each frame, removing the influence of non-face regions on recognition performance. Second, a multi-attention mechanism was introduced to exploit the fact that different feature channels carry different visual information and have different peak-response regions. The facial salient regions caused by expression changes were captured by clustering channels with similar peak responses, and these salient regions were used to focus the model on subtle changes in the pig face.
Finally, the convolutional and attention features were fused and fed into an LSTM for classification. Data augmentation was applied to the original recordings to build a self-annotated expression dataset of domestic pigs, and the expanded dataset was used in the experiments. The experimental results showed that the MA-LSTM model improved average recognition accuracy by 6.3 percentage points over the same model with the multi-attention mechanism disabled, while the misclassification rate was also reduced significantly. In addition, the average recognition accuracy of the MA-LSTM model was about 32.6, 18.0, 5.9, and 4.4 percentage points higher than that of the commonly used facial expression recognition methods LBP-TOP, HOG-TOP, ELRCN, and STC-NLSTM, respectively. Four expression classes were considered: anger, happiness, fear, and neutral. The facial regions of domestic pigs varied more visibly under anger and happiness, so the recognition accuracy for these two classes was higher than for the others; however, they were also confused with each other more often, mainly because the facial changes they cause are relatively similar. Overall, the test results verified the effectiveness of the proposed MA-LSTM model for pig facial expression recognition.
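The final step above, classifying a sequence of fused per-frame features with an LSTM, can be sketched with a minimal NumPy forward pass. The sequence length, feature dimension, hidden size, and weight scales below are hypothetical placeholders, not the paper's trained configuration.

```python
import numpy as np

def lstm_forward(x_seq, Wx, Wh, b):
    """Minimal single-layer LSTM forward pass over a sequence of fused
    per-frame feature vectors (conv + attention features concatenated).
    Gate weights are packed as [input, forget, cell, output] blocks."""
    T, _ = x_seq.shape
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for t in range(T):
        gates = x_seq[t] @ Wx + h @ Wh + b          # shape (4H,)
        i, f, g, o = np.split(gates, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)                  # update cell state
        h = o * np.tanh(c)                          # update hidden state
    return h  # final hidden state, fed to the expression classifier

# Hypothetical sizes: 16 frames, 128-D fused features, 64 hidden units
rng = np.random.default_rng(0)
T, D, H = 16, 128, 64
h = lstm_forward(rng.standard_normal((T, D)),
                 rng.standard_normal((D, 4 * H)) * 0.1,
                 rng.standard_normal((H, 4 * H)) * 0.1,
                 np.zeros(4 * H))
logits = h @ rng.standard_normal((H, 4))  # 4 expression classes
```

In practice the recurrence would be a trained deep-learning LSTM layer followed by a softmax classifier; the sketch only shows how the fused frame features flow through the temporal model.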

       
