Recognition of continuous pig cough sounds based on continuous speech recognition technology

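    • Abstract: Existing pig cough recognition methods are based on isolated-word recognition technology. They can recognize only a limited range of sound types and cannot capture the continuous coughing of pigs that are actually sick. To address this, this paper proposes a method that builds an acoustic model of pig sounds with a bidirectional long short-term memory-connectionist temporal classification (BLSTM-CTC) model. The model is then used to recognize continuous pig coughs in the pig-farm environment, providing early warning and diagnosis of respiratory diseases in pigs. The time-domain characteristics of individual cough samples, namely duration and energy, were studied in Landrace pigs weighing about 75 kg. From these, a threshold range was established for sound samples: a duration of 0.24 to 0.74 s and an energy greater than 40.15 V²·s. The 30 h of pig-farm sound was first processed with a psychoacoustic speech enhancement algorithm based on multitaper spectra. Within the threshold range, a single-parameter double-threshold endpoint detection algorithm was then applied to the enhanced sound, yielding 222 experimental corpus segments. Sounds in the pig-farm environment were divided into pig coughs and non-pig-cough sounds, which served as the modeling units of the acoustic model and as the labels for annotating the corpus. 26-dimensional Mel frequency cepstral coefficients (MFCC) were extracted as the feature parameters of the experimental segments. The BLSTM network learned the temporal patterns of continuous pig sounds, and CTC was used to build an end-to-end continuous pig sound recognition system. In 5-fold cross-validation, the average pig cough recognition rate reached 92.40%, with a misrecognition rate of 3.55% and a total recognition rate of 93.77%. In an application test on 1 h of corpus outside the dataset, the pig cough recognition rate was 94.23%, the misrecognition rate was 9.09%, and the total recognition rate was 93.24%. These results indicate that the BLSTM-CTC pig cough recognition model based on continuous speech recognition technology is stable and reliable. This study can serve as a reference for recognizing continuous pig coughs and judging disease during healthy pig production.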
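The abstract reports three rates without defining them. The following is a minimal sketch under one plausible reading, which is an assumption and not the paper's stated definition:

- **Recognition rate:** coughs recognized as coughs divided by all coughs.
- **Misrecognition rate:** non-cough sounds recognized as coughs divided by all non-cough sounds.
- **Total recognition rate:** all correctly recognized sounds divided by all sounds.

The sketch also assumes that each labeled sound has already been paired with one recognized label. How the decoded sequence is aligned to the reference is not specified here. The function names and label strings are hypothetical.

```python
from statistics import mean


def recognition_rates(reference, hypothesis, positive="cough"):
    """Rates for a two-class cough / non-cough task, paired sample by sample.

    Assumed definitions (one plausible reading, not stated in the source):
      recognition    = true coughs recognized as cough / all true coughs
      misrecognition = non-cough sounds recognized as cough / all non-cough sounds
      total          = correctly recognized sounds / all sounds
    A rate whose denominator is zero is returned as NaN.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must be paired sample by sample")
    pairs = list(zip(reference, hypothesis))
    n_pos = sum(r == positive for r, _ in pairs)
    n_neg = len(pairs) - n_pos
    tp = sum(r == positive and h == positive for r, h in pairs)
    fp = sum(r != positive and h == positive for r, h in pairs)
    correct = sum(r == h for r, h in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "recognition": ratio(tp, n_pos),
        "misrecognition": ratio(fp, n_neg),
        "total": ratio(correct, len(pairs)),
    }


def average_over_folds(fold_results):
    """Mean of each rate across cross-validation folds (e.g. 5 folds)."""
    return {key: mean(r[key] for r in fold_results) for key in fold_results[0]}


if __name__ == "__main__":
    # Toy, made-up labels: 4 coughs followed by 2 non-cough sounds.
    ref = ["cough"] * 4 + ["non-cough"] * 2
    hyp = ["cough", "cough", "cough", "non-cough", "non-cough", "cough"]
    print(recognition_rates(ref, hyp))
```

Under this reading, averaging per-fold rates (rather than pooling counts) would be one way to obtain a 5-fold "average" figure, but the source does not say which was used.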

       

      Abstract: Cough is one of the most frequent symptoms in the early stage of pig respiratory diseases, so pig diseases can potentially be monitored and diagnosed by detecting coughs. Existing methods for pig cough recognition are based on keyword recognition technology, which cannot recognize samples the model has not been trained on. Moreover, these methods handle isolated coughs, whereas sick pigs usually cough continuously. This paper aims to recognize continuous pig coughs using continuous speech recognition technology. Ten Landrace pigs weighing about 75 kg were used as sound collection subjects. Their sounds were recorded on pig farms in late winter and early spring, when respiratory diseases in pigs are prevalent. The recording devices operated continuously around the clock. By selecting periods of frequent coughing from the recorded signal, a total of 30 h of pig-farm sound was obtained as the experimental corpus. First, the sound signals were denoised with a speech enhancement algorithm based on a psychoacoustic model. Next, the time-domain characteristics of individual coughs, namely duration and energy, were studied. Cough duration ranged from 0.24 to 0.74 s and energy from 40.15 to 822.87 V²·s. The sample thresholds were therefore set from the duration range and the lower energy bound of individual coughs. Within this threshold range, a speech endpoint detection algorithm based on short-time energy was applied to the 30 h of enhanced pig-farm sound. This yielded 222 experimental sentences, ranging from 3.91 s to 9.14 s in length. Together, the 222 sentences contained 1 145 sound samples: 751 pig coughs and 394 non-pig-cough sounds.
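The endpoint detection and threshold gating can be sketched as below. This is a minimal double-threshold detector on short-time energy (the Chinese-language abstract describes the paper's detector as a single-parameter double-threshold algorithm), followed by the duration and energy gate reported above. Several details are assumptions not taken from the source:

- The frame length, hop and the two energy thresholds are placeholder values.
- Segment energy is taken as the sum of squared samples divided by the sampling rate. This is a discrete approximation of ∫v(t)²dt in V²·s and presumes a signal calibrated in volts.
- How detected events are grouped into the 3.91 to 9.14 s sentences is not described, so it is not shown.

The function names are hypothetical.

```python
import numpy as np


def short_time_energy(x, frame_len, hop):
    """Sum of squared samples in each frame; a trailing partial frame is dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2) for i in range(n_frames)])


def double_threshold_segments(energy, t_high, t_low):
    """Frame ranges [start, end) that contain a frame above t_high and stay above t_low.

    Assumes t_low < t_high: a frame above t_high starts a candidate, which is then
    extended backwards and forwards while the energy stays above t_low.
    """
    segments, i, n = [], 0, len(energy)
    while i < n:
        if energy[i] > t_high:
            start = i
            while start > 0 and energy[start - 1] > t_low:  # extend backwards
                start -= 1
            end = i
            while end < n and energy[end] > t_low:  # extend forwards
                end += 1
            segments.append((start, end))
            i = end
        else:
            i += 1
    return segments


def gate_segments(x, sr, segments, frame_len, hop,
                  min_dur=0.24, max_dur=0.74, min_energy=40.15):
    """Keep segments whose duration (s) and energy (V^2*s) fall in the cough range.

    Returns (start_time, end_time) pairs in seconds.
    """
    kept = []
    for start, end in segments:
        s = start * hop
        e = min(len(x), (end - 1) * hop + frame_len)
        duration = (e - s) / sr
        energy = float(np.sum(x[s:e] ** 2)) / sr  # approximates the integral of v(t)^2 dt
        if min_dur <= duration <= max_dur and energy > min_energy:
            kept.append((s / sr, e / sr))
    return kept


if __name__ == "__main__":
    # Synthetic signal: a 0.5 s burst (kept) and a 0.1 s burst (rejected as too short).
    sr = 1000
    x = np.zeros(3000)
    x[1000:1500] = 10.0
    x[2000:2100] = 10.0
    energy = short_time_energy(x, frame_len=50, hop=25)
    segs = double_threshold_segments(energy, t_high=4000.0, t_low=1000.0)
    print(gate_segments(x, sr, segs, frame_len=50, hop=25))
```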
Sounds in the pig-farm environment were divided into two classes, pig cough and non-pig cough, which were used as the acoustic modeling units. These sounds included:

- pig coughs, sneezes, eating, screams, humming and ear shaking;
- dog sounds, metal clanging and other background noise.

The experimental sentences were labeled with the help of experts. Next, 13-dimensional Mel frequency cepstral coefficients (MFCC) reflecting the static characteristics of pig sounds were extracted. Their first-order differential coefficients, reflecting the dynamic characteristics, were appended to form the 26-dimensional MFCC used as the feature parameters of each experimental sentence. Finally, a bidirectional long short-term memory-connectionist temporal classification (BLSTM-CTC) model was used to recognize continuous pig sounds. The BLSTM network learns features of continuous pig sounds effectively, and CTC directly models the alignment between the input pig sound sequence and its label sequence. Based on 5-fold cross-validation experiments, the number of hidden units was set to 300 each in the forward and backward LSTM layers of the BLSTM and in the fully connected layer. The learning rate was set to 0.001. Averaged over the 5 folds, the recognition rate, misrecognition rate and total recognition rate were 92.40%, 3.55% and 93.77%, respectively. In an application test on another 1 h of data, the recognition rate reached 94.23%, with a misrecognition rate of 9.09% and a total recognition rate of 93.24%. These results indicate that the pig cough recognition model based on continuous speech recognition technology is stable and reliable. This paper provides a reference for recognizing continuous pig coughs and judging disease during healthy pig production.
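The 26-dimensional feature (13 static MFCCs plus their first-order differences) can be assembled as in the minimal sketch below. It assumes the 13 MFCCs per frame have already been computed by some front end. It also assumes the common regression formula d_t = Σ_{n=1}^{N} n(c_{t+n} − c_{t−n}) / (2Σ_{n=1}^{N} n²), with N = 2 and edge padding at the sequence boundaries. The source does not state its delta window or formula, and the function names are hypothetical.

```python
import numpy as np


def delta(features, N=2):
    """First-order regression deltas along time for a (T, D) feature matrix.

    Uses d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    repeating the first/last frame to pad the sequence edges.
    """
    T = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        # padded[N + n + t] is c_{t+n}; padded[N - n + t] is c_{t-n}
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def stack_static_and_delta(mfcc13, N=2):
    """Concatenate static (T, 13) MFCCs with their deltas into a (T, 26) matrix."""
    return np.hstack([mfcc13, delta(mfcc13, N)])


if __name__ == "__main__":
    # Toy input: every coefficient rises linearly with the frame index.
    mfcc = np.tile(np.arange(10, dtype=float)[:, None], (1, 13))
    feats = stack_static_and_delta(mfcc)
    print(feats.shape)
```

For a linear ramp like the toy input, the interior delta values equal the slope (1.0). Only the edge frames are pulled down by the padding.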
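At recognition time, a CTC-trained network emits a score for every frame over the output labels plus a special blank symbol. The label sequence is read off by merging repeated labels and then deleting blanks. The sketch below shows best-path (greedy) CTC decoding for the two modeling units described above. Three points are assumptions or omissions:

- The index assignment (0 = blank, 1 = cough, 2 = non-cough) is a hypothetical choice.
- The per-frame scores would come from the trained BLSTM, which is not shown.
- The source does not say whether greedy or beam-search decoding was used.

```python
import numpy as np

# Hypothetical label indices; CTC needs a blank symbol in addition to the two modeling units.
LABELS = ["<blank>", "cough", "non-cough"]
BLANK = 0


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores: array of shape (T, C) with per-frame scores (e.g. log-probabilities).
    Takes the arg-max label per frame, merges consecutive repeats, then removes blanks,
    so a blank between two identical labels keeps them as separate events.
    """
    best_path = np.argmax(frame_scores, axis=1)
    decoded, prev = [], None
    for k in best_path:
        k = int(k)
        if k != prev and k != blank:
            decoded.append(k)
        prev = k
    return decoded


def to_names(indices):
    """Map label indices to their (hypothetical) names."""
    return [LABELS[k] for k in indices]


if __name__ == "__main__":
    # Toy frame path: cough, blank, cough, non-cough, blank, cough.
    path = [0, 1, 1, 0, 1, 2, 2, 0, 1]
    scores = np.eye(3)[path]  # one-hot rows stand in for network outputs
    print(to_names(ctc_greedy_decode(scores)))
```

The blank symbol is what allows two consecutive coughs to be reported as two events rather than merged into one. This matters for the continuous coughing that the isolated-cough methods miss.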

       
