Cang Yan, Luo Shunyuan, Qiao Yulong. Classification of pig sounds based on deep neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(9): 195-204. DOI: 10.11975/j.issn.1002-6819.2020.09.022

    Classification of pig sounds based on deep neural network

    • Abstract: Pig sounds can reflect the stress state and health status of pigs, and the sound signal is also one of the biological features most easily collected by non-contact methods. Deep neural networks have shown great advantages in image classification research. The spectrogram, as a way of visualizing the time-frequency characteristics of sound, combined with a deep neural network classification model, can improve the accuracy of sound signal classification. Sounds of pigs in different states were collected on site, an optimal spectrogram generation method suited to the structure of the deep neural network was studied, a dataset of pig sound spectrograms was constructed, and the MobileNetV2 network was used to classify and recognize pig sounds in three states. By analyzing and comparing different spectrogram parameters as well as the network width factor and resolution factor, the optimal model for pig sound classification was obtained. In terms of recognition accuracy, the effectiveness of the algorithm was verified by comparison with four models: support vector machine, random forest, gradient boosting decision tree, and extra trees, and the classification accuracy for abnormal sounds reached 97.3%. This study shows that abnormal pig vocalizations are related to abnormal behaviors; therefore, recognizing pig sounds helps to monitor their behavior, which is of great significance for building modern pig farms.
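    As an illustration of the spectrogram generation summarized above, the following Python sketch (not the authors' code) computes a pig-sound spectrogram with the 256-sample segment length and 128-sample overlap reported as optimal; the input file name, single-channel preprocessing, and dB scaling are illustrative assumptions.

        # Minimal sketch: spectrogram of a pig sound with 256-sample segments
        # and 128-sample overlap (the parameters reported in this paper);
        # "pig_sound.wav" is a hypothetical input file.
        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.io import wavfile
        from scipy.signal import spectrogram

        fs, sound = wavfile.read("pig_sound.wav")
        if sound.ndim > 1:                 # keep a single channel
            sound = sound[:, 0]
        sound = sound.astype(np.float32)

        # Short-time spectra around successive instants form the time-frequency image.
        f, t, Sxx = spectrogram(sound, fs=fs, nperseg=256, noverlap=128)

        plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")
        plt.axis("off")                    # save the bare image as the CNN input
        plt.savefig("pig_sound_spectrogram.png", bbox_inches="tight", pad_inches=0)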

       

      Abstract: Pig sounds reflect the stress and health status of pigs, and sound is also one of the biological features most easily collected through non-contact methods. To improve the classification accuracy of pig sound signals, this study used spectrograms to visualize their time-frequency characteristics and combined them with a deep neural network classification model. Four aspects were discussed as follows: 1) The sound dataset was constructed. Pig behavior could be recognized from the different sound signals by the classification network. When a pig was in a normal state, its sounds were labeled as grunts. When a pig was frightened, for example while being injected or chased, its sounds were labeled as screams. Before feeding, when pigs saw the food, they made long, irritable sounds, which were labeled as hunger howls. All pig sounds were collected on the farm by a sound collection box. A laptop served as the host computer and displayed all working parameters of the collection box, and the data transmission and storage scheme adopted a Client/Server architecture. In addition, workers labeled the sounds according to the observed behavior. 2) Spectrograms of the different sounds built up the training and test datasets of the image recognition network. A pig sound is stationary over short durations, so continuously computing the frequency spectrum of the signal in the vicinity of each selected instant of time yields a time-frequency spectrum. The study discussed the optimal spectrogram parameters suited to the structure of the deep neural network. Experimental results showed that the classification accuracy of the deep neural network was highest when the segment length was 256 samples and the overlap was 128 samples, and the spectrogram optimization improved the recognition accuracy by 1.8%. 3) The deep neural network was designed. The study used the MobileNetV2 network, which is based on an inverted residual structure whose shortcut connections lie between the thin bottleneck layers. Aiming at portable platforms in real applications, a width factor and a resolution factor were introduced to define a smaller and more efficient architecture. In addition, the Adam optimizer was substituted for the default RMSprop optimizer and made the loss function converge faster; Adam computes an adaptive learning rate for each parameter from the first moment estimate while making full use of the second moment estimate of the gradients. The results showed that the accuracy was highest when the width factor was set to 0.5. 4) Comparative experiments were conducted. Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Extra Trees (ET) algorithms were compared with the proposed pig sound recognition network, with all algorithms trained and tested on the same sound dataset. Specifically, the proposed algorithm increased the recognition accuracy of screams from 84.5% to 97.1% and that of howls from 86.1% to 97.5%, while the recognition accuracy of grunts decreased from 100% to 97.3%, a difference attributable to the different principles of the recognition algorithms. Furthermore, through experiments on the width factor and resolution factor, a smaller and more efficient model was derived from the standard MobileNetV2 model, and its running speed was significantly improved to meet the needs of practical applications while the accuracy was maintained. This study showed that abnormal pig vocalizations are related to abnormal behaviors, so sound recognition can help monitor pig behavior. In future work, abnormal behavior detection combining sound recognition and video analysis will be discussed.
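    As a rough illustration of the network configuration described above, the following Python sketch (assuming a Keras/TensorFlow MobileNetV2 implementation, which the paper does not specify) builds a three-class spectrogram classifier with the width factor of 0.5 found to be optimal and trains it with the Adam optimizer in place of RMSprop; the input resolution, learning rate, and dataset directory layout are illustrative assumptions.

        # Minimal sketch: MobileNetV2 classifier for grunt / scream / hunger-howl
        # spectrograms, with width factor (alpha) 0.5 and the Adam optimizer.
        import tensorflow as tf

        NUM_CLASSES = 3          # grunts, screams, hunger howls
        INPUT_SIZE = 160         # reduced input resolution (assumed value)

        model = tf.keras.applications.MobileNetV2(
            input_shape=(INPUT_SIZE, INPUT_SIZE, 3),
            alpha=0.5,           # width factor reported as optimal
            weights=None,        # trained from scratch on the spectrogram dataset
            classes=NUM_CLASSES,
        )

        # Adam replaces the default RMSprop optimizer for faster convergence.
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )

        # Hypothetical training data: spectrogram images arranged in class folders.
        train_ds = tf.keras.utils.image_dataset_from_directory(
            "spectrograms/train", image_size=(INPUT_SIZE, INPUT_SIZE),
            label_mode="categorical", batch_size=32,
        )
        model.fit(train_ds, epochs=20)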

       
