Voice recognition of abnormal state of pigs based on improved CNN
Abstract
Sound has been widely used to monitor the health and body condition of pigs. However, manual monitoring cannot meet the demands of modern agriculture: it exposes workers to zoonotic diseases, is prone to misjudging pig diseases, and is time-consuming and labor-intensive. In this study, a real-time pig sound collection module was designed, and abnormal states were rapidly recognized using an improved convolutional neural network (CNN). A 4G link was used to upload the collected pig sounds to a cloud server. The TCP/IP protocol was adopted, with the acquisition end acting as a TCP client that sent data to the server without interruption. Specifically, the TCP server in the cloud blocked on the specified port while waiting for a connection, and data transfer started once the client had connected successfully. The server also sent a restart command to the client to ensure data alignment. Sound was acquired on a single channel at a sampling frequency of 32 kHz with a quantization depth of 16 bits. Raw recordings of three kinds of abnormal pig sounds (sickness, fighting, and hunger) were collected under the guidance of pig-breeding experts. The data were preprocessed by framing, windowing, denoising, and endpoint detection to build a sound data set of abnormal states. Subsequently, the Mel spectrogram of each sound was extracted with 128 Mel frequency dimensions, a 2048-point Fast Fourier Transform (FFT), and a 512-point window shift. A classification model for pig sound signals was then constructed on the Mel spectrogram features. A local feature learning unit was designed on the basis of the CNN, which has fewer weights and lower network complexity than a fully connected network. Four layers of local feature units were constructed, with 64, 64, 128, and 128 convolution kernels, respectively.
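The spectrogram parameters stated above fix the time and frequency resolution of the features. The following minimal sketch works through that arithmetic using only the values given in the text (32 kHz, 2048-point FFT, 512-point shift, 128 Mel bands). The HTK-style Mel formula, the 0 Hz to Nyquist band range, the no-padding framing rule, and all function names are assumptions made for illustration, not details taken from the study.

```python
import math

SAMPLE_RATE = 32_000  # Hz, from the text
N_FFT = 2048          # FFT points per frame, from the text
HOP = 512             # window shift in samples, from the text
N_MELS = 128          # Mel frequency dimensions, from the text


def hz_to_mel(f_hz):
    """HTK-style Mel scale (an assumption; other Mel variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def n_frames(n_samples, n_fft=N_FFT, hop=HOP):
    """Number of full frames when the signal is not padded (an assumption)."""
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop


def mel_centers(n_mels=N_MELS, f_min=0.0, f_max=SAMPLE_RATE / 2):
    """Centre frequencies (Hz) of n_mels filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


frame_ms = 1000.0 * N_FFT / SAMPLE_RATE  # duration of one analysis frame
hop_ms = 1000.0 * HOP / SAMPLE_RATE      # time step between frames
freq_bins = N_FFT // 2 + 1               # one-sided FFT bins before Mel filtering
frames_1s = n_frames(SAMPLE_RATE)        # full frames in one second of audio
centers = mel_centers()

print(f"frame = {frame_ms} ms, hop = {hop_ms} ms, shift ratio = {HOP / N_FFT}")
print(f"{freq_bins} FFT bins reduced to {len(centers)} Mel bands")
print(f"one second of audio gives a {N_MELS} x {frames_1s} spectrogram")
```

Under these assumptions each frame spans 64 ms and successive frames are 16 ms apart, so the 512-point shift equals one quarter of the 2048-point window.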
Nevertheless, when the CNN processed each spectrogram image, it inevitably picked up local positional information and various kinds of redundant information. Three attention mechanisms were therefore used to improve the CNN: the Squeeze and Excitation Network (SE_NET), the Efficient Channel Attention Network (ECA_NET), and the Convolutional Block Attention Module (CBAM). A fully connected layer with three neurons and a Softmax activation function was used to recognize the abnormal pig sounds. The CBAM was then optimized, following the way ECA_NET improves on SE_NET, to yield an improved CBAM-CNN. The experimental results show that the optimal parameter combination for pig sound recognition was 128 Mel frequency dimensions, a 2048-point FFT, and a window shift of 1/4 of the window length, and that the optimal network model was _CBAM-CNN. The best recognition accuracy reached 94.46%, and the accuracy of pig squeal recognition reached 100%, both higher than before the improvement. The attention mechanism also improved recognition performance while reducing model complexity. Compared with CBAM-CNN, the _CBAM-CNN model achieved better recognition with a smaller model size. With an accuracy of 94.46% on abnormal pig sounds, the _CBAM-CNN model can support accurate monitoring of abnormal pig behavior during breeding and contribute to building intelligent, modern pig farms.
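To illustrate the channel attention idea that ECA_NET contributes, the sketch below applies the generally published ECA recipe to a small feature tensor: global average pooling per channel, a one-dimensional convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. It is a plain-Python illustration only, not the model from this study; the function name `eca_attention`, the toy input, and the fixed kernel values are hypothetical (in a real network the kernel is learned).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eca_attention(feature_maps, kernel):
    """Apply ECA-style channel attention.

    feature_maps: list of C channels, each an H x W list of lists.
    kernel: odd-length list of 1D convolution weights shared by all
            channels (hypothetical fixed values; normally learned).
    Returns (rescaled feature maps, per-channel attention weights).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    n_channels = len(feature_maps)

    # 1) Global average pooling: one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]

    # 2) Local cross-channel interaction: zero-padded 1D convolution
    #    over the channel descriptors, with no dimensionality reduction.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(n_channels)
    ]

    # 3) Sigmoid gate turns each mixed descriptor into a weight in (0, 1).
    weights = [sigmoid(m) for m in mixed]

    # 4) Rescale every value of a channel by that channel's weight.
    scaled = [
        [[w * v for v in row] for row in ch]
        for ch, w in zip(feature_maps, weights)
    ]
    return scaled, weights


# Toy input: 4 channels of 2 x 2, each filled with its channel index.
toy = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
scaled, weights = eca_attention(toy, kernel=[0.5, 1.0, 0.5])
print([round(w, 3) for w in weights])
```

Unlike an SE-style block, this gate uses only a handful of shared kernel weights rather than two fully connected layers, which is the sense in which such a replacement can reduce model size.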