Abstract:
Fish feeding behavior can provide effective decision-making information for accurate feeding in aquaculture. Most previous studies of fish feeding behavior were conducted in a laboratory environment. Their results cannot reflect the actual production status of fish, because light conditions and the farming environment influence behavior recognition in practice. In particular, most methods rely on cameras placed above the water surface, which perform poorly because complex illumination conditions cause serious light reflection. For instance, the reflection can be so strong that many fish are obscured. In this study, an underwater video dataset for the feeding behavior of Atlantic salmon was introduced. The video clips in the dataset were captured in an industrial recirculating aquaculture system. Each sample was a 5-second clip with a frame rate of 30 Hz, labeled as eating or noneating. A total of 3 791 samples were labeled in the dataset, of which 3 132 were labeled as noneating and 659 as eating. A novel video classification method based on a Variational Auto-Encoder and a Convolutional Neural Network (VAE-CNN) was proposed to identify fish feeding behavior from a video clip. The method consists of two steps. In the first step, a Variational Auto-Encoder (VAE) model was trained to extract the spatial features of video frames. Each video frame was encoded as a multivariate Gaussian probability distribution in a latent space, that is, it was represented by a Gaussian mean vector and a Gaussian variance vector. Specifically, the frames of a video clip were input into the trained VAE encoder to produce Gaussian mean vectors and Gaussian variance vectors, which were then arranged separately in column order to obtain the Gaussian mean feature matrix and the Gaussian variance feature matrix of the video.
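The construction of the two feature matrices in the first step can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for the trained VAE encoder, and the latent size `LATENT_DIM` and the 64-value frames are assumptions chosen only to keep the example small. Only the clip length (150 frames) and the column-wise arrangement of mean and variance vectors come from the description above.

```python
import math
import random

LATENT_DIM = 4    # assumed latent size; the abstract does not state it
N_FRAMES = 150    # 5 s at 30 Hz, as stated in the abstract


def toy_encoder(frame):
    """Hypothetical stand-in for a trained VAE encoder.

    Returns a (mean vector, variance vector) pair for one frame.
    """
    s = sum(frame) / len(frame)
    mean = [s * (k + 1) for k in range(LATENT_DIM)]
    # exp(...) keeps every variance strictly positive
    variance = [math.exp(-s * (k + 1)) for k in range(LATENT_DIM)]
    return mean, variance


def clip_to_feature_map(frames):
    """Arrange per-frame vectors in column order, one matrix per statistic."""
    encoded = [toy_encoder(f) for f in frames]
    # Column t of each matrix holds the vector produced for frame t.
    mean_matrix = [[mu[k] for mu, _ in encoded] for k in range(LATENT_DIM)]
    var_matrix = [[var[k] for _, var in encoded] for k in range(LATENT_DIM)]
    return [mean_matrix, var_matrix]  # mean matrix first, variance matrix second


random.seed(0)
# Synthetic frames (64 pixel values each); real frames would come from the video.
frames = [[random.random() for _ in range(64)] for _ in range(N_FRAMES)]
feature_map = clip_to_feature_map(frames)
shape = (len(feature_map), len(feature_map[0]), len(feature_map[0][0]))
print(shape)  # (2, 4, 150)
```

Each matrix has one row per latent dimension and one column per frame, so the column axis carries the temporal order of the clip.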
In this step, the video clip of fish feeding behavior was encoded as a feature map with two channels for the subsequent classification. In the second step, the fish feeding behavior was classified by inputting the feature matrices into the CNN. The CNN was trained on the VAE output features and extracted the spatio-temporal features of the fish feeding behavior videos for the final classification. To evaluate the CNN as the classifier, the VAE output features were also input into a backpropagation neural network (VAE-BP) and a support vector machine (VAE-SVM) to classify the feeding behavior of fish. The results showed that VAE-CNN outperformed both. The main reason is that the local receptive fields of the CNN allow it to better learn the spatio-temporal features in fish feeding behavior videos, whereas the other two methods treat the output features of the VAE as an ordinary feature map. Under real factory farming conditions, the accuracy of the proposed method reached 89%, the recall reached 90%, and the specificity reached 87%. Compared with the single-image classification method, the recall of VAE-CNN increased by 15 percentage points, and the other performance indexes of the video classification method also improved significantly. In terms of running time, the proposed algorithm needed only 4.15 s to process a 5 s video clip (150 frames) of fish feeding behavior. This method lays a solid foundation for future feeding systems with feedback control based on fish feeding behavior.
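For reference, the three reported indexes and the running-time ratio can be computed as in the following minimal sketch. The label lists are small hypothetical examples, not data from this study, and treating eating as the positive class (1) is an assumption. Only the 4.15 s and 5 s figures are taken from the text.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, recall, specificity) for binary labels.

    Assumes 1 = eating (positive class) and 0 = noneating.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)          # share of eating clips detected
    specificity = tn / (tn + fp)     # share of noneating clips detected
    return accuracy, recall, specificity


# Hypothetical labels for ten clips, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy, recall, specificity = binary_metrics(y_true, y_pred)
print(accuracy, recall)  # 0.8 0.75

# Processing time relative to clip duration, using the figures reported above.
realtime_ratio = 4.15 / 5.0
print(realtime_ratio < 1.0)  # True
```

A ratio below 1 means that a clip is processed in less time than it takes to record, which is the property a feedback-controlled feeding system would rely on.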