Fine-grained classification algorithm of fish feeding state based on optical flow method

唐宸; 徐立鸿; 刘世晶

doi:10.11975/j.issn.1002-6819.2021.09.027

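Abstract: Fine-grained classification of fish feeding states allows fish feeding behavior to be described in finer detail. Working in an industrial recirculating aquaculture tank environment, this study proposes a fine-grained classification algorithm for fish feeding states that uses the optical flow method for feature extraction. The algorithm classifies the feeding state shown in videos of swimming fish schools. First, the optical flow method extracts the inter-frame motion features of the fish school in each video. Next, an inter-frame motion feature classification network performs fine-grained classification of these features. Finally, a voting strategy determines the final category of the video. Experimental results show that with the voting threshold set to 50%, the video accuracy of the proposed algorithm reaches 98.7%; with the voting threshold set to 80%, the video accuracy is 91.4%. Across the voting thresholds tested, the video accuracy stays above 90%, indicating strong classification robustness. Compared with research conducted in a laboratory culture environment, algorithm research on fish feeding states in an industrial aquaculture environment has stronger practical applicability, and it can provide a reference for describing fish feeding behavior in detail and for automatic control of precision feeding.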

Abstract: Fine-grained classification of fish feeding states in the factory production environment helps describe fish feeding behavior in more detail. However, most current studies are conducted in an idealized laboratory environment that ignores adverse external factors such as lighting conditions and image quality, so their methods cannot be applied directly in the factory environment. Moreover, these studies focus on binary classification of the fish feeding state (eating or non-eating), which is imprecise. This study addressed fine-grained classification of fish feeding states and collected a small-scale dataset with fine-grained feeding-state labels. All videos in the dataset were captured in the factory production environment. The dataset contains 752 videos, each 3 s long (90 frames) and labeled as non-eating, weak-eating, or strong-eating. Based on this dataset, a fine-grained classification algorithm of fish feeding state was proposed for the factory production environment. First, the algorithm computed optical flow fields for all pairs of consecutive frames in each video and derived the motion magnitude and angle of every pixel from these fields. The magnitude and angle ranges were each divided into eight intervals, and histograms of pixel magnitude and angle were counted over these intervals. The concatenated magnitude and angle histograms served as a frame-level inter-frame motion feature for subsequent classification. In this way, the algorithm turned one video sample into many inter-frame motion feature samples, one for each pair of consecutive frames.
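The feature extraction step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a dense flow field stored as an (H, W, 2) array of per-pixel displacements, and the magnitude range (`MAX_MAGNITUDE`), the clipping of large motions, and the normalization by pixel count are all hypothetical choices, since the abstract only states that magnitude and angle are each split into eight intervals. `flow_fn` stands in for whichever optical flow method is used.

```python
import numpy as np

N_BINS = 8  # eight intervals each for magnitude and angle, as in the abstract

# Hypothetical upper bound (pixels per frame) for the magnitude histogram;
# the abstract does not say how the magnitude range is divided.
MAX_MAGNITUDE = 8.0


def motion_feature(flow: np.ndarray) -> np.ndarray:
    """Turn one optical flow field of shape (H, W, 2) into a 16-D feature.

    flow[..., 0] is the horizontal and flow[..., 1] the vertical
    displacement of each pixel between two consecutive frames.
    """
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)  # map angles to [0, 2*pi)

    # Assumption: motions larger than MAX_MAGNITUDE fall into the last bin.
    mag_hist, _ = np.histogram(np.clip(magnitude, 0, MAX_MAGNITUDE),
                               bins=N_BINS, range=(0, MAX_MAGNITUDE))
    ang_hist, _ = np.histogram(angle, bins=N_BINS, range=(0, 2 * np.pi))

    # Assumption: divide counts by the pixel count so frame size does not matter.
    return np.concatenate([mag_hist, ang_hist]).astype(np.float64) / magnitude.size


def video_to_samples(frames, flow_fn, label):
    """Expand one labeled video into frame-level (feature, label) samples.

    flow_fn(prev, nxt) is a hypothetical placeholder for any dense optical
    flow routine that returns an (H, W, 2) array.
    """
    return [(motion_feature(flow_fn(prev, nxt)), label)
            for prev, nxt in zip(frames[:-1], frames[1:])]
```

A 90-frame video therefore yields 89 feature vectors that all inherit the video's label. As one possible `flow_fn`, OpenCV's `cv2.calcOpticalFlowFarneback` returns an (H, W, 2) flow array for two grayscale frames; the abstract does not say which optical flow method the authors used.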
Then a 5-layer classification neural network (one input layer, three hidden layers, and one output layer) was built to classify the extracted inter-frame motion features. The network had three output categories corresponding to the three feeding states (non-eating, weak-eating, and strong-eating); the output category probabilities were computed by the Softmax function, and the network was optimized with a cross-entropy loss. All frame-level predictions were then combined into a video-level classification through a voting strategy. The frame-level category predicted most frequently across all frames was taken as the video's probable category, and a voting threshold was additionally set to require a minimum frequency for that prediction. When the prediction frequency of the probable category exceeded the voting threshold, the video was assigned that category; otherwise, it was assigned the uncertain category. A higher voting threshold requires the probable category to be predicted more frequently, so the classification results that the algorithm outputs are more reliable. The experimental results showed that the video accuracy of the algorithm was 98.7% under the 50% voting threshold. When the voting threshold increased to 80%, the video accuracy remained at 91.4%, demonstrating the robustness of the algorithm. Video accuracy decreased as the voting threshold increased because a higher threshold requires more agreeing frame-level predictions, so more videos may be assigned to the uncertain category owing to low prediction frequency. Comparative experiments were also conducted to verify the effectiveness of the proposed algorithm.
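The frame-level classifier and the voting step can be sketched in numpy. This is a hedged illustration under stated assumptions: the hidden-layer widths, ReLU activations, and He-style initialization are hypothetical (the abstract gives only the layer count, the three output classes, Softmax, and cross-entropy), gradient-based training is omitted, and treating "uncertain" outputs as errors in `video_accuracy` is our reading of why accuracy falls as the threshold rises.

```python
import numpy as np

CLASSES = ["non-eating", "weak-eating", "strong-eating"]
UNCERTAIN = "uncertain"


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_mlp(sizes, seed=0):
    """Weights for a fully connected net; sizes = [input, h1, h2, h3, output]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """ReLU hidden layers (assumption), Softmax class probabilities at the output."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return softmax(x @ w + b)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class indices."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


def vote(frame_preds, threshold):
    """Majority vote over frame-level class indices with a voting threshold.

    Returns the most frequent class if its share of frames is strictly greater
    than `threshold`, otherwise the uncertain category. Ties go to the lower
    class index (assumption).
    """
    counts = np.bincount(frame_preds, minlength=len(CLASSES))
    winner = int(counts.argmax())
    share = counts[winner] / len(frame_preds)
    return CLASSES[winner] if share > threshold else UNCERTAIN


def video_accuracy(predicted, truth):
    """Share of videos labeled correctly; uncertain outputs count as errors."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

For one video, `vote(forward(params, features).argmax(axis=1), 0.8)` would classify it from its stack of 16-D frame features, with `params = init_mlp([16, 64, 64, 32, 3])` as one hypothetical layer configuration. If 60 of 89 frames were predicted strong-eating, the share of about 67% would pass a 50% threshold but yield "uncertain" at 80%, which is the mechanism behind the lower accuracy at higher thresholds.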
Experiments with a texture-based algorithm and a single-frame convolutional neural network showed that single-frame features could not solve the fine-grained feeding state classification problem, which also supported the effectiveness of the inter-frame motion features used in the proposed algorithm. In addition, the proposed algorithm performed well on the small-scale dataset because the inter-frame motion features extracted by the optical flow method shift the training data from the video level to the frame level, implicitly increasing the number of training samples. Because this study was conducted in a commercial recirculating aquaculture system, it can be applied more readily in the factory production environment. Moreover, it realized fine-grained classification of fish feeding states, which can help describe fish feeding behavior in more detail.
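As a rough worked example of how the frame-level conversion enlarges the sample pool, assume optical flow is computed for every pair of consecutive frames in all 752 videos of 90 frames each (the abstract does not give the train/test split, so this counts the whole dataset):

```latex
N_{\text{frame}} = N_{\text{video}} \times (F - 1) = 752 \times (90 - 1) = 752 \times 89 = 66\,928
```

Under these assumptions, 752 video-level samples become 66,928 frame-level samples, an 89-fold increase, although frames from the same video are strongly correlated and are not independent samples.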


Fine-grained classification algorithm of fish feeding state based on optical flow method