Identification of feeding intensity in recirculating aquaculture fish using water quality-sound-vision fusion

    • Abstract: In industrial recirculating aquaculture, accurately identifying fish feeding intensity is a prerequisite and key to precision feeding. Single-modality data such as water quality, vision, and sound can each be used to assess feeding intensity, but a single modality is often one-sided, cannot fully reflect global features, and suffers from low recognition accuracy and poor transferability. Multimodal methods, which fuse features from different modalities, provide a new means of quantifying feeding intensity. On this basis, to fuse the "water quality-sound-vision" information of fish feeding and achieve high-accuracy quantification of feeding intensity, this study builds on the multimodal Transformer (MulT) and proposes Fish-MulT, a multimodal-fusion algorithm for fish feeding intensity recognition. First, feature vectors are extracted from the input water quality, sound, and visual data. Second, a multimodal transfer module (MMTM) fuses the input feature vectors to obtain three fused vectors. Adaptive weights are then applied to the fused vectors, which are summed to form a fusion modality. Finally, the fusion modality is used to reduce the cross-modal Transformers in each modal branch of MulT from two to one. Experimental results show that, compared with MulT, the proposed algorithm improves feeding intensity recognition accuracy from 93.30% to 95.36% and reduces the number of parameters by 38%. Compared with the water quality, sound, and vision single modalities, accuracy improves by 68.56, 21.65, and 3.61 percentage points, respectively. The algorithm can be used to formulate precision feeding strategies and provides technical support for developing intelligent feeding systems.

       

      Abstract: Accurate and rapid identification of fish feeding intensity is essential for implementing precise feeding strategies in an industrial recirculating aquaculture system. Feeding intensity can be assessed from single-modality indicators such as water quality, vision, and sound, but each of these sources alone provides limited information and cannot capture the complete picture of feeding behavior. Multimodal data offer more comprehensive insight than any single modality, yielding a more accurate representation of fish feeding intensity and mitigating the limitations of single-modality information; multimodal fusion therefore offers a novel approach to quantifying feeding intensity. In this study, a Fish-MulT algorithm was proposed to combine water quality, sound, and vision data on the basis of the Multimodal Transformer (MulT) algorithm. Firstly, feature vectors were extracted from the input water quality, sound, and visual data. Secondly, a Multimodal Transfer Module (MMTM) was employed to fuse the input feature vectors into three fused vectors. Adaptive weights were then assigned to the three fused vectors, which were summed to obtain a fusion modality. Lastly, using the fusion modality, the cross-modal transformers of each modal branch in the MulT algorithm were reduced from two to one. In the experiment, 1293 sets of three-modality data were collected, with 70% used as the training set, 15% as the validation set, and the remaining 15% as the test set. The dataset was classified into four categories according to feeding intensity: "strong", "medium", "weak", and "none". The improved model was evaluated on this dataset. Experimental results demonstrate that, compared with MulT, the Fish-MulT algorithm improved the accuracy of fish feeding intensity identification from 93.30% to 95.36% while reducing the number of parameters by 38%. Compared with the water quality, sound, and vision single modalities, accuracy improved by 68.56, 21.65, and 3.61 percentage points, respectively. An ablation test of the MMTM module and the adaptive weights was also conducted; adding both yielded the largest improvement in accuracy (up to 95.36%). The Fish-MulT algorithm can therefore be employed to develop precise feeding strategies, and the findings provide technical support for intelligent feeding systems.
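The weighted-fusion step described in the abstract — three MMTM-fused vectors combined by adaptive weights into a single fusion modality — can be sketched as follows. This is a minimal illustration only: the vector dimension, the example values, and the use of softmax normalization for the adaptive weights are assumptions, since the abstract does not specify the implementation.

```python
import math

def softmax(logits):
    """Normalize raw weight logits into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_modalities(fused_vectors, weight_logits):
    """Adaptive-weighted sum of the three MMTM-fused vectors
    (water quality, sound, vision) into one fusion modality."""
    w = softmax(weight_logits)  # adaptive weights; learnable in practice
    dim = len(fused_vectors[0])
    return [sum(w[k] * fused_vectors[k][i] for k in range(len(fused_vectors)))
            for i in range(dim)]

# Hypothetical 4-dimensional fused vectors for the three modality branches
vecs = [[0.1, 0.2, 0.3, 0.4],    # water quality branch (example values)
        [1.0, 0.0, -1.0, 0.5],   # sound branch
        [0.5, 0.5, 0.5, 0.5]]    # vision branch
logits = [0.2, 0.5, 1.0]         # would be trained parameters in the model
fusion = fuse_modalities(vecs, logits)
print(len(fusion))  # 4
```

With equal logits the fusion reduces to a plain average of the three branches; training the logits lets the model emphasize whichever modality is most informative for the current sample.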

       
