Abstract:
Accurate and rapid identification of fish feeding intensity is essential for implementing precise feeding strategies in industrial recirculating aquaculture systems. Feeding intensity can be assessed from single-modality indicators such as water quality, vision, and sound. However, each of these data sources on its own provides limited information and does not capture the complete picture of feeding behavior. Multimodal data offer more comprehensive insight than any single modality, so fusing them can yield a more accurate representation of fish feeding intensity and offset the limitations of single-modality information. Multimodal fusion thus offers a novel approach to quantifying feeding intensity. In this study, a Fish-MulT algorithm was proposed to combine water quality, sound, and vision data using the Multimodal Transformer algorithm (MulT). First, feature vectors were extracted from the input water quality, sound, and visual data. Second, a Multimodal Transfer Module (MMTM) was employed to fuse the input feature vectors, and adaptive weights were then assigned to the three modalities after fusion. Finally, the cross-modal transformer of each modal branch in the MulT algorithm was optimized, reducing the number of required transformers from 2 to 1. A total of 1293 sets of three-modality data were collected in the experiment, with 70% used as the training set, 15% as the validation set, and the remaining 15% as the test set. The dataset was classified into four categories according to feeding intensity: "strong", "medium", "weak", and "none". The improved model was then evaluated on the collected dataset. Experimental results demonstrate that, compared with MulT, the Fish-MulT algorithm improved the accuracy of fish feeding intensity identification from 93.30% to 95.36% while reducing the number of parameters by 38%.
Moreover, accuracy improved markedly over identification based on water quality, sound, or vision data alone, with increases of 68.56, 21.65, and 3.61 percentage points, respectively. An ablation test of the added MMTM module and adaptive weights was also conducted. Adding both the MMTM module and the adaptive weights gave the largest improvement in accuracy, reaching 95.36%. The Fish-MulT algorithm can therefore be employed to develop precise feeding strategies, and the findings can provide technical support for intelligent feeding systems.
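As a rough cross-check of the figures reported above, the following Python sketch derives the approximate subset sizes from the 70/15/15 split and the single-modality accuracies implied by the reported percentage-point gains. The rounding rule in `split_sizes` and all function and variable names are assumptions made for illustration; the abstract does not state the exact subset counts or the single-modality accuracies directly.

```python
from decimal import Decimal

# Figures taken from the abstract.
TOTAL_SAMPLES = 1293
FISH_MULT_ACC = Decimal("95.36")  # Fish-MulT accuracy, in percent
MULT_ACC = Decimal("93.30")       # baseline MulT accuracy, in percent
GAINS_OVER_SINGLE = {             # reported gains, in percentage points
    "water quality": Decimal("68.56"),
    "sound": Decimal("21.65"),
    "vision": Decimal("3.61"),
}


def split_sizes(total, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) counts.

    Assumed rounding rule: train and validation are rounded to the
    nearest integer and the test set takes the remainder.
    """
    train = round(total * train_frac)
    val = round(total * val_frac)
    return train, val, total - train - val


def implied_single_modality_accuracy(fused_acc, gains):
    """Subtract each reported gain from the fused accuracy."""
    return {name: fused_acc - gain for name, gain in gains.items()}


train_n, val_n, test_n = split_sizes(TOTAL_SAMPLES)
baselines = implied_single_modality_accuracy(FISH_MULT_ACC, GAINS_OVER_SINGLE)
gain_over_mult = FISH_MULT_ACC - MULT_ACC

print("train/val/test:", train_n, val_n, test_n)
for name, acc in baselines.items():
    print(f"implied {name} accuracy: {acc}%")
print(f"gain over MulT: {gain_over_mult} percentage points")
```

Under these assumptions the implied single-modality accuracies differ widely, which is consistent with the abstract's point that vision alone is the strongest single indicator and water quality alone the weakest.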