Abstract
The distribution of biological resources is of great significance in fisheries and marine ranching. Combining underwater robots with underwater object detection is promising, because it costs less and makes fishing operations less risky than the commonly used manual detection. However, many challenges remain under complex underwater conditions that are susceptible to environmental interference, particularly for underwater objects of multiple scales, multiple types, and small size. Most existing detectors designed for objects on the ground cannot fully meet the requirements of underwater marine objects, leading to false and missed detections. It is therefore necessary to redesign the network structure to suit underwater objects. In this study, a two-stage detection network was proposed that integrates multi-scale features and multiple attention networks for underwater objects. Firstly, multi-scale feature fusion was enhanced: hybrid dilated convolution was used to expand the high-level receptive field, so that the high-level feature maps captured more feature information about the objects. As such, the network adapted better to multi-scale changes without losing small objects. Upsampling was then performed with an effective skip fusion that preserved the high-level channel information, in order to fully extract the object features. Secondly, a multi-attention network was constructed to capture more spatial location and channel features, in view of the complex underwater background, blurred images, small object size, and insignificant differences between underwater objects. The global feature dependencies in the spatial and channel dimensions were more fully utilized to further mine the hidden features of difficult samples, with emphasis on the location and feature information of the objects.
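The receptive-field expansion attributed above to hybrid dilated convolution can be illustrated with a short one-dimensional calculation. The following is a minimal sketch, not the network used in this study: it assumes stacked stride-1 convolutions with an odd kernel size, and the dilation rates (1, 2, 5) and (2, 2, 2) are hypothetical values chosen only for illustration. The function names are likewise illustrative. It computes the receptive-field span and checks by enumeration whether every input position inside that span actually contributes (no "gridding" holes).

```python
from itertools import product


def receptive_field(rates, kernel_size=3):
    """Span (along one axis) seen by stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * r for r in rates)


def covered_offsets(rates, kernel_size=3):
    """1-D input offsets that contribute to a single output position.

    Assumes an odd kernel size; each layer with dilation r reads
    offsets t * r for t in {-half, ..., half}.
    """
    half = kernel_size // 2
    taps = range(-half, half + 1)
    return {
        sum(t * r for t, r in zip(combo, rates))
        for combo in product(taps, repeat=len(rates))
    }


def has_gridding(rates, kernel_size=3):
    """True if some positions inside the receptive field are never read."""
    return len(covered_offsets(rates, kernel_size)) < receptive_field(rates, kernel_size)


# Hypothetical dilation rates, chosen only for illustration.
hybrid_rates = (1, 2, 5)   # mixed rates
uniform_rates = (2, 2, 2)  # identical rates

print(receptive_field(hybrid_rates), has_gridding(hybrid_rates))
print(receptive_field(uniform_rates), has_gridding(uniform_rates))
```

Under these assumptions, the mixed rates enlarge the receptive field well beyond that of three ordinary 3×3 convolutions while still reading every position, whereas identical rates larger than 1 leave unread positions, which is one way small objects can be missed.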
Finally, sample equalization was adopted to adaptively balance the proportion of positive and negative samples during training. By improving sample quality and reducing the computation spent on invalid samples, fast convergence and optimal training were achieved. The data sources were the open-source UPRC2019, WildFish, and self-built UT datasets from the International Underwater Robot Competition. A series of ablation tests was then conducted on the three datasets, and the improved model was compared with advanced detectors. The results show that the mean Average Precision (mAP) of the improved model on the three datasets reached 85.3%, 96.9%, and 97.8%, respectively. Compared with the baseline network, the recall rate increased by 12.2, 8, and 6.6 percentage points, reaching 90.6%, 98.7%, and 98.9%, respectively. Furthermore, the mAPs of the improved model were 9.58, 12.2, and 4.1 percentage points higher than those of the mainstream deep learning-based detectors Libra RCNN (CVPR2019), Double head RCNN (ECCV2020), and STransFuse (2021), respectively. In addition, the detection time for a single underwater image was 0.57 s, meeting the real-time detection requirements of underwater objects. The findings can provide technical support for monitoring biological objects in high-precision fishing operations by underwater robots in marine fisheries. More importantly, the improved network can be further modified to reduce the number of parameters without losing accuracy, particularly to achieve high detection speed and accuracy on large-scale data in practice.
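The idea of balancing positive and negative samples during training can be sketched as follows. This is a simplified fixed-ratio sampler of the kind commonly used in two-stage detectors, not the adaptive sample equalization of this study, whose details are not given in the abstract. The function name `sample_balanced` and the parameters `batch_size`, `pos_fraction`, and `seed` are hypothetical.

```python
import random


def sample_balanced(labels, batch_size=8, pos_fraction=0.5, seed=0):
    """Pick candidate indices so that positives are not swamped by negatives.

    labels: list of 1 (positive candidate) or 0 (negative candidate).
    At most batch_size * pos_fraction positives are drawn; the remaining
    slots are filled with negatives, as far as they are available.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]

    n_pos = min(len(positives), int(batch_size * pos_fraction))
    n_neg = min(len(negatives), batch_size - n_pos)

    return rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)


# Illustrative candidate set: 10 positives among 100 candidates.
labels = [1] * 10 + [0] * 90
batch = sample_balanced(labels)
n_positive = sum(labels[i] for i in batch)
print(len(batch), n_positive)
```

With a fixed ratio, the share of positives in each batch stays well above their share among all candidates, so fewer updates are spent on easy negatives. An adaptive scheme would adjust this ratio during training instead of fixing it in advance.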