    Li Hui, Wang Xiaoyu, Liu Yun, Tao Ye, Fu Shijia, Wu Yifan. Detecting underwater objects using multi-scale features fusion and multiple attention[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(20): 129-139. DOI: 10.11975/j.issn.1002-6819.2022.20.015

    Detecting underwater objects using multi-scale features fusion and multiple attention

    • Abstract: Determining the distribution of marine biological resources is of great significance for fishery harvesting and marine ranch management. To address the complexity of underwater environments and the fact that underwater objects span multiple scales and categories and include many small objects, this study proposes a two-stage network detection method for underwater objects. First, multi-scale feature extraction and fusion are improved to capture multi-scale information of underwater objects and enhance object features, yielding richer object feature information. Then, multiple attention is constructed that exploits global feature dependencies in the spatial and channel dimensions to further mine deep and hidden feature information and highlight the difference between background and objects. Finally, a sample equalization method is adopted during model training to adaptively balance the ratio of positive and negative samples, reduce invalid samples, and achieve fast model convergence. The proposed method was tested on the public UPRC2019 dataset from the International Underwater Robot Competition, the WildFish dataset, and a self-built dataset, achieving mAP (mean Average Precision) of 85.3%, 96.9%, and 97.8% and recall of 90.6%, 98.7%, and 98.9%, respectively. Compared with detection methods such as Libra RCNN (CVPR2019), Double head RCNN (ECCV2020), and STransFuse (2021), the mAP of the proposed method is 9.58, 12.2, and 4.1 percentage points higher, respectively. The results can provide technical support for marine fishery biological monitoring and precise harvesting operations by underwater robots.
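    The abstract describes the multiple attention only at the level of global feature dependencies in the spatial and channel dimensions. As a concrete illustration, the PyTorch sketch below combines a channel-attention branch and a spatial-attention branch in one module; the class names, the reduction ratio of 16, and the 7x7 spatial kernel are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch of a channel + spatial attention module; the paper's exact
# "multiple attention" design is not specified in the abstract.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Re-weights channels using global average- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """Re-weights spatial positions using channel-pooled maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class MultipleAttention(nn.Module):
    """Channel attention followed by spatial attention on one feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))


if __name__ == "__main__":
    feat = torch.randn(1, 256, 64, 64)          # e.g. one feature-pyramid level
    print(MultipleAttention(256)(feat).shape)   # torch.Size([1, 256, 64, 64])
```

    Applying channel attention before spatial attention follows the common CBAM-style ordering; parallel or reversed combinations would be equally plausible readings of the abstract.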

       

      Abstract: The distribution of biological resources is of great significance to fisheries and marine ranching. Underwater robots combined with underwater object detection are expected to reduce the cost and risk of fishing operations compared with the commonly used manual inspection. However, many difficulties and challenges remain under the complex and special underwater conditions, which are susceptible to environmental interference, particularly for underwater objects of multiple scales and types and of small size. Most existing detectors designed for objects on the ground cannot fully meet the requirements of underwater marine objects, leading to false and missed detections. Therefore, it is necessary to redesign a network structure suitable for underwater objects. In this study, a two-stage detection network integrating multi-scale features and multiple attention was proposed for underwater objects. Firstly, multi-scale feature fusion was enhanced with hybrid dilated convolution to expand the high-level receptive field and capture more object feature information in the high-level feature maps, so that the network adapts better to multi-scale changes without losing small objects. Upsampling was then performed with effective skip fusion so that high-level channel information was preserved and object features were fully extracted. Secondly, a multiple attention network was constructed to capture more spatial location and channel features, in view of the complex underwater background, blurred images, small object size, and insignificant differences among underwater objects. Global feature dependencies in the spatial and channel dimensions were exploited to further excavate the hidden features of difficult samples, emphasizing the location and feature information of the objects. Finally, sample equalization was adopted to adaptively balance the proportion of positive and negative samples during training, achieving fast convergence and better training on high-quality samples while reducing the computation spent on invalid samples. The data sources were the open-source UPRC2019 dataset from the International Underwater Robot Competition, the WildFish dataset, and the self-built UT dataset. A series of ablation tests was conducted on the three datasets, along with comparisons against advanced detection methods. The results show that the mean Average Precision (mAP) of the improved model on the three datasets reached 85.3%, 96.9%, and 97.8%, respectively, improvements of 12.2, 8, and 6.6 percentage points over the benchmark network, while the recall reached 90.6%, 98.7%, and 98.9%, respectively. Furthermore, the mAP of the improved model was 9.58, 12.2, and 4.1 percentage points higher than that of mainstream deep learning-based detectors such as Libra RCNN (CVPR2019), Double head RCNN (ECCV2020), and STransFuse (2021), respectively. In addition, the detection time for a single underwater image was 0.57 s, fully meeting the real-time detection requirements for underwater objects. The findings can provide technical support for monitoring biological objects and for high-precision fishing operations of underwater robots in marine fisheries. Moreover, the improved network can be further modified to reduce the number of parameters without losing accuracy, which is particularly relevant for achieving high detection speed and accuracy on large amounts of data in practice.
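      The hybrid dilated convolution mentioned above is described only as a means of enlarging the high-level receptive field. The sketch below shows one plausible form of such a block in PyTorch, assuming parallel 3x3 branches with dilation rates (1, 2, 5), concatenation, a 1x1 fusion convolution, and a residual connection; these choices are illustrative and are not taken from the paper.

```python
# Illustrative hybrid dilated convolution (HDC) block for enlarging the receptive
# field of a high-level feature map. The dilation rates (1, 2, 5) are a common HDC
# pattern chosen here for illustration; the paper's exact rates are not given.
import torch
import torch.nn as nn


class HybridDilatedBlock(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 5)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=dilation keeps spatial size.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated branches back to the input width.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(out) + x   # residual connection preserves original detail


if __name__ == "__main__":
    c5 = torch.randn(1, 256, 16, 16)          # top-level backbone feature map
    print(HybridDilatedBlock(256)(c5).shape)  # torch.Size([1, 256, 16, 16])
```

      Mixing several dilation rates, rather than repeating a single rate, is the usual motivation for calling such a block "hybrid": it enlarges the receptive field while avoiding the gridding artifacts that uniform dilation can introduce.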

       
