Citation: Lin Sen (林森), Liu Meiyi (刘美怡), Tao Zhiyong (陶志勇). Detection of underwater treasures using attention mechanism and improved YOLOv5[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(18): 307-314. DOI: 10.11975/j.issn.1002-6819.2021.18.035

    Detection of underwater treasures using attention mechanism and improved YOLOv5

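    • Summary: Sea urchins, sea cucumbers, scallops, and other underwater treasures are of great importance and value to the fishing industry, and harvesting them with robots has recently become a development trend. To detect the number and distribution of underwater treasures and give underwater robots more reliable data, this study proposes an underwater treasure detection method based on an attention mechanism and an improved YOLOv5. First, K-means is used to match new anchor coordinates, and additional detection scales are added to improve detection accuracy. Second, an attention mechanism module is integrated into the feature extraction network Darknet-53 to capture important features. Third, exploiting the lightweight design of the Ghost module, a Ghost-BottleNeck built from Ghost modules replaces the BottleNeck module in YOLOv5, greatly reducing the parameters and computation of the network model. Finally, IOU_nms is changed to DIOU_nms to optimize the loss function. The improved network was validated on a dataset built from a real underwater environment, containing 781 images randomly split into training and test sets at a ratio of 9:1. The results show that the proposed algorithm achieves an average accuracy of 95.67%, an improvement of 5.49 percentage points over YOLOv5. The experimental results are good, and the findings can provide a more accurate and faster method for detecting and capturing underwater treasures.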
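    The switch from IOU_nms to DIOU_nms mentioned in the summary can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the box format `(x1, y1, x2, y2)`, the function names `iou`, `diou`, and `diou_nms`, and the 0.5 threshold are all assumptions chosen for illustration. The idea is that Distance-IoU subtracts a penalty based on the distance between box centers, so two overlapping boxes whose centers are far apart are less likely to be merged into one detection.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """Distance-IoU: IoU minus the normalized squared center distance."""
    # Squared distance between the two box centers.
    cxa, cya = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cxb, cyb = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    center_dist2 = (cxa - cxb) ** 2 + (cya - cyb) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    if diag2 == 0:
        return iou(a, b)
    return iou(a, b) - center_dist2 / diag2


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with a kept box exceeds threshold."""
    # Visit boxes from highest to lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicates and one distant box.
demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
demo_scores = [0.9, 0.8, 0.7]
print(diou_nms(demo_boxes, demo_scores))
```

    With plain IoU, a pair such as `(0, 0, 10, 10)` and `(3.3, 0, 13.3, 10)` sits just above a 0.5 threshold and the lower-scoring box would be dropped. The center-distance penalty pulls the DIoU below 0.5, so both boxes survive, which can help when neighbouring animals overlap.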

       

      Abstract: Underwater treasures such as sea urchins, sea cucumbers, and scallops are highly valued in fishery production because of their high added value. However, the two conventional harvesting approaches, net fishing and manual catching, are time-consuming, labor-intensive, and destructive to the seabed environment, and they cannot meet the need for rapid detection in modern large-scale cultivation. Deep learning, by contrast, has become known in recent years for high resolution and fast speed, so target detection frameworks based on convolutional neural networks hold promise for fishery production. Detection performance in complex underwater environments still needs to be improved. In this study, a YOLOv5-based detector for underwater treasures using an attention mechanism, referred to as CG-YOLOv5, was proposed in order to provide more accurate data for underwater robots. The main contributions were as follows: 1) The CBAM was introduced into DarkNet-53 to deepen the network for better feature extraction and to suppress worthless features. Specifically, the CBAM combined channel and spatial attention to filter and weight the feature vectors: channel attention focused mainly on what the detection target was, whereas spatial attention determined where the target was. Together, the two mechanisms emphasized prominent feature information while weakening general features. 2) The lightweight Ghost-Bottleneck module was introduced to replace the Bottleneck in YOLOv5, using simpler linear operations to maintain high accuracy with fewer weights. 3) New anchor points were obtained by clustering the labels of the underwater dataset, and a new detection scale was added to the original three for higher detection accuracy. The CG-YOLOv5 network mainly comprised the CGDarknet-53 backbone network, the Focus structure, the Spatial Pyramid Pooling (SPP) structure, and the Path Aggregation Network (PANet). Focus served as the base network, downsampling the 640×640×3 input to 320×320×32. CG-YOLOv5 used only one CSP structure, which integrated gradient changes fully into the feature map to enhance feature fusion. The SPP structure applied max pooling to the feature layer at four scales, with pooling kernel sizes of 1×1, 5×5, 9×9, and 13×13. In this way, the SPP effectively enlarged the receptive field while isolating significant contextual features. Path aggregation networks were then used to fuse the different feature layers of an image. A dataset built from a real underwater environment was used to verify the model. It contained 781 underwater images, 90% of which were used for training and the rest for testing. The experimental results demonstrated that, compared with current deep learning methods, the model fully met the requirements for detecting and recognizing treasures in complex underwater environments. The average accuracy was 95.67%. Compared with YOLOv5, the average precision for sea urchin, scallop, and sea cucumber increased by 7.48, 6.90, and 2.09 percentage points, respectively, and the mAP increased by 5.49 percentage points. Compared with other classical algorithms, the method had better accuracy and lower complexity. The findings can provide a more accurate and faster way to detect and capture aquatic products.
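      The Focus downsampling described in the abstract can be sketched in pure Python as a space-to-depth slice. This is an illustration under stated assumptions, not the authors' code: the helper names `focus_slice` and `focus_shape` are hypothetical, images are nested lists in height × width × channel order, and the order in which the four neighbouring pixels are concatenated follows the stock YOLOv5 Focus layer as far as we know. Slicing alone turns 640×640×3 into 320×320×12. In the stock YOLOv5 Focus layer a convolution follows the slice and sets the final channel count, which would account for the 32 channels quoted in the abstract.

```python
def focus_shape(height, width, channels):
    """Output shape of the slicing step alone: halve H and W, quadruple C."""
    return (height // 2, width // 2, 4 * channels)


def focus_slice(image):
    """Space-to-depth slice of an H x W x C nested list into H/2 x W/2 x 4C.

    Each 2x2 block of pixels is flattened into one output pixel whose
    channels are the four input pixels concatenated.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Assumed order: top-left, bottom-left, top-right, bottom-right.
            row.append(
                image[y][x]
                + image[y + 1][x]
                + image[y][x + 1]
                + image[y + 1][x + 1]
            )
        out.append(row)
    return out


# Tiny 4x4 RGB-like image; the first channel encodes the pixel position.
demo = [[[10 * y + x, 0, 1] for x in range(4)] for y in range(4)]
sliced = focus_slice(demo)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))
print(focus_shape(640, 640, 3))
```

      No pixel values are discarded by the slice: resolution is traded for channel depth, so the following convolution sees every input value while working on a feature map a quarter of the spatial size.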

       
