Keypoint Detection Method for the Carapace of the Chinese Mitten Crab Based on YOLO-FMC-pose

    • Abstract: The carapace morphology of the Chinese mitten crab (Eriocheir sinensis) shows clear differences among individuals of the same species, a trait that can serve as an important basis for origin traceability and individual identification. Accurate detection of carapace keypoints is the foundation for tasks such as individual recognition and phenotype analysis. However, traditional manual detection relies on empirical judgment and suffers from low efficiency and poor repeatability, making it difficult to meet the practical demands of large-scale aquatic product processing. This study therefore proposes an automatic keypoint detection method for the carapace of the Chinese mitten crab based on YOLO-FMC-pose, aiming at high-precision, automated feature extraction. First, a self-built dataset containing a large number of carapace images of Chinese mitten crabs was constructed, 35 representative landmark keypoints were precisely annotated, and data augmentation was applied to improve training. Second, the keypoint detection model YOLO-FMC-pose was designed on the basis of an improved YOLO11n-pose framework. The model introduces a C3K2FD module incorporating Frequency Dynamic Convolution (FDConv), a Mixed Aggregation Network (MANet) module, and the CBAM attention mechanism, optimizing the architecture in terms of frequency-domain response, feature fusion, and spatial attention. The results show that the proposed YOLO-FMC-pose model outperforms existing mainstream methods in keypoint detection accuracy, achieving a precision, recall, mAP0.5, and mAP0.5:0.95 of 97.98%, 97.00%, 98.27%, and 73.28%, respectively, improvements of 3.33, 2.33, 2.94, and 13.08 percentage points over the original YOLO11n-pose. The normalized mean error (NME) was reduced to 3.835%, and the detection time was 7.5 ms, indicating good potential for practical application. This study provides key technical support for intelligent individual identification, origin traceability, and anti-counterfeiting control of the Chinese mitten crab, and offers a path toward fine-grained feature detection for aquatic products.
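The CBAM mechanism mentioned in the abstract gates features first per channel and then per spatial location. A minimal NumPy sketch of that two-stage data flow, with fixed toy weights standing in for the learned shared MLP and the 7×7 convolution (all weights and shapes here are illustrative assumptions, not the paper's trained parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, reduction=2):
    # x: (C, H, W). Squeeze spatial dims by average and max pooling,
    # pass both through a shared two-layer MLP, sum, and gate channels.
    c = x.shape[0]
    w1 = np.ones((c // reduction, c)) / c               # toy shared-MLP weights
    w2 = np.ones((c, c // reduction)) / (c // reduction)
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0) + w2 @ np.maximum(w1 @ mx, 0.0))
    return x * gate[:, None, None]

def spatial_attention(x):
    # Collapse channels by average and max, then gate each location.
    # (Real CBAM applies a learned 7x7 conv to the 2-channel map; a
    # fixed equal-weight average stands in for it here.)
    avg = x.mean(axis=0)
    mx = x.max(axis=0)
    gate = sigmoid((avg + mx) / 2.0)
    return x * gate[None, :, :]

def cbam(x):
    # Channel gating followed by spatial gating, as in CBAM.
    return spatial_attention(channel_attention(x))

x = np.random.default_rng(0).random((4, 8, 8))  # toy feature map (C, H, W)
y = cbam(x)
```

In the actual detection head the MLP weights and the spatial convolution are learned end to end; the fixed averages above only illustrate how the two attention stages reshape a feature map without changing its dimensions.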

       

Abstract: The cephalothorax morphology of the Chinese mitten crab (Eriocheir sinensis) exhibits notable variations among individuals of the same species, making it an important basis for origin traceability and individual identification. Accurate localization of key landmarks on the carapace is a critical prerequisite for quantitative phenotype analysis, precise individual recognition, and related automated processing tasks. Traditional approaches rely predominantly on manual visual assessment and expert judgment, which are time-consuming, labor-intensive, and prone to inconsistency, making them unsuitable for large-scale and automated applications in modern aquaculture. To address these challenges, this study proposes an automated, high-precision keypoint detection framework, named YOLO-FMC-pose, specifically designed for the carapace of the Chinese mitten crab. A comprehensive dataset was constructed, containing high-resolution images of crabs from multiple geographic origins, including Liangzi Lake, Junshan Lake, and Yangcheng Lake. Thirty-five representative anatomical landmarks on the carapace were carefully selected and manually annotated to ensure biological interpretability and structural completeness. To improve model robustness and generalization, various data augmentation strategies were applied, including random rotation, scaling, brightness and contrast adjustments, and horizontal flipping, simulating diverse real-world imaging conditions. The proposed YOLO-FMC-pose model builds upon the lightweight YOLO11n-pose backbone and incorporates three core improvements aimed at enhancing frequency sensitivity, multi-scale semantic integration, and attention-guided spatial representation. First, a C3K2FD module, integrating Frequency Dynamic Convolution (FDConv), was introduced to capture rich frequency-dependent features, allowing the model to respond simultaneously to high-frequency edge details and low-frequency smooth textures present in the carapace. 
Second, a Mixed Aggregation Network (MANet) was incorporated in the Neck stage to aggregate multi-scale features and enhance contextual understanding, improving the model's ability to distinguish subtle structural differences among landmarks. Third, the Convolutional Block Attention Module (CBAM) was integrated in the detection head, employing both channel and spatial attention mechanisms to emphasize informative regions while suppressing irrelevant background noise. These three modules function synergistically to enhance the model's capability to accurately capture the spatial arrangement and fine-grained structure of critical landmarks. Extensive experiments were conducted to evaluate the performance of YOLO-FMC-pose against several state-of-the-art lightweight keypoint detection models, including YOLOv8n-pose, YOLOv10n-pose, YOLOv12n-pose, and the original YOLO11n-pose. The results demonstrated that YOLO-FMC-pose achieved superior performance across multiple metrics. Specifically, it attained a precision of 97.98%, a recall of 97.00%, an mAP0.5 of 98.27%, and an mAP0.5:0.95 of 73.28%. Compared with the original YOLO11n-pose, these values represent absolute improvements of 3.33, 2.33, 2.94, and 13.08 percentage points, respectively. The normalized mean error (NME) of predicted keypoints was reduced to 3.835%, indicating highly accurate spatial correspondence between predicted and ground-truth landmarks. Despite the model's enhanced capability, the detection time remained at 7.5 milliseconds per image, confirming its feasibility for real-time deployment in aquaculture processing and quality control pipelines. Visualizations of attention heatmaps further revealed that YOLO-FMC-pose consistently focuses on structurally significant regions of the carapace, including edges, protrusions, and concavities, regardless of imaging device or lighting conditions. 
This demonstrates the model’s robustness and reliability for identifying critical anatomical features across diverse acquisition settings. By providing precise and automated keypoint detection, YOLO-FMC-pose establishes a strong foundation for downstream applications such as individual crab identification, geographic origin verification, anti-counterfeiting labeling, and traceability systems. In summary, this study presents a novel and effective approach for fine-grained phenotypic feature extraction in Chinese mitten crab, integrating multi-module deep learning strategies to achieve high accuracy, robustness, and efficiency. The proposed method not only advances the state-of-the-art in automated crab carapace landmark detection but also provides a scalable framework for intelligent aquaculture management and aquatic product traceability. Future work will focus on expanding dataset diversity under varying environmental conditions, deploying the model on edge and embedded devices for real-time applications, and integrating keypoint detection with multi-dimensional phenotypic analysis for comprehensive individual identification and quality assessment systems.
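The normalized mean error reported above is conventionally the mean Euclidean distance between predicted and ground-truth keypoints, divided by a per-sample normalization length. A short sketch of that computation; using the carapace bounding-box diagonal as the normalizer is an assumption here, since the abstract does not state the paper's exact choice:

```python
import numpy as np

def nme(pred, gt, norm):
    """Normalized mean error over keypoints, in percent.

    pred, gt: (N, K, 2) arrays of predicted / ground-truth (x, y) points.
    norm:     (N,) normalization lengths per sample, e.g. the carapace
              bounding-box diagonal (an assumed normalizer).
    """
    dists = np.linalg.norm(pred - gt, axis=-1)          # (N, K) point errors
    return float((dists / norm[:, None]).mean() * 100)  # percent of norm

# toy example: one sample, two keypoints, normalizer 100 px
pred = np.array([[[0.0, 0.0], [3.0, 4.0]]])
gt   = np.array([[[0.0, 0.0], [0.0, 0.0]]])
norm = np.array([100.0])
print(nme(pred, gt, norm))  # → 2.5
```

On this convention, the reported NME of 3.835% means the average landmark lands within about 3.8% of the normalization length from its annotated position.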

       
