Abstract:
The cephalothorax morphology of the Chinese mitten crab (Eriocheir sinensis) exhibits notable variation among individuals, making it an important basis for origin traceability and individual identification. Accurate localization of key landmarks on the carapace is a critical prerequisite for quantitative phenotype analysis, precise individual recognition, and related automated processing tasks. Traditional approaches rely predominantly on manual visual assessment and expert judgment, which are time-consuming, labor-intensive, and prone to inconsistency, making them unsuitable for large-scale, automated applications in modern aquaculture. To address these challenges, this study proposes an automated, high-precision keypoint detection framework, named YOLO-FMC-pose, specifically designed for the carapace of the Chinese mitten crab.
A comprehensive dataset was constructed, containing high-resolution images of crabs from multiple geographic origins, including Liangzi Lake, Junshan Lake, and Yangcheng Lake. Thirty-five representative anatomical landmarks on the carapace were carefully selected and manually annotated to ensure biological interpretability and structural completeness. To improve model robustness and generalization, several data augmentation strategies were applied, including random rotation, scaling, brightness and contrast adjustment, and horizontal flipping, to simulate diverse real-world imaging conditions.
The proposed YOLO-FMC-pose model builds upon the lightweight YOLO11n-pose baseline and incorporates three core improvements aimed at enhancing frequency sensitivity, multi-scale semantic integration, and attention-guided spatial representation. First, a C3K2FD module integrating Frequency Dynamic Convolution (FDConv) was introduced to capture rich frequency-dependent features, allowing the model to respond simultaneously to the high-frequency edge details and low-frequency smooth textures present in the carapace. Second, a Mixed Aggregation Network (MANet) was incorporated into the neck stage to aggregate multi-scale features and enhance contextual understanding, improving the model's ability to distinguish subtle structural differences among landmarks. Third, the Convolutional Block Attention Module (CBAM) was integrated into the detection head, employing both channel and spatial attention mechanisms to emphasize informative regions while suppressing irrelevant background noise. These three modules function synergistically to enhance the model's capability to accurately capture the spatial arrangement and fine-grained structure of critical landmarks.
Extensive experiments were conducted to evaluate the performance of YOLO-FMC-pose against several state-of-the-art lightweight keypoint detection models, including YOLOv8n-pose, YOLOv10n-pose, YOLOv12n-pose, and the original YOLO11n-pose. The results demonstrated that YOLO-FMC-pose achieved superior performance across multiple metrics. Specifically, it attained a precision of 97.98%, a recall of 97.00%, a mAP@0.5 of 98.27%, and a mAP@0.5:0.95 of 73.28%. Compared with the original YOLO11n-pose, these values represent absolute improvements of 3.33, 2.33, 2.94, and 13.08 percentage points, respectively. The normalized mean error (NME) of the predicted keypoints was reduced to 3.835%, indicating highly accurate spatial correspondence between predicted and ground-truth landmarks.
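The augmentation strategies summarized above (random rotation, scaling, brightness and contrast adjustment, and horizontal flipping) can be reproduced with a keypoint-aware transform library. The sketch below is a minimal illustration using albumentations; the library choice and all parameter values (rotation limit, scale range, probabilities) are assumptions for illustration, not details reported in this work.

```python
# Illustrative keypoint-aware augmentation pipeline; library choice and all
# parameter values are assumptions, not the authors' reported configuration.
import albumentations as A

augment = A.Compose(
    [
        A.Rotate(limit=30, p=0.5),                      # random rotation
        A.RandomScale(scale_limit=0.2, p=0.5),          # random scaling
        A.RandomBrightnessContrast(brightness_limit=0.2,
                                   contrast_limit=0.2, p=0.5),
        A.HorizontalFlip(p=0.5),                        # horizontal flipping
    ],
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=False),
)

# Usage: image is an HxWx3 uint8 array, keypoints a list of (x, y) landmarks.
# out = augment(image=image, keypoints=keypoints)
# aug_img, aug_kpts = out["image"], out["keypoints"]
```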
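The CBAM block placed in the detection head follows the standard channel-then-spatial attention formulation. The PyTorch sketch below illustrates that standard design for reference only; it is not the authors' implementation, and the reduction ratio and spatial kernel size are assumed defaults.

```python
# Standard CBAM formulation: channel attention followed by spatial attention.
# Reduction ratio and kernel size are assumed defaults, not values from the paper.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # avg-pooled descriptor
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # max-pooled descriptor
        return torch.sigmoid(avg + mx)                            # per-channel weights

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)   # channel-wise average map
        mx, _ = torch.max(x, dim=1, keepdim=True)  # channel-wise max map
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)     # emphasize informative channels
        return x * self.sa(x)  # emphasize informative spatial regions
```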
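The normalized mean error reported above is, in general, the mean Euclidean distance between predicted and ground-truth landmarks divided by a normalization length. A minimal sketch is given below; the choice of normalizer (here a bounding-box-based term) is an assumption, since the exact normalization used in this study is not specified here.

```python
# Illustrative NME computation; the normalization term is an assumption.
import numpy as np

def nme(pred: np.ndarray, gt: np.ndarray, norm: float) -> float:
    """pred, gt: (K, 2) arrays of K landmark coordinates; norm: normalization length."""
    errors = np.linalg.norm(pred - gt, axis=1)  # per-landmark Euclidean error
    return float(errors.mean() / norm)

# Example with a bounding-box-based normalizer (a common but assumed choice):
# score = nme(pred_kpts, gt_kpts, norm=np.sqrt(box_width * box_height))
```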
Despite these enhancements, the detection time remained 7.5 milliseconds per image, confirming the model's feasibility for real-time deployment in aquaculture processing and quality-control pipelines. Visualizations of attention heatmaps further revealed that YOLO-FMC-pose consistently focuses on structurally significant regions of the carapace, including edges, protrusions, and concavities, regardless of imaging device or lighting conditions. This demonstrates the model's robustness and reliability in identifying critical anatomical features across diverse acquisition settings.
By providing precise and automated keypoint detection, YOLO-FMC-pose establishes a strong foundation for downstream applications such as individual crab identification, geographic origin verification, anti-counterfeiting labeling, and traceability systems. In summary, this study presents a novel and effective approach for fine-grained phenotypic feature extraction in the Chinese mitten crab, combining multiple deep learning modules to achieve high accuracy, robustness, and efficiency. The proposed method not only advances the state of the art in automated crab carapace landmark detection but also provides a scalable framework for intelligent aquaculture management and aquatic product traceability. Future work will focus on expanding dataset diversity under varying environmental conditions, deploying the model on edge and embedded devices for real-time applications, and integrating keypoint detection with multi-dimensional phenotypic analysis for comprehensive individual identification and quality assessment systems.
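As a usage illustration of the real-time deployment scenario described above, a trained pose model in the Ultralytics YOLO family is typically invoked as sketched below to obtain per-instance landmark coordinates. The weights file name is a hypothetical placeholder; the YOLO-FMC-pose checkpoint itself is not described here.

```python
# Minimal inference sketch using the Ultralytics YOLO API on which the
# YOLO11n-pose baseline is built; "yolo_fmc_pose.pt" is a hypothetical file name.
from ultralytics import YOLO

model = YOLO("yolo_fmc_pose.pt")        # load trained pose weights
results = model("crab_carapace.jpg")    # run keypoint detection on one image

for r in results:
    kpts = r.keypoints.xy               # (num_instances, num_keypoints, 2) coordinates
    print(kpts.shape)                   # 35 keypoints per crab if trained as annotated
```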