Keypoint detection method for the carapace of Chinese mitten crab based on YOLO-FMC-pose

    • Abstract: The carapace morphology of the Chinese mitten crab (Eriocheir sinensis) varies markedly among individuals of the same species, a trait that can serve as an important basis for geographic origin tracing and individual identification. Accurate detection of keypoints on the carapace is a fundamental step for tasks such as individual identification and phenotypic analysis. However, conventional manual inspection relies on experience-based judgment and suffers from low efficiency and poor repeatability, making it difficult to meet the practical demands of large-scale aquatic product processing. To address this, this study proposes an automatic keypoint detection method for the carapace of the Chinese mitten crab, named YOLO-FMC-pose, to achieve high-precision, automated feature extraction. First, a self-built dataset containing a large number of carapace images of Chinese mitten crabs was constructed, 35 representative landmark keypoints were precisely annotated, and data augmentation was applied to improve model training. Second, the carapace keypoint detection model YOLO-FMC-pose was designed on the basis of an improved YOLO11n-pose framework. The model introduces a C3K2FD module incorporating Frequency Dynamic Convolution (FDConv), a Mixed Aggregation Network (MANet) module, and the CBAM attention mechanism, optimizing the architecture in terms of frequency-domain response, feature fusion, and spatial attention. The results show that the proposed YOLO-FMC-pose model outperforms existing mainstream methods in keypoint detection accuracy, achieving a precision, recall, mAP0.5, and mAP0.5:0.95 of 97.98%, 97.00%, 98.27%, and 73.28%, respectively. Compared with the original YOLO11n-pose, these metrics improved by 3.33, 2.33, 2.94, and 13.08 percentage points, respectively; the normalized mean error (NME) was reduced to 3.835%, and the inference time was 7.5 ms per image, indicating good potential for practical application. This study provides key technical support for intelligent individual identification, origin tracing, and anti-counterfeiting management of the Chinese mitten crab, and offers a pathway for fine-grained feature detection of aquatic products.
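The NME reported above is the mean Euclidean distance between predicted and ground-truth landmarks, normalized by a reference length and expressed as a percentage. The sketch below shows a minimal way to compute it for the 35 carapace keypoints; since the abstract does not state the normalization reference, the diagonal of the carapace bounding box is assumed here for illustration, and the function names are hypothetical.

```python
import numpy as np

def nme(pred, gt, norm_ref):
    """Normalized mean error for one image.

    pred, gt : (K, 2) arrays of predicted / ground-truth keypoints (here K = 35).
    norm_ref : scalar normalization length; the paper's exact choice is not
               stated in the abstract, so the carapace bounding-box diagonal
               is assumed below.
    """
    errors = np.linalg.norm(pred - gt, axis=1)   # per-keypoint Euclidean error
    return float(np.mean(errors) / norm_ref)     # average over keypoints, then normalize

def dataset_nme(preds, gts, boxes):
    """Average NME (as a percentage) over a set of images.

    boxes : iterable of [x1, y1, x2, y2] carapace boxes used for normalization.
    """
    scores = []
    for p, g, (x1, y1, x2, y2) in zip(preds, gts, boxes):
        diag = np.hypot(x2 - x1, y2 - y1)        # assumed normalization reference
        scores.append(nme(p, g, diag))
    return 100.0 * float(np.mean(scores))        # e.g. 3.835 corresponds to 3.835%
```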

       

      Abstract: The Chinese mitten crab (Eriocheir sinensis) is one of the most popular seafood species in Asia. The morphology of its carapace exhibits pronounced intraspecific variation, making it a highly informative phenotypic trait for individual identification, geographic origin tracing, and anti-counterfeiting management in aquaculture and seafood quality control. Accurate localization of key landmarks on the carapace is one of the most critical steps for quantitative phenotype analysis, precise individual recognition, and downstream processing tasks. Conventional approaches rely predominantly on manual visual assessment and expert judgment; because they are time-consuming, labor-intensive, and prone to inconsistency, they are unsuitable for large-scale, automated applications in modern aquaculture. In this study, a high-precision keypoint detection framework (named YOLO-FMC-pose) was specifically designed for the carapace of the Chinese mitten crab. A dataset was constructed from high-resolution images of crabs collected from multiple geographic origins, including Liangzi Lake, Junshan Lake, and Yangcheng Lake. Then 35 representative landmarks on the carapace were selected and manually annotated to ensure biological interpretability and structural completeness. Data augmentation, including random rotation, scaling, brightness and contrast adjustment, and horizontal flipping, was applied to improve the robustness and generalization of the model and to simulate diverse real-world imaging conditions. The YOLO-FMC-pose model was built on the lightweight YOLO11n-pose backbone, with three improvements incorporated to enhance frequency sensitivity, multi-scale semantic integration, and attention-guided spatial representation. Firstly, a C3K2FD module incorporating Frequency Dynamic Convolution (FDConv) was introduced to capture rich frequency-dependent features, responding to both the high-frequency edge details and the low-frequency smooth textures of the carapace. Secondly, a Mixed Aggregation Network (MANet) was incorporated into the neck stage to aggregate multi-scale features and distinguish subtle structural differences among landmarks. Thirdly, the Convolutional Block Attention Module (CBAM) was integrated into the detection head, employing both channel and spatial attention to emphasize informative regions while suppressing irrelevant background noise. The three modules functioned synergistically to accurately capture the spatial arrangement and fine-grained structure of the critical landmarks. Extensive experiments were conducted to evaluate YOLO-FMC-pose against several state-of-the-art lightweight pose estimation models, including YOLOv8n-pose, YOLOv10n-pose, YOLOv12n-pose, and the original YOLO11n-pose. The results demonstrated that YOLO-FMC-pose achieved superior performance across multiple metrics, with a precision of 97.98%, a recall of 97.00%, a mAP0.5 of 98.27%, and a mAP0.5:0.95 of 73.28%. Compared with the original YOLO11n-pose, these values represented absolute improvements of 3.33, 2.33, 2.94, and 13.08 percentage points, respectively. The normalized mean error (NME) of the predicted keypoints was reduced to 3.835%, indicating highly accurate spatial correspondence between predicted and ground-truth landmarks.
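As a concrete illustration of the augmentation step described above, the following sketch applies the listed transforms jointly to an image and its 35 landmark coordinates. It is a minimal example assuming the Albumentations library; the paper does not specify which tool or parameter ranges were used, so both are illustrative, and the placeholder image and keypoints stand in for real samples.

```python
import numpy as np
import albumentations as A

# Keypoint-aware augmentation pipeline; the transform set follows the abstract
# (rotation, scaling, brightness/contrast adjustment, horizontal flip), while
# the library choice and parameter ranges are illustrative assumptions.
augment = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),
        A.RandomScale(scale_limit=0.2, p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
        A.HorizontalFlip(p=0.5),
    ],
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=False),
)

# Placeholder inputs: an HxWx3 image and 35 (x, y) carapace landmarks.
image = np.zeros((640, 640, 3), dtype=np.uint8)
keypoints = [(100.0 + 10.0 * i, 200.0) for i in range(35)]

out = augment(image=image, keypoints=keypoints)
aug_image, aug_keypoints = out["image"], out["keypoints"]
# Note: after a horizontal flip, left/right-symmetric landmark indices must be
# swapped so that each label still refers to the same anatomical point.
```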
The inference time remained at 7.5 ms per image, indicating its feasibility for real-time deployment in aquaculture processing and quality control pipelines. Attention heatmaps revealed that YOLO-FMC-pose consistently focused on structurally significant regions of the carapace, including edges, protrusions, and concavities, regardless of the imaging device or lighting conditions, demonstrating high robustness and reliability in identifying critical anatomical features across diverse acquisition settings. YOLO-FMC-pose therefore provides precise keypoint detection for downstream applications such as individual crab identification, geographic origin verification, anti-counterfeiting labeling, and traceability systems. In summary, an effective approach was presented to extract fine-grained phenotypic features of the Chinese mitten crab, integrating multi-module deep learning strategies for high accuracy, robustness, and efficiency; landmark detection on the crab carapace can provide a scalable framework for intelligent aquaculture and aquatic product traceability. In future work, dataset diversity can be expanded under varying environmental conditions, the model can be deployed on edge and embedded devices for real-time applications, and keypoint detection can be integrated with multi-dimensional phenotypic analysis for individual identification and quality assessment.
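The channel-and-spatial attention referred to above follows the standard CBAM design (Woo et al., 2018): a feature map is first reweighted per channel, then per spatial location. The PyTorch sketch below is a generic CBAM block as commonly implemented, not the authors' exact integration into the YOLO-FMC-pose detection head; the layer sizes in the usage example are illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over global avg- and max-pooled descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: 7x7 conv over channel-wise avg and max maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Sequential channel-then-spatial attention, as in Woo et al. (2018)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)   # reweight channels
        x = x * self.sa(x)   # reweight spatial locations
        return x

# Example: refine a 256-channel feature map before a pose head (sizes are illustrative).
feat = torch.randn(1, 256, 40, 40)
refined = CBAM(256)(feat)   # same shape, attention-reweighted
```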

       
