Research and implementation of pork carcass cutting position localization based on an improved YOLOv11n algorithm

    • Abstract: To address the low degree of automation in locating pork carcass cutting positions, and the large parameter counts and limited accuracy of deep learning detection models, this study proposes a lightweight method based on YOLOv11n for detecting the three-section cutting positions of pork carcasses. The proposed PPD-YOLO algorithm consists of three modules: C3LSKA, CIFNet, and RFAHead. The three-convolution large separable kernel attention (C3LSKA) module uses large-kernel convolutions to enrich the semantic information of high-level feature maps and strengthen the features of key regions. The cross-stage interactive fusion network (CIFNet) reduces the number of parameters and the computational complexity through multiple lightweight residual blocks. The receptive field attention head (RFAHead) captures key-region information through adaptive convolutional receptive fields, improving detection accuracy. Experimental results show that, compared with YOLOv11n on the pork carcass dataset, PPD-YOLO reduces the mean average Euclidean distance (mAED) by 9.2%, the number of parameters by 0.67 M, and the computational complexity by 0.2 G while maintaining a high mean average precision (mAP), outperforming mainstream detection algorithms. These results can provide a technical reference for the intelligent upgrading of the pork processing industry.
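For concreteness, the following minimal PyTorch sketch illustrates the large separable kernel attention and channel-split structure described for C3LSKA (7×7 kernels decomposed into horizontal/vertical depthwise convolutions, a dilated pair with dilation rate 2, and half of the channels routed through LSKA while the other half passes through a 1×1 convolution). The class names, layer ordering, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LSKA(nn.Module):
    """Large separable kernel attention: a large 2D kernel approximated by
    horizontal/vertical depthwise convolutions plus a dilated pair."""

    def __init__(self, channels: int, k: int = 7, dilation: int = 2):
        super().__init__()
        pad, dpad = k // 2, dilation * (k // 2)
        # local depthwise pass: 1xk followed by kx1
        self.dw_h = nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels)
        # dilated depthwise pass enlarges the effective receptive field
        self.dwd_h = nn.Conv2d(channels, channels, (1, k), padding=(0, dpad),
                               dilation=(1, dilation), groups=channels)
        self.dwd_v = nn.Conv2d(channels, channels, (k, 1), padding=(dpad, 0),
                               dilation=(dilation, 1), groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw_v(self.dw_h(x))
        attn = self.dwd_v(self.dwd_h(attn))
        attn = self.pw(attn)
        return x * attn  # reweight the input features with the attention map


class C3LSKAStub(nn.Module):
    """Hypothetical C3LSKA-style block: half the channels pass through LSKA for
    global context, the other half through a 1x1 convolution for local detail,
    then the two halves are concatenated and fused."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.split_sizes = (half, channels - half)
        self.lska = LSKA(half)
        self.local = nn.Conv2d(channels - half, channels - half, 1)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.split(x, self.split_sizes, dim=1)
        return self.fuse(torch.cat([self.lska(a), self.local(b)], dim=1))


if __name__ == "__main__":
    y = C3LSKAStub(64)(torch.randn(1, 64, 80, 80))
    print(y.shape)  # torch.Size([1, 64, 80, 80]); spatial size is preserved
```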

       

      Abstract: This research presents PPD-YOLO, an optimized deep learning framework designed to overcome two critical limitations of automated pork carcass cutting position detection in industrial settings: excessive computational requirements and insufficient localization precision. Targeting the specific challenges of porcine anatomical segmentation, where traditional methods fail to accurately locate the 4th/5th rib interspace (anterior sectioning point) and the sacral-lumbar junction (posterior sectioning point), we establish a dedicated image dataset sourced from slaughterhouse production lines. After capturing 79 raw images of suspended half-carcasses under variable illumination (natural/artificial light interactions), rigorous preprocessing and data augmentation expanded the dataset to 1,264 annotated images, effectively mitigating overfitting risks for deep learning applications.

Architecturally, PPD-YOLO enhances the YOLOv11n baseline through three synergistic innovations. First, the 3-Convolution Large Separable Kernel Attention (C3LSKA) module replaces the standard backbone attention mechanism. By decomposing conventional 2D convolutions into horizontal/vertical depthwise separable operations with 7×7 kernels (initial receptive field of 7, dilation rate of 2), this design expands the effective receptive field while reducing parameters. The module further implements channel splitting: 50% of the features undergo LSKA processing to capture global rib-interspace semantics, while parallel 1×1 convolutions preserve local rib-edge details. Feature recombination via channel concatenation enhances the cutting-region representation, reducing mAED by 1.1% in ablation studies.

Second, the Cross-Stage Interactive Fusion Network (CIFNet) reconstructs the neck network according to lightweight residual principles. Four-stage hierarchical processing integrates the multi-scale backbone features (S3, S4, S5): 1) S5 is upsampled and concatenated with reparameterized S4 features, then processed by a CSP module to generate D4; 2) D4 is upsampled and fused with S3 to yield P3; 3) P3 is downsampled and merged with D4 to form P4; 4) P4 is downsampled and integrated with S5 to produce P5. Each stage employs CSP modules with dual-branch processing, preserving spatial distributions via an auxiliary branch while extracting contextual features through RepConv-enhanced basic blocks. This architecture reduces parameters by 0.63 M and computation by 0.4 G relative to the YOLOv11n neck while maintaining multi-scale fusion capability.

Third, the Receptive Field Attention Head (RFAHead) re-engineers the detection head for anatomical precision. Its core component, RFAConv, implements dual-branch processing: 1) the attention branch computes spatial weights via global average pooling followed by grouped convolution and Softmax normalization; 2) the receptive-field branch extracts multi-scale features using stride-3 grouped convolutions with ReLU activation. Spatial resampling restructures the outputs into non-overlapping 3×3 blocks before element-wise multiplication and a final 3×3 convolution. For bounding-box regression, two consecutive RFAConv layers refine rib-junction localization, optimizing position via Complete IoU (CIoU) loss for global coverage and Distribution Focal Loss (DFL) for sub-pixel calibration. Classification uses depthwise separable convolutions with binary cross-entropy (BCE) loss. This configuration reduces mAED by 5.6%, achieving 4.20 pixels in the component ablation study.
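The RFAConv operation at the core of RFAHead can be sketched roughly as below, assuming the commonly published receptive-field attention convolution formulation: attention weights obtained from average pooling, a grouped convolution, and Softmax; unfolded 3×3 receptive-field features from a grouped convolution with ReLU; and tiling into non-overlapping 3×3 blocks aggregated by a stride-3 convolution. The exact branch ordering and layer choices in the paper's RFAHead may differ; this is an approximation, not the authors' code.

```python
import torch
import torch.nn as nn


class RFAConvSketch(nn.Module):
    """Rough sketch of a receptive-field attention convolution (k = 3):
    per-position k*k attention weights modulate unfolded k*k receptive-field
    features, which are tiled into non-overlapping k*k blocks and aggregated
    by a k x k convolution with stride k."""

    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.k = k
        # attention branch: average pooling -> grouped 1x1 conv (Softmax in forward)
        self.get_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=k, stride=1, padding=k // 2),
            nn.Conv2d(c_in, c_in * k * k, kernel_size=1, groups=c_in, bias=False),
        )
        # receptive-field branch: grouped k x k conv -> BN -> ReLU
        self.get_feature = nn.Sequential(
            nn.Conv2d(c_in, c_in * k * k, kernel_size=k, padding=k // 2, groups=c_in, bias=False),
            nn.BatchNorm2d(c_in * k * k),
            nn.ReLU(inplace=True),
        )
        # final aggregation over each non-overlapping k x k block
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=k, stride=k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k = self.k
        weight = self.get_weight(x).view(b, c, k * k, h, w).softmax(dim=2)
        feature = self.get_feature(x).view(b, c, k * k, h, w)
        out = (feature * weight).view(b, c, k, k, h, w)
        # rearrange so every spatial position becomes a non-overlapping k x k block
        out = out.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * k, w * k)
        return self.conv(out)


if __name__ == "__main__":
    y = RFAConvSketch(64, 64)(torch.randn(1, 64, 40, 40))
    print(y.shape)  # torch.Size([1, 64, 40, 40]); the stride-3 conv undoes the 3x tiling
```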
Comprehensive evaluations validate PPD-YOLO's superiority. Using mean average Euclidean distance (mAED) as the primary localization metric, the full model achieves 4.04 pixels, a 9.2% reduction versus YOLOv11n (4.45 pixels). The parameter count falls to 1.91 M (a 0.67 M reduction), while the computational complexity drops to 6.1 GFLOPs (a 0.2 G reduction). Quantitative analysis confirms superior resource utilization: computational efficiency (mAP50-95/FLOPs) improves by 3.2% to 16.1, while parameter-localization efficiency (1/(mAED×Params)) increases by 49% to 0.130. Real-time testing achieves 98.6% mAP50-95 at 70 frames/s, exceeding standard CMOS sensor frame rates (30 frames/s) and satisfying production-line throughput requirements.

Benchmarking against seven mainstream detectors (including YOLOv5n, YOLOv8n, YOLOv10n, and YOLOv11n) demonstrates PPD-YOLO's leadership: lowest mAED (4.04 pixels), fewest parameters (1.91 M), and lowest computation (6.1 G). Visualizations under four illumination conditions (baseline, low-light, high-intensity, and red-spectrum environments) confirm its environmental robustness. Ablation studies further isolate the individual contributions: C3LSKA reduces mAED by 1.1% relative to the baseline; CIFNet lowers parameters by 0.63 M with negligible impact on mAP; RFAHead reduces mAED by 1.1% relative to the baseline. By integrating large-kernel attention for cutting-region feature enhancement, cross-stage fusion for computational efficiency, and adaptive receptive fields for anatomical precision, PPD-YOLO resolves fundamental limitations in pork cutting position detection. The framework achieves sub-5-pixel localization accuracy under industrial constraints, providing a deployable solution for robotic cutting systems. This advancement enables automated quality control and precision slaughterhouse operations, with demonstrable potential for broader meat processing applications.
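As a small illustration of the evaluation metrics quoted above, the sketch below shows how mAED and the two efficiency ratios could be computed once predicted and ground-truth cutting points have been matched; the matching strategy and exact averaging used in the paper may differ, and the function names are ours.

```python
import numpy as np


def mean_average_euclidean_distance(pred_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """mAED: mean Euclidean distance (pixels) between matched predicted and
    ground-truth cutting-position points; both arrays have shape (N, 2) as (x, y)."""
    assert pred_pts.shape == gt_pts.shape
    return float(np.mean(np.linalg.norm(pred_pts - gt_pts, axis=1)))


def computational_efficiency(map50_95: float, flops_g: float) -> float:
    """mAP50-95 per GFLOP, the resource-utilization ratio quoted above."""
    return map50_95 / flops_g


def parameter_localization_efficiency(maed_px: float, params_m: float) -> float:
    """1 / (mAED x Params); higher is better."""
    return 1.0 / (maed_px * params_m)


if __name__ == "__main__":
    # hypothetical anterior/posterior cutting points for one carcass image
    pred = np.array([[120.0, 260.0], [410.0, 255.0]])
    gt = np.array([[123.0, 258.0], [407.0, 259.0]])
    print(mean_average_euclidean_distance(pred, gt))      # pixel error over two points
    print(computational_efficiency(98.6, 6.1))            # ~16.2 with the rounded figures above
    print(parameter_localization_efficiency(4.04, 1.91))  # ~0.130, matching the reported value
```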

       
