Research and implementation of pork carcass cutting-position detection and localization based on an improved YOLOv11n algorithm
Graphical Abstract
Abstract
This research presents PPD-YOLO, an optimized deep learning framework designed to overcome two critical limitations of automated pork carcass cutting-position detection in industrial settings: excessive computational requirements and insufficient localization precision. Targeting the specific challenges of porcine anatomical segmentation, where traditional methods fail to accurately locate the 4th/5th rib interspace (anterior sectioning point) and the sacral-lumbar junction (posterior sectioning point), we establish a dedicated image dataset sourced from slaughterhouse production lines. After capturing 79 raw images of suspended half-carcasses under variable illumination (interacting natural and artificial light), rigorous preprocessing and data augmentation expanded the dataset to 1,264 annotated images, effectively mitigating overfitting risks for deep learning applications.

Architecturally, PPD-YOLO enhances the YOLOv11n baseline through three synergistic innovations. First, the 3-Convolution Large Separable Kernel Attention (C3LSKA) module replaces the standard attention mechanism in the backbone. By decomposing conventional 2D convolutions into horizontal/vertical depthwise separable operations with 7×7 kernels (initial receptive field = 7, dilation rate = 2), this design expands the effective receptive field while reducing parameters. The module further implements channel splitting: 50% of the features undergo LSKA processing to capture global rib-interspace semantics, while parallel 1×1 convolutions preserve local rib-edge details. Feature recombination via channel concatenation enhances the representation of the cutting region, reducing mAED by 1.1% in ablation studies.

Second, the Cross-Stage Interactive Fusion Network (CIFNet) reconstructs the neck network using lightweight residual principles. Four-stage hierarchical processing integrates the multi-scale backbone features S3, S4, and S5: 1) S5 is upsampled, concatenated with reparameterized S4 features, and processed by a CSP module to generate D4; 2) D4 is upsampled and fused with S3 to yield P3; 3) P3 is downsampled and merged with D4 to form P4; 4) P4 is downsampled and integrated with S5 to produce P5. Each stage employs a CSP module with dual-branch processing: an auxiliary branch preserves spatial distributions while RepConv-enhanced basic blocks extract contextual features. This architecture reduces parameters by 0.63 M and computation by 0.4 G relative to YOLOv11n's neck while maintaining multi-scale fusion capability.

Third, the Receptive Field Attention Head (RFAHead) re-engineers the detection head for anatomical precision. Its core innovation, RFAConv, implements dual-task processing: 1) the attention branch computes spatial weights via global average pooling followed by grouped convolution and Softmax normalization; 2) the receptive-field branch extracts multi-scale features using stride-3 grouped convolutions with ReLU activation. Spatial resampling restructures the outputs into non-overlapping 3×3 blocks before element-wise multiplication and a 3×3 convolution. For bounding-box regression, two consecutive RFAConvs refine rib-junction localization, optimizing position via Complete IoU (CIoU) loss for global coverage and Distribution Focal Loss (DFL) for sub-pixel calibration. Classification uses depthwise separable convolutions with binary cross-entropy (BCE) loss. This configuration reduces mAED by 5.6%, achieving 4.20 pixels in the component ablation.
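To make the C3LSKA mechanism concrete, a minimal PyTorch-style sketch of the channel-split large-separable-kernel attention is given below. It illustrates the decomposition into horizontal/vertical depthwise strips with a dilated pair and the 50/50 channel split; it is not the authors' implementation, and the module and argument names (LSKA, C3LSKABlock, ch, k, dilation) are chosen here only for clarity.

import torch
import torch.nn as nn

class LSKA(nn.Module):
    # Large Separable Kernel Attention: a large 2D kernel is decomposed into
    # horizontal/vertical depthwise strips plus a dilated pair; a 1x1 conv then
    # produces an attention map that reweights the input feature map.
    def __init__(self, ch, k=7, dilation=2):
        super().__init__()
        self.h0 = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2), groups=ch)
        self.v0 = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0), groups=ch)
        pad = (k // 2) * dilation  # dilated strips enlarge the effective receptive field
        self.h1 = nn.Conv2d(ch, ch, (1, k), padding=(0, pad), dilation=(1, dilation), groups=ch)
        self.v1 = nn.Conv2d(ch, ch, (k, 1), padding=(pad, 0), dilation=(dilation, 1), groups=ch)
        self.pw = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        attn = self.pw(self.v1(self.h1(self.v0(self.h0(x)))))
        return x * attn  # attention reweighting

class C3LSKABlock(nn.Module):
    # Channel split: half of the channels pass through LSKA to capture global
    # cutting-region context, the other half through a 1x1 conv that keeps local
    # rib-edge detail; the two paths are re-fused by channel concatenation.
    def __init__(self, ch):
        super().__init__()
        self.global_branch = LSKA(ch // 2)
        self.local_branch = nn.Conv2d(ch // 2, ch // 2, 1)
        self.fuse = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        a, b = x.chunk(2, dim=1)  # assumes an even channel count
        return self.fuse(torch.cat([self.global_branch(a), self.local_branch(b)], dim=1))

For example, C3LSKABlock(64)(torch.randn(1, 64, 40, 40)) returns a tensor of the same shape, so the block can be dropped into the backbone without changing feature-map dimensions.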
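The four-stage fusion flow of CIFNet can be sketched in the same spirit. The sketch below assumes backbone features S3, S4, and S5 at strides 8, 16, and 32, and uses a simplified dual-branch CSP stage with plain Conv-BN-SiLU blocks standing in for the reparameterized RepConv blocks; CSPStage, CIFNeck, and the channel arguments c3/c4/c5 are illustrative names, not the paper's.

import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=1, s=1):
    # basic Conv-BN-SiLU unit
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class CSPStage(nn.Module):
    # Dual-branch CSP stage: an auxiliary 1x1 branch keeps the incoming feature
    # distribution, a main branch stacks 3x3 blocks for context, and the two
    # branches are concatenated and projected back.
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.aux = conv_bn_act(c_in, c_out // 2)
        self.main = nn.Sequential(
            conv_bn_act(c_in, c_out // 2),
            *[conv_bn_act(c_out // 2, c_out // 2, k=3) for _ in range(n)],
        )
        self.out = conv_bn_act(c_out, c_out)

    def forward(self, x):
        return self.out(torch.cat([self.aux(x), self.main(x)], dim=1))

class CIFNeck(nn.Module):
    # Cross-stage interactive fusion over S3/S4/S5, producing P3/P4/P5.
    def __init__(self, c3, c4, c5):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.stage_d4 = CSPStage(c5 + c4, c4)  # stage 1: up(S5) + S4 -> D4
        self.stage_p3 = CSPStage(c4 + c3, c3)  # stage 2: up(D4) + S3 -> P3
        self.down3 = conv_bn_act(c3, c3, k=3, s=2)
        self.stage_p4 = CSPStage(c3 + c4, c4)  # stage 3: down(P3) + D4 -> P4
        self.down4 = conv_bn_act(c4, c4, k=3, s=2)
        self.stage_p5 = CSPStage(c4 + c5, c5)  # stage 4: down(P4) + S5 -> P5

    def forward(self, s3, s4, s5):
        d4 = self.stage_d4(torch.cat([self.up(s5), s4], dim=1))
        p3 = self.stage_p3(torch.cat([self.up(d4), s3], dim=1))
        p4 = self.stage_p4(torch.cat([self.down3(p3), d4], dim=1))
        p5 = self.stage_p5(torch.cat([self.down4(p4), s5], dim=1))
        return p3, p4, p5

The lightweight character of the design comes from keeping every fusion step to a single concatenation followed by one CSP stage, rather than repeating full top-down and bottom-up passes.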
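RFAConv in the RFAHead can be approximated with the general receptive-field attention formulation for a 3×3 field: an attention branch pools and applies a grouped convolution with Softmax normalization over the nine receptive-field positions, a feature branch extracts one feature per position with a grouped convolution and ReLU, the weighted features are resampled into non-overlapping 3×3 blocks, and a 3×3 stride-3 convolution fuses them. The sketch below follows that reading and uses a local 3×3 average pooling; the exact pooling and wiring are assumptions, not the paper's code.

import torch
import torch.nn as nn

class RFAConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.k = k
        # attention branch: pooling + grouped 1x1 conv yield one weight per
        # receptive-field position; Softmax normalizes over the k*k positions.
        self.get_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=k, stride=1, padding=k // 2),
            nn.Conv2d(c_in, c_in * k * k, 1, groups=c_in, bias=False),
        )
        # receptive-field branch: grouped conv expands each channel into k*k
        # receptive-field features, followed by BN and ReLU.
        self.get_feature = nn.Sequential(
            nn.Conv2d(c_in, c_in * k * k, k, stride=1, padding=k // 2, groups=c_in, bias=False),
            nn.BatchNorm2d(c_in * k * k),
            nn.ReLU(),
        )
        # after resampling into non-overlapping kxk blocks, a kxk stride-k conv fuses them
        self.fuse = nn.Conv2d(c_in, c_out, k, stride=k)

    def forward(self, x):
        b, c, h, w = x.shape
        k = self.k
        weight = self.get_weight(x).view(b, c, k * k, h, w).softmax(dim=2)
        feat = self.get_feature(x).view(b, c, k * k, h, w)
        out = (weight * feat).view(b, c, k, k, h, w)
        # spatial resampling: every position becomes a non-overlapping kxk block
        out = out.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * k, w * k)
        return self.fuse(out)

In the regression branch, two such RFAConv layers would be stacked before the box-prediction convolution, with CIoU and DFL supervising the outputs; the classification branch keeps lighter depthwise separable convolutions with BCE loss, as described above.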
Comprehensive evaluations validate PPD-YOLO's superiority. Using mean Average Euclidean Distance (mAED) as the primary localization metric, the full model achieves 4.04 pixels, a 9.2% reduction versus YOLOv11n (4.45 pixels). The parameter count falls to 1.91 M (a 0.67 M reduction), while computational complexity drops to 6.1 GFLOPs (a 0.2 G reduction). Quantitative analysis confirms superior resource utilization: computational efficiency (mAP50-95/FLOPs) improves by 3.2% to 16.1, while parameter-localization efficiency (1/(mAED×Params)) increases by 49% to 0.130. Real-time testing achieves 98.6% mAP50-95 at 70 frames/s, exceeding standard CMOS sensor frame rates (30 frames/s) and satisfying production-line throughput requirements.

Benchmarking against seven YOLO variants (including v5n, v8n, v10n, and v11n) confirms PPD-YOLO's lead: the lowest mAED (4.04 pixels), the fewest parameters (1.91 M), and the lowest computation (6.1 GFLOPs). Visualization under four illumination conditions (baseline, low-light, high-intensity, and red-spectrum environments) confirms environmental robustness. Ablation studies further isolate the individual contributions: C3LSKA reduces mAED by 1.1% relative to the baseline, CIFNet lowers parameters by 0.63 M with negligible impact on mAP, and RFAHead reduces mAED by 5.6% relative to the baseline.

By integrating large-kernel attention for cutting-region feature enhancement, cross-stage fusion for computational efficiency, and adaptive receptive fields for anatomical precision, PPD-YOLO resolves fundamental limitations in pork cutting-position detection. The framework achieves sub-5-pixel localization accuracy under industrial constraints, providing a deployable solution for robotic cutting systems. This advancement enables automated quality control and precision slaughterhouse operations, with demonstrable potential for broader meat-processing applications.
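As a quick consistency check, the headline efficiency figures follow directly from the reported numbers: the mAED reduction is (4.45 − 4.04) / 4.45 ≈ 9.2%, and the parameter-localization efficiency is 1 / (4.04 px × 1.91 M) ≈ 0.130 for PPD-YOLO versus 1 / (4.45 px × 2.58 M) ≈ 0.087 for YOLOv11n, an improvement of roughly 49%.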