Abstract:
Dangshan pear is one of the most popular fruits in China, and its surface defects often need to be monitored in real time. However, practical deployment is constrained by the limited computing resources of devices operating under real-time constraints in industrial environments. In this study, a lightweight and efficient detection model (named AIC-YOLOv11n) was developed based on the YOLOv11n architecture. Specifically, an Adown down-sampling module was introduced into the backbone, which reduced both the floating-point operations and the parameter count while enhancing feature extraction. Additionally, the original C2PSA module was replaced with a C2PSA-iRMB module, in which an inverted residual mobile block (iRMB) was integrated with the attention mechanism to capture long-range dependencies efficiently with less computational overhead. Moreover, a cross-scale feature fusion module (CFFM) was employed in the neck of the network to merge features at different scales and thereby improve the detection accuracy for small-scale defects. A dataset of 5,000 labeled images was constructed to validate the performance of the improved model. The images were collected using a conveyor-belt multi-surface imaging system equipped with synchronized upper and lower illumination boxes and industrial-grade cameras. The dataset included five categories: calyx, stem-end cap, scratches, rust spots, and mold spots. Data augmentation was also carried out, including rotation, flipping, and brightness adjustment. The dataset was then partitioned into training, validation, and test sets at an 8:1:1 ratio. Experimental results showed that the improved AIC-YOLOv11n model outperformed the baseline YOLOv11n in detection. Specifically, it achieved a precision of 92.5%, a recall of 87.5%, an mAP@0.5 of 92.7%, and an mAP@0.5-0.95 of 70.5%, which were 0.3, 5.5, 5.1, and 2.4 percentage points higher than the baseline, respectively. Additionally, the computational cost was reduced significantly: the model required only 4.3 GFLOPs and 1.46 million parameters, with a model size of 3.11 MB, which were reductions of 31.7%, 43.4%, and 40.5%, respectively. Furthermore, the peak GPU memory usage remained below 4.83 GB, and the inference speed reached 120.1 frames per second (FPS), fully meeting the real-time requirement of defect inspection. Ablation studies showed that all three modules contributed to the improvement. Among them, Adown yielded the greatest improvement in recall, CFFM significantly enhanced the detection accuracy for small objects, and C2PSA-iRMB effectively increased the precision. Grad-CAM visualization further confirmed that the improved model focused accurately on the defect regions while suppressing interference from normal anatomical structures. Online TensorRT deployment was then used to validate the improved model in an industrial scenario. Once converted to a TensorRT FP16 inference engine, the model achieved a single-image inference latency of just 1.4 ms without compromising accuracy, indicating its suitability for real-world applications. In conclusion, AIC-YOLOv11n balances accuracy, efficiency, and light weight in surface defect detection on Dangshan pears. Future work may apply model pruning, knowledge distillation, and transfer learning to extend the approach to more fruit types in the agricultural industry.
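The reported figures can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original study: the function names are hypothetical, the "4.3 G" figure is read as GFLOPs, the 5,000 images are assumed to be the total that was split 8:1:1, and the 1.4 ms latency is assumed to cover the forward pass only. It recovers the baseline values implied by the stated percentage-point gains and relative reductions.

```python
def baseline_from_points(improved_pct: float, gain_points: float) -> float:
    """Baseline metric (%) implied by an improved metric and a gain in percentage points."""
    return improved_pct - gain_points


def baseline_from_reduction(improved: float, reduction_pct: float) -> float:
    """Baseline cost implied by an improved cost and a relative reduction (%)."""
    return improved / (1.0 - reduction_pct / 100.0)


def split_counts(total: int, ratios=(8, 1, 1)) -> tuple:
    """Split `total` items by integer ratios; any remainder goes to the first subset."""
    unit = sum(ratios)
    counts = [total * r // unit for r in ratios]
    counts[0] += total - sum(counts)
    return tuple(counts)


def throughput_from_latency(latency_ms: float) -> float:
    """Images per second implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Accuracy metrics: reported value and gain over the baseline (percentage points).
    metrics = {
        "precision": (92.5, 0.3),
        "recall": (87.5, 5.5),
        "mAP@0.5": (92.7, 5.1),
        "mAP@0.5-0.95": (70.5, 2.4),
    }
    for name, (value, gain) in metrics.items():
        print(f"{name}: implied baseline {baseline_from_points(value, gain):.1f}%")

    # Cost figures: reported value and relative reduction (%).
    # "GFLOPs" is an assumed reading of the abstract's "4.3 G".
    costs = {
        "GFLOPs": (4.3, 31.7),
        "parameters (M)": (1.46, 43.4),
        "model size (MB)": (3.11, 40.5),
    }
    for name, (value, reduction) in costs.items():
        print(f"{name}: implied baseline {baseline_from_reduction(value, reduction):.2f}")

    # Dataset split, assuming the 5,000 images are the total being partitioned.
    print("train/val/test:", split_counts(5000))

    # Engine-only throughput implied by the TensorRT FP16 latency (assumption:
    # pre- and post-processing are excluded from the 1.4 ms figure).
    print(f"implied throughput: {throughput_from_latency(1.4):.0f} images/s")
```

Under these assumptions, the implied baseline costs come out at roughly 6.3 GFLOPs, 2.58 million parameters, and 5.23 MB, and the split yields 4,000 training, 500 validation, and 500 test images.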