Abstract:
Harvesting high-value crops such as chili peppers remains highly labor-intensive, limiting productivity in protected cultivation. Although vision-based intelligent harvesting robots offer a promising solution, their practical deployment is constrained by the difficulty of achieving high detection precision in visually complex greenhouse environments and by the limited computational resources of embedded edge devices. To address this challenge, this study proposed LDH-YOLOv11n, a lightweight object detection model engineered to deliver high precision at low computational cost for real-time chili pepper detection in practical greenhouse settings. A self-constructed image dataset served as the foundation of this work. Images were captured across different developmental stages of chili peppers, ranging from early green fruits to fully mature red fruits. All images were manually annotated with Labelme, and the dataset was partitioned into training, validation, and test sets at a 7:1:2 ratio to prevent data leakage and ensure unbiased evaluation. To enhance generalization under diverse environmental conditions, data augmentation techniques, including random Gaussian noise, brightness adjustment, geometric transformations (flipping and rotation), and color saturation adjustment, were applied, yielding a final dataset of
8940 images. The proposed LDH-YOLOv11n architecture extended the baseline YOLOv11n model through three key innovations. First, the Simple, Parameter-Free Attention Module (SimAM) was embedded into the C3k2 module to sharpen the model's attention to salient fruit features while suppressing distraction from cluttered visual contexts such as overlapping leaves and branches. Second, several standard convolutional downsampling operations were replaced with the Average Pooling Downsampling (ADown) module, which markedly reduced computational cost while maintaining sufficient feature fidelity for accurate detection. Third, the original detection head was replaced with a custom-designed Lightweight Detection Head (LDH-Detect), reducing overall model redundancy without compromising detection precision. Extensive experiments benchmarked LDH-YOLOv11n against mainstream detection algorithms, including Faster R-CNN, SSD-300, and a series of YOLO variants (v3-tiny, v5n, v6n, v8n, v10n, v11n, and v12n). On the custom greenhouse chili dataset, LDH-YOLOv11n achieved a precision of 94.3%, a recall of 90.1%, and an mAP50-95
of 77.0%, with only 1.6 million parameters and 3.9 giga floating-point operations (GFLOPs). These results represented substantial improvements over the YOLOv11n baseline, with gains of 1.0, 2.2, and 2.1 percentage points in precision, recall, and mAP50-95, respectively, while simultaneously reducing the parameter count by 38.5% and the GFLOPs by 38.1%. Qualitative evaluations further demonstrated the model's robustness across four representative and challenging scenarios: standard illumination, fruit overlap, branch and leaf occlusion, and low-light conditions; LDH-YOLOv11n was the only model to achieve zero false positives and zero missed detections in all four. Furthermore, deployment tests on an embedded edge device with TensorRT acceleration yielded an inference speed of 264.6 frames per second (FPS), a 3.58-fold improvement over the non-accelerated version and far above the 30 FPS threshold for real-time performance. In conclusion, the proposed LDH-YOLOv11n is a practically deployable lightweight detection model that reconciles the competing requirements of high detection precision and low computational complexity. Its robustness, precision, and efficiency make it a strong candidate for accelerating the deployment of intelligent harvesting systems and advancing precision agriculture.
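For readers unfamiliar with SimAM, the sketch below illustrates the parameter-free, energy-based attention that the modified C3k2 blocks rely on. It follows the published SimAM formulation; the regularization value and the standalone module structure are illustrative assumptions rather than the exact integration used in LDH-YOLOv11n.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: scores each activation with an
    energy-based saliency measure and rescales the feature map."""

    def __init__(self, e_lambda: float = 1e-4):  # regularizer; value assumed
        super().__init__()
        self.e_lambda = e_lambda
        self.act = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each activation from its channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Channel-wise variance estimate over the spatial dimensions
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy: activations that stand out from their channel receive larger weights
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * self.act(e_inv)
```

A plausible (assumed) integration point is applying this module to the output feature map of each modified C3k2 block.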
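The ADown module originates from YOLOv9; the following sketch reproduces its commonly used form, in which the channels are split and downsampled by an average-pool/strided-convolution branch and a max-pool/pointwise-convolution branch. The conv_bn_silu helper is a simplified stand-in (an assumption) for the standard YOLO Conv block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_silu(c_in: int, c_out: int, k: int, s: int, p: int) -> nn.Sequential:
    # Simplified stand-in for the YOLO "Conv" block (Conv2d + BatchNorm + SiLU).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class ADown(nn.Module):
    """Pooling-based downsampling that halves spatial resolution with two
    lightweight branches instead of a single strided convolution."""

    def __init__(self, c1: int, c2: int):
        super().__init__()
        self.c = c2 // 2
        self.cv1 = conv_bn_silu(c1 // 2, self.c, 3, 2, 1)
        self.cv2 = conv_bn_silu(c1 // 2, self.c, 1, 1, 0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.avg_pool2d(x, kernel_size=2, stride=1, padding=0)
        x1, x2 = x.chunk(2, dim=1)                      # split channels into two branches
        x1 = self.cv1(x1)                               # branch 1: 3x3 stride-2 convolution
        x2 = F.max_pool2d(x2, 3, stride=2, padding=1)   # branch 2: max pooling
        x2 = self.cv2(x2)                               # followed by a 1x1 convolution
        return torch.cat((x1, x2), dim=1)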
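The abstract reports TensorRT-accelerated inference on an embedded device but does not spell out the export pipeline. For YOLOv11-family models, one common route is the Ultralytics export API, sketched below; the weight file and test image names are hypothetical placeholders, and a custom architecture may require its modules to be registered before loading.

```python
from ultralytics import YOLO

# Export the trained detector to a TensorRT engine (FP16) on the target device.
model = YOLO("ldh_yolov11n.pt")            # hypothetical path to trained weights
model.export(format="engine", half=True)   # writes ldh_yolov11n.engine via TensorRT

# Run accelerated inference with the generated engine.
trt_model = YOLO("ldh_yolov11n.engine")
results = trt_model.predict("greenhouse_frame.jpg", imgsz=640, conf=0.25)
```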