Detecting cucumber downy mildew sporangia in low-magnification microscopic images using an improved YOLOv8n model

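    • Summary: Manual detection of cucumber downy mildew sporangia in low-magnification microscopic images is inefficient, and traditional algorithms lack robustness. To address this, the study proposes an improved YOLOv8n model whose detection performance is raised by three cooperating modules: 1) a WhirlConv (whirl convolution) module that uses four-branch reflection padding with independent convolution kernels to capture multi-directional features, combined with a channel attention mechanism to suppress redundant information; 2) a multi-scale detection head built on the high-resolution P2-level feature map, which extends coverage to extremely small targets; 3) an LSKA (large separable kernel attention) mechanism embedded in the SPPF (spatial pyramid pooling-fast) module, which captures long-range dependencies with large separable kernels and improves performance while keeping the module lightweight. Experiments showed that the improved model reached a precision of 94.2%, a recall of 90.1%, and a mean average precision (mAP0.5) of 86.9% on a self-built dataset, which are 10, 7.2, and 7.8 percentage points higher than the baseline YOLOv8n, respectively. Its parameter count (17.7 M) and floating-point operations (56.9 G) are 25.1 M and 77.5 G lower than those of RT-DETR-R50, respectively. The model provides an effective method for detecting sporangia in low-magnification microscopic images.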
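The motivation for using reflection padding in WhirlConv can be illustrated in one dimension. The following is a minimal sketch, not the authors' implementation: the helper names (`zero_pad`, `reflect_pad`, `correlate_valid`) and the difference kernel are illustrative assumptions, and the reflection rule used here mirrors values about the border without repeating the border sample.

```python
def zero_pad(row, pad):
    """Pad a 1-D signal with zeros on both sides."""
    return [0] * pad + list(row) + [0] * pad


def reflect_pad(row, pad):
    """Pad a 1-D signal by mirroring it about its first and last samples."""
    row = list(row)
    if pad >= len(row):
        raise ValueError("pad must be smaller than the signal length")
    left = row[pad:0:-1]          # samples 1..pad, reversed
    right = row[-2:-pad - 2:-1]   # the pad samples before the last one, reversed
    return left + row + right


def correlate_valid(row, kernel):
    """Slide the kernel over the signal without any extra padding."""
    k = len(kernel)
    return [
        sum(row[i + j] * kernel[j] for j in range(k))
        for i in range(len(row) - k + 1)
    ]


# A perfectly flat background: an edge filter should respond with zeros.
flat = [5, 5, 5, 5]
edge_kernel = [-1, 0, 1]  # hypothetical horizontal-difference filter

with_zero = correlate_valid(zero_pad(flat, 1), edge_kernel)
with_reflect = correlate_valid(reflect_pad(flat, 1), edge_kernel)

print("zero padding   :", with_zero)     # spurious responses at both borders
print("reflect padding:", with_reflect)  # no artificial edges
```

In this toy case, zero padding creates an artificial step at each border that the edge filter picks up, while reflection padding leaves the flat background flat.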

       

      Abstract: High-magnification (≥200×) microscopic imaging has practical limitations for detecting cucumber downy mildew sporangia: excessive storage demands, incompatibility with portable field devices, and discrepancies between laboratory-induced samples and natural agricultural environments. In this study, an enhanced You Only Look Once version 8n (YOLOv8n) model was developed and optimized for practical low-magnification (100×) microscopy. Three synergistic architectural improvements were introduced to improve detection robustness in complex field scenarios: (1) A Whirl Convolution (WhirlConv) module replaced the standard convolutions. It employs a four-branch architecture with reflection padding and independent convolutional kernels to capture multi-directional edge features, while suppressing background noise via channel attention. This design mitigates the boundary distortion caused by traditional zero-padding and accommodates the random orientation of the sporangia. (2) High-resolution P2-layer features (spatial resolution: 160×160) were fused into the feature pyramid network. The resulting multi-scale detection heads improved localization accuracy for extremely small targets (average size: 31×27 pixels, occupying about 0.02% of the 2560×1920-pixel input image). (3) The Spatial Pyramid Pooling-Fast (SPPF) module was augmented with a large separable kernel attention (LSKA) mechanism. Large separable kernels (e.g., 15×15) were used to capture long-range dependencies and global contextual information, complementing the localized directional features extracted by WhirlConv. The dataset was constructed under authentic field conditions at the Xiaotangshan National Precision Agriculture Research Base (Beijing, China) using a volumetric spore sampler and a LEICA DM3000 LED microscope at 100× magnification (10× eyepiece, 10× objective, 1× zoom).
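The target-size figures quoted above make it possible to check why a P2-level head helps. The sketch below is a back-of-the-envelope calculation, not part of the study: it assumes the 2560-pixel image width is resized to a 640-pixel network input (consistent with the stated 160×160 P2 map at stride 4) and that the pyramid levels P2 to P5 use strides 4, 8, 16, and 32; all function and constant names are illustrative.

```python
IMAGE_W, IMAGE_H = 2560, 1920   # raw image size stated in the abstract
TARGET_W, TARGET_H = 31, 27     # average sporangium size stated in the abstract
NETWORK_INPUT = 640             # assumption: width resized to 640 pixels
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}  # assumed pyramid strides


def area_fraction_percent(target_w, target_h, image_w, image_h):
    """Share of the image area covered by one target, in percent."""
    return 100.0 * (target_w * target_h) / (image_w * image_h)


def cells_spanned(target_px, image_px, input_px, stride):
    """Width of the target measured in feature-map cells after resizing."""
    resized_px = target_px * input_px / image_px
    return resized_px / stride


fraction = area_fraction_percent(TARGET_W, TARGET_H, IMAGE_W, IMAGE_H)
print(f"area fraction: {fraction:.3f}%")

for level, stride in STRIDES.items():
    cells = cells_spanned(TARGET_W, IMAGE_W, NETWORK_INPUT, stride)
    grid = NETWORK_INPUT // stride
    print(f"{level}: grid {grid}x{grid}, target width = {cells:.2f} cells")
```

Under these assumptions, an average sporangium spans roughly two cells at P2 but less than one cell at P3 and coarser levels, which is in line with the abstract's argument for adding the higher-resolution head.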
The captured images reflected natural challenges, including dense sporangia clusters, overlapping structures, and interference from field impurities (pollen and dust). The dataset comprised 300 raw images, expanded to 1,200 via rotation, hue shifts, and saturation adjustments, to align with real-world agricultural scenarios. Annotation was carried out under the supervision of plant pathologists, and ambiguous targets were excluded after morphological validation and dataset training. The optimal model achieved a precision of 94.2%, a recall of 90.1%, and a mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 86.9%, outperforming the baseline YOLOv8n by 10.0, 7.2, and 7.8 percentage points, respectively. Compared with high-magnification models, the improved model showed significant advantages: it surpassed YOLOv8x (258.1 giga floating-point operations, GFLOPs; 79.1% mAP@0.5) by 7.8 percentage points in mAP@0.5 while reducing storage requirements by 75% and computational complexity by 78% (56.9 vs. 258.1 GFLOPs). Ablation studies confirmed the contribution of each module: WhirlConv alone improved mAP@0.5 by 3.3%, while the P2 features and LSKA acted synergistically to further enhance performance. Visualization analysis demonstrated the robustness of the improved model in field-relevant scenarios: in dense clusters (174 targets per region), the model reduced the false-negative rate by 14.4 percentage points (1.7% vs. 16.1% for YOLOv8n), and under impurity interference, false positives were limited to 1.0%. Heatmap visualizations also confirmed that the model focused on densely packed sporangia, with activation regions aligning closely with the ground-truth annotations.
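The precision, recall, and mAP@0.5 figures above all depend on matching predicted boxes to annotations at an intersection-over-union (IoU) threshold of 0.5. The following is a minimal sketch of that matching step, not the evaluation code used in the study. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, predictions already sorted by descending confidence, and greedy one-to-one matching; the example boxes are invented for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def count_matches(predictions, ground_truths, threshold=0.5):
    """Greedily match predictions to annotations; return (TP, FP, FN)."""
    unmatched = list(ground_truths)
    tp = fp = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)  # each annotation can be matched once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)


def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts, guarding against empty sets."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented example: two annotated sporangia, one good and one stray detection.
annotations = [(0, 0, 31, 27), (100, 100, 131, 127)]
detections = [(2, 1, 33, 28), (200, 200, 231, 227)]

tp, fp, fn = count_matches(detections, annotations)
precision, recall = precision_recall(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} precision={precision:.2f} recall={recall:.2f}")
```

mAP@0.5 additionally averages precision over the recall levels obtained by sweeping the confidence threshold, which this sketch leaves out.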
For practical deployment, the model has 17.7 million parameters and a computational complexity of 56.9 GFLOPs, and it outperformed mainstream detectors such as RT-DETR-R18 (20.1 million parameters, 78.1% mAP@0.5) by 8.8 percentage points in mAP@0.5. Compared with the baseline YOLOv8n (8.2 GFLOPs), the improved architecture achieved a superior accuracy-efficiency trade-off, making it suitable for resource-constrained agricultural systems. In future work, lightweight adaptations such as depth-wise separable convolutions and quantization can be prioritized for embedded deployment without compromising detection fidelity. This work bridges the gap between laboratory research and practical agricultural needs, and the findings can provide a scalable solution for early disease monitoring with low-magnification field microscopy.
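The comparative figures in the abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the abstract; the dictionary layout and function names are illustrative assumptions.

```python
# Figures quoted in the abstract (mAP@0.5 in percent, complexity in GFLOPs).
REPORTED = {
    "improved":    {"map50": 86.9, "gflops": 56.9},
    "YOLOv8x":     {"map50": 79.1, "gflops": 258.1},
    "RT-DETR-R18": {"map50": 78.1},
    "YOLOv8n":     {"gflops": 8.2},
}


def percentage_point_gap(ours, other):
    """Absolute difference between two percentages, in percentage points."""
    return round(ours - other, 1)


def relative_reduction_percent(ours, other):
    """How much smaller `ours` is than `other`, as a percentage of `other`."""
    return 100.0 * (other - ours) / other


improved = REPORTED["improved"]

gap_v8x = percentage_point_gap(improved["map50"], REPORTED["YOLOv8x"]["map50"])
gap_r18 = percentage_point_gap(improved["map50"], REPORTED["RT-DETR-R18"]["map50"])
flops_cut = relative_reduction_percent(improved["gflops"], REPORTED["YOLOv8x"]["gflops"])
flops_vs_baseline = improved["gflops"] / REPORTED["YOLOv8n"]["gflops"]

print(f"mAP@0.5 gap vs YOLOv8x:     {gap_v8x} percentage points")
print(f"mAP@0.5 gap vs RT-DETR-R18: {gap_r18} percentage points")
print(f"GFLOPs reduction vs YOLOv8x: {flops_cut:.1f}%")
print(f"GFLOPs relative to YOLOv8n:  {flops_vs_baseline:.1f}x")
```

The mAP@0.5 gaps are absolute differences (percentage points), whereas the 78% complexity figure is a relative reduction, so the two kinds of number should not be compared directly.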

       
