Tobacco leaf disease segmentation and disease severity classification based on an improved MP-Former network

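    • Summary: To address blurred boundaries and insufficient semantic consistency in instance segmentation of tobacco leaf diseases against complex field backgrounds, as well as the lack of disease severity quantification, this study proposes a multidirectional wavelet-enhanced dual-branch decoding network (MED-Former) for lesion segmentation. First, a Wavelet Feature Enhancer (WFE) based on Haar wavelet multiscale decomposition is constructed. By preserving low-frequency semantics and integrating high-frequency anisotropic features, it reconstructs multiscale features in the frequency domain and makes the model's perception of local structure in blurred lesions more robust to leaf-vein interference. Second, a multi-directional strip attention (MSA) mechanism is designed that uses horizontal, vertical, and diagonal pooling to capture contextual information in different directions, strengthening the model's perception of lesions with diverse shapes and improving the semantic consistency of features. Finally, a hybrid parallel-serial transformer decoder (PSTransformer) is constructed to enlarge the receptive field of the early decoding layers, suppress mask error propagation, and improve the quality of the generated lesion masks. In experiments on a self-built field tobacco disease dataset, MED-Former achieved a mean average precision mAP@50-95 of 72.6%, an mAP@50 of 88.6%, and a recall of 80.1%, improvements of 2.2, 4.0, and 3.0 percentage points, respectively, over the original MP-Former, and its accuracy exceeded that of other current mainstream models. This study establishes a high-precision model for lesion segmentation and disease severity quantification, providing an effective solution for objective and accurate grading of tobacco leaf disease severity.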
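The frequency-domain decomposition at the core of the Wavelet Feature Enhancer (WFE) can be illustrated with one level of the 2D Haar transform. The sketch below is a minimal pure-Python illustration, assuming a single-channel feature map with even height and width and the orthonormal Haar normalization. The helper names `haar_dwt2` and `haar_idwt2` are hypothetical, and how MED-Former actually weights and fuses the sub-bands is not described in the abstract and is not reproduced here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar transform (hypothetical helper).

    Returns (LL, LH, HL, HH), each half the input height and width.
    Sub-band naming conventions differ between libraries; here LH responds
    to horizontal edges and HL to vertical edges.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)  # low-frequency approximation
            r_lh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            r_hl.append((a - b + c - d) / 2)  # left minus right: vertical edges
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Exact inverse of haar_dwt2: rebuilds the full-resolution map."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, v, u, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + v + u + t) / 2
            out[2 * i][2 * j + 1] = (s + v - u - t) / 2
            out[2 * i + 1][2 * j] = (s - v + u - t) / 2
            out[2 * i + 1][2 * j + 1] = (s - v - u + t) / 2
    return out


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(img)
print(ll)  # [[7.0, 11.0], [23.0, 27.0]]
print(hh)  # [[0.0, 0.0], [0.0, 0.0]]
assert haar_idwt2(ll, lh, hl, hh) == img  # lossless reconstruction
```

On this linear ramp the HH band is all zeros (a ramp has no diagonal detail), and the inverse restores the input exactly. Because the transform is invertible and energy-preserving, low-frequency semantics and high-frequency directional detail can be processed separately without discarding information, which is consistent with the WFE description above.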

      Abstract: Achieving high-precision segmentation of tobacco leaf diseases in complex field environments is critical for enabling timely disease control, optimizing pesticide application strategies, and advancing the development of precision agriculture. However, existing segmentation methods face three major limitations: blurred lesion boundaries, especially when disturbed by leaf veins or texture noise; insufficient semantic consistency, failing to capture directional and scale-specific lesion features uniformly; and a lack of reliable support for quantitative disease severity assessment due to incomplete or inaccurate segmentation masks. To address these challenges, this paper proposes a novel Multidirectional Wavelet-Enhanced Dual-Branch Decoding Network (MED-Former), specifically designed for high-precision lesion segmentation and subsequent severity quantification in real-world tobacco fields. The MED-Former framework delivers three key technical innovations. First, it introduces a Wavelet Feature Enhancer (WFE) module based on Haar wavelet multiscale decomposition. Unlike traditional spatial-domain enhancement methods that often lose fine details, the WFE operates in the frequency domain to reconstruct multiscale features: it preserves low-frequency semantic information while integrating high-frequency anisotropic details. This design significantly boosts the model’s robustness in perceiving local structures of ambiguous lesions, effectively mitigating edge blurring even under heavy interference from leaf veins or background noise. Second, a Multi-directional Strip Attention (MSA) mechanism is proposed to address semantic inconsistency. Unlike conventional attention mechanisms that focus on single-directional features, the MSA uses horizontal, vertical, and diagonal strip pooling to capture contextual information from three critical orientations. 
Horizontal pooling targets horizontally distributed lesions, vertical pooling captures vertically aligned lesion patterns, and diagonal pooling identifies obliquely extending lesion edges, all of which are common in tobacco diseases such as target spot and powdery mildew. MSA thereby strengthens the model's ability to recognize directionally diverse lesions and improves semantic consistency across the entire feature map. Third, the framework incorporates a Hybrid Parallel-Serial Transformer Decoder (PSTransformer) with a dual-branch architecture. The parallel branch expands the receptive field in the early decoding stages to capture global lesion correlations. The serial branch, in contrast, performs layer-wise feature refinement to suppress error propagation during mask generation, correcting minor inaccuracies from the parallel branch. The two branches work together to improve the quality and completeness of the output lesion masks, particularly for small, scattered early-stage lesions that single-branch decoders easily miss. Extensive experiments were conducted on a self-constructed tobacco disease dataset collected in actual fields, covering three common tobacco diseases under complex environmental conditions. The proposed MED-Former achieved a mean average precision (mAP@50-95) of 72.6%, an mAP@50 of 88.6%, and a recall of 80.1%, surpassing widely used segmentation models such as Mask R-CNN, Mask2Former, YOLACT, and multiple YOLO-based segmentation variants. Compared with the baseline MP-Former, MED-Former improved these metrics by 2.2, 4.0, and 3.0 percentage points, respectively. Although the model's computational load (229.9 GFLOPs) and parameter count (49.9M) are moderately higher than those of some lightweight models, the accuracy gain justifies the cost, particularly for early and subtle lesions, where accurate segmentation is critical for early diagnosis and severity estimation.
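The directional pooling idea behind MSA can be sketched without any learned parameters. This is a simplified illustration, not the MSA module itself: each pixel receives the mean of its row (horizontal strip), its column (vertical strip), and its top-left-to-bottom-right diagonal, the three statistics are summed, and a sigmoid turns the sum into a gate on the input. The abstract does not describe how MSA fuses the directional context, so the sum-plus-sigmoid fusion, the choice of the main diagonal only, and the names `strip_means` and `multi_directional_strip_gate` are all assumptions.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def strip_means(x: Matrix) -> Tuple[List[float], List[float], Dict[int, float]]:
    """Mean of x along horizontal, vertical and diagonal strips (hypothetical helper).

    Diagonals are indexed by k = i - j (top-left to bottom-right); the
    anti-diagonal (i + j) could be added the same way.
    """
    h, w = len(x), len(x[0])
    row_mean = [sum(r) / w for r in x]  # one value per 1 x W horizontal strip
    col_mean = [sum(x[i][j] for i in range(h)) / h for j in range(w)]  # per H x 1 strip
    diag_sum: Dict[int, float] = {}
    diag_cnt: Dict[int, int] = {}
    for i in range(h):
        for j in range(w):
            k = i - j  # constant along each diagonal
            diag_sum[k] = diag_sum.get(k, 0.0) + x[i][j]
            diag_cnt[k] = diag_cnt.get(k, 0) + 1
    diag_mean = {k: diag_sum[k] / diag_cnt[k] for k in diag_sum}
    return row_mean, col_mean, diag_mean


def multi_directional_strip_gate(x: Matrix) -> Matrix:
    """Reweight x by a sigmoid of its summed directional strip context (assumed fusion)."""
    h, w = len(x), len(x[0])
    row_mean, col_mean, diag_mean = strip_means(x)
    out: Matrix = []
    for i in range(h):
        out_row = []
        for j in range(w):
            # Broadcast each strip statistic back to pixel (i, j) and fuse by summation.
            context = row_mean[i] + col_mean[j] + diag_mean[i - j]
            gate = 1.0 / (1.0 + math.exp(-context))  # attention weight in (0, 1)
            out_row.append(x[i][j] * gate)
        out.append(out_row)
    return out


diag = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = multi_directional_strip_gate(diag)
print(round(out[1][1], 4))  # 0.8411
```

For this 3×3 map with ones only on the main diagonal, every row and column mean is 1/3, while the main-diagonal mean is 1.0, so diagonal pixels receive a gate of sigmoid(5/3) ≈ 0.84. Horizontal and vertical strips alone would dilute such an obliquely extending structure; the diagonal strip recovers it.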
To evaluate generalization, the model was further tested on the public Plant Seg v30 dataset, which comprises 1,188 images across 10 plant species. MED-Former again achieved state-of-the-art results, with an mAP@50-95 of 75.1%, an mAP@50 of 83.2%, and a recall of 83.3%, indicating robust cross-domain capability and strong applicability to general plant organ segmentation. Visualizations confirm that the model produces well-defined masks that adhere closely to object boundaries, even for structurally complex or fine-scale plant parts. In summary, this study presents a robust, accurate, and generalizable solution for segmenting disease lesions under real-field conditions. By integrating wavelet-based feature enhancement, multidirectional context modeling, and a hybrid decoding strategy, MED-Former advances agricultural image analysis and provides a practical tool for automated tobacco disease grading and tobacco phenotyping. The proposed approach also holds potential for broader applications in crop monitoring, precision plant protection, and sustainable agricultural management.
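The abstracts do not specify how the segmentation masks are converted into a severity grade. A common approach, assumed here rather than taken from the paper, is to compute the fraction of leaf pixels covered by lesion masks and bin that fraction against grading cut-offs. The function names `lesion_area_ratio` and `severity_grade` and the threshold values are hypothetical placeholders, not the authors' grading scale or any official standard.

```python
from bisect import bisect_left
from typing import List, Sequence

Mask = List[List[int]]  # binary mask: 1 = foreground, 0 = background


def lesion_area_ratio(leaf_mask: Mask, lesion_masks: Sequence[Mask]) -> float:
    """Fraction of leaf pixels covered by the union of lesion instance masks.

    Hypothetical helper: assumes every mask has the same shape as leaf_mask.
    """
    leaf_px = 0
    lesion_px = 0
    for i, row in enumerate(leaf_mask):
        for j, on_leaf in enumerate(row):
            if not on_leaf:
                continue  # background pixels do not count toward leaf area
            leaf_px += 1
            # Union over instances: a pixel covered by two lesions is counted once.
            if any(m[i][j] for m in lesion_masks):
                lesion_px += 1
    if leaf_px == 0:
        raise ValueError("leaf mask is empty")
    return lesion_px / leaf_px


def severity_grade(ratio: float, thresholds: Sequence[float]) -> int:
    """Map a lesion-area ratio to a grade in 0..len(thresholds).

    `thresholds` must be ascending. Grade 0 means ratio <= thresholds[0];
    grade g (0 < g < len) means thresholds[g-1] < ratio <= thresholds[g];
    grade len(thresholds) means ratio > thresholds[-1].
    The cut-offs are caller-supplied placeholders.
    """
    return bisect_left(thresholds, ratio)


# Hypothetical example: a 3 x 4 image whose bottom row is background.
leaf = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
lesion_a = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
lesion_b = [[0, 1, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0]]  # overlaps lesion_a at (0, 1)

ratio = lesion_area_ratio(leaf, [lesion_a, lesion_b])
placeholder_thresholds = [0.0, 0.05, 0.20]  # hypothetical cut-offs
print(ratio, severity_grade(ratio, placeholder_thresholds))  # 0.375 3
```

Overlapping instances are merged before counting, and lesion pixels that fall outside the leaf mask (here the bottom row of `lesion_b`) are ignored, so the ratio depends only on leaf area. Any real deployment would substitute the grading scale actually used for tobacco.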