Abstract:
Achieving high-precision segmentation of tobacco leaf diseases in complex field environments is critical for enabling timely disease control, optimizing pesticide application strategies, and advancing the development of precision agriculture. However, existing segmentation methods face three major limitations: blurred lesion boundaries, especially when disturbed by leaf veins or texture noise; insufficient semantic consistency, failing to capture directional and scale-specific lesion features uniformly; and a lack of reliable support for quantitative disease severity assessment due to incomplete or inaccurate segmentation masks. To address these challenges, this paper proposes a novel Multidirectional Wavelet-Enhanced Dual-Branch Decoding Network (MED-Former), specifically designed for high-precision lesion segmentation and subsequent severity quantification in real-world tobacco fields. The MED-Former framework delivers three key technical innovations. First, it introduces a Wavelet Feature Enhancer (WFE) module based on Haar wavelet multiscale decomposition. Unlike traditional spatial-domain enhancement methods that often lose fine details, the WFE operates in the frequency domain to reconstruct multiscale features: it preserves low-frequency semantic information while integrating high-frequency anisotropic details. This design significantly boosts the model’s robustness in perceiving local structures of ambiguous lesions, effectively mitigating edge blurring even under heavy interference from leaf veins or background noise. Second, a Multi-directional Strip Attention (MSA) mechanism is proposed to address semantic inconsistency. Unlike conventional attention mechanisms that focus on single-directional features, the MSA uses horizontal, vertical, and diagonal strip pooling to capture contextual information from three critical orientations. 
Horizontal pooling targets horizontally distributed lesions, vertical pooling captures vertically aligned lesion patterns, and diagonal pooling identifies obliquely extending lesion edges, all of which are common in tobacco diseases such as target spot and powdery mildew. The MSA thereby strengthens the model's ability to recognize directionally diverse lesions and improves semantic consistency across the entire feature map. Third, the framework incorporates a Hybrid Parallel-Serial Transformer Decoder (PSTransformer) with a dual-branch architecture. The parallel branch expands the receptive field in early decoding stages to capture global lesion correlations. The serial branch, in contrast, performs layer-wise feature refinement to suppress error propagation during mask generation, correcting minor inaccuracies from the parallel branch. Together, the two branches refine the quality and completeness of the output lesion masks, particularly for small, scattered early-stage lesions that are easily missed by single-branch decoders. Extensive experiments were conducted on a self-constructed tobacco disease dataset, collected from actual fields and covering three common tobacco diseases under complex environmental conditions. The proposed MED-Former achieved a mean average precision (mAP@50-95) of 72.6%, mAP@50 of 88.6%, and recall of 80.1%, surpassing widely used segmentation models such as Mask R-CNN, Mask2Former, YOLACT, and multiple YOLO-based segmentation variants. Compared to the baseline MP-Former, MED-Former improved these metrics by 2.2%, 4.0%, and 3.0%, respectively. Although the model's computational load (229.9 GFLOPs) and parameter count (49.9M) are moderately higher than those of some lightweight models, the performance gain justifies the added cost, particularly for early and subtle lesions, where accurate segmentation is critical for early diagnosis and severity estimation.
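The directional pooling idea behind the MSA can be illustrated with a minimal sketch. The abstract does not specify how the strips are computed or fused, so everything below is an assumption made for illustration: the function names `strip_pool` and `strip_context` are hypothetical, only the main-diagonal direction is pooled, and the three descriptors are fused by a plain average rather than by any learned attention weights.

```python
def strip_pool(feature):
    """Average a 2D feature map along rows, columns, and diagonals.

    Illustrative assumption, not the MSA module itself.
    """
    rows, cols = len(feature), len(feature[0])
    # Horizontal strips: one mean per row.
    horizontal = [sum(row) / cols for row in feature]
    # Vertical strips: one mean per column.
    vertical = [sum(feature[r][c] for r in range(rows)) / rows
                for c in range(cols)]
    # Diagonal strips: cells with the same (c - r) lie on one diagonal.
    diag_sum, diag_count = {}, {}
    for r in range(rows):
        for c in range(cols):
            k = c - r
            diag_sum[k] = diag_sum.get(k, 0.0) + feature[r][c]
            diag_count[k] = diag_count.get(k, 0) + 1
    diagonal = {k: diag_sum[k] / diag_count[k] for k in diag_sum}
    return horizontal, vertical, diagonal


def strip_context(feature):
    """Broadcast the three strip descriptors back to every position.

    The equal-weight average is a placeholder for a learned fusion.
    """
    rows, cols = len(feature), len(feature[0])
    horizontal, vertical, diagonal = strip_pool(feature)
    return [[(horizontal[r] + vertical[c] + diagonal[c - r]) / 3
             for c in range(cols)]
            for r in range(rows)]


# A toy "lesion" running along the main diagonal of a 3x3 map.
toy = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
context = strip_context(toy)
```

In this toy case the row and column means are identical everywhere, so only the diagonal strip distinguishes the lesion cells from the background, which is the kind of oblique structure that the diagonal pooling described above is meant to capture.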
To evaluate generalization, the model was further tested on the public Plant Seg v30 dataset, comprising 1,188 images across 10 plant species. MED-Former again achieved state-of-the-art results, with mAP@50-95 of 75.1%, mAP@50 of 83.2%, and recall of 83.3%. These results demonstrate its robust cross-domain capability and strong applicability to general plant organ segmentation. Visualizations confirm that the model produces well-defined masks that adhere closely to object boundaries, even for structurally complex or fine-scale plant parts. In summary, this study presents a robust, accurate, and generalizable solution for segmenting disease lesions under real-field conditions. By integrating wavelet-based feature enhancement, multidirectional context modeling, and a hybrid decoding strategy, MED-Former not only advances the technical frontier of agricultural image analysis but also provides a practical tool for automated tobacco disease grading and phenotyping. The proposed approach holds significant potential for broader applications in crop monitoring, precision plant protection, and sustainable agricultural management.
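For readers unfamiliar with the Haar decomposition that the wavelet-based feature enhancement builds on, the following is a minimal single-level sketch in plain Python. It shows only the standard orthonormal 2D Haar transform and its inverse; how the WFE module recombines the sub-bands is not described in the abstract and is not modeled here. The function names and the sub-band labels (`LL`, `LH`, `HL`, `HH`, whose naming convention varies between libraries) are assumptions made for illustration.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a 2D list with even height and width."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            out["LL"].append((a + b + d + e) / 2)  # low-frequency average
            out["LH"].append((a + b - d - e) / 2)  # top-vs-bottom difference
            out["HL"].append((a - b + d - e) / 2)  # left-vs-right difference
            out["HH"].append((a - b - d + e) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(out[name])
    return bands


def haar_idwt2(bands):
    """Invert haar_dwt2, recovering the original 2D list."""
    image = []
    for i in range(len(bands["LL"])):
        top, bottom = [], []
        for j in range(len(bands["LL"][0])):
            ll, lh = bands["LL"][i][j], bands["LH"][i][j]
            hl, hh = bands["HL"][i][j], bands["HH"][i][j]
            top += [(ll + lh + hl + hh) / 2, (ll + lh - hl - hh) / 2]
            bottom += [(ll - lh + hl - hh) / 2, (ll - lh - hl + hh) / 2]
        image += [top, bottom]
    return image


patch = [[1, 2], [3, 4]]
bands = haar_dwt2(patch)
restored = haar_idwt2(bands)
```

Because the transform is invertible, the low-frequency band and the three directional detail bands together carry the same information as the input, which is what makes it possible to enhance detail bands without discarding the coarse semantic content.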