Abstract:
Timely disease control is essential for optimizing pesticide application in precision agriculture, which makes high-precision segmentation of tobacco leaf diseases in complex field environments critical. However, existing segmentation methods have three limitations: (1) lesion boundaries are blurred by interference from leaf veins and texture noise; (2) directional and scale-specific lesion features are not captured uniformly, which weakens semantic consistency; (3) quantitative assessment of disease severity is still lacking, because it requires complete and accurate segmentation masks. In this study, a Multidirectional Wavelet-Enhanced Dual-Branch Decoding Network (MED-Former) was proposed, designed specifically for high-precision lesion segmentation and subsequent severity quantification in real-world tobacco fields. MED-Former delivers three technical improvements. First, a Wavelet Feature Enhancer (WFE) module was introduced that uses Haar wavelet multiscale decomposition and reconstructs multiscale features in the frequency domain. Unlike conventional spatial-domain enhancement, the WFE retains fine details: low-frequency semantic information is preserved and integrated with high-frequency anisotropic features. This significantly improved the robustness of the model in perceiving the local structures of ambiguous lesions and effectively avoided edge blurring, even under heavy interference from leaf veins or background noise. Second, a Multi-directional Strip Attention (MSA) mechanism was proposed to enhance semantic consistency. Unlike conventional attention mechanisms, which focus on single-directional features, the MSA uses horizontal, vertical, and diagonal strip pooling to capture contextual information along the critical orientations.
Specifically, horizontal pooling locates horizontally distributed lesions, vertical pooling captures vertically aligned lesion patterns, and diagonal pooling identifies obliquely extending lesion edges, all of which are common in tobacco diseases such as target spot and powdery mildew. The MSA thus strengthens the recognition of directionally diverse lesions and yields high semantic consistency over the entire feature map. Third, a Hybrid Parallel-Serial Transformer Decoder (PSTransformer) with a dual-branch architecture was incorporated. The parallel branch expands the receptive field in the early decoding stages to capture global lesion correlations. In contrast, the serial branch performs layer-wise feature refinement to suppress error propagation during mask generation and corrects minor inaccuracies from the parallel branch. The two branches operate synergistically to improve the quality and completeness of the output lesion masks, particularly for small, scattered early-stage lesions that are easily missed by single-branch decoders. The tobacco disease data were collected from actual fields under complex environmental conditions and cover three common tobacco diseases. Extensive experiments were conducted on this dataset. MED-Former achieved a mean average precision (mAP@50-95) of 72.6%, an mAP@50 of 88.6%, and a recall of 80.1%, outperforming widely used segmentation models such as Mask R-CNN, Mask2Former, YOLACT, and multiple YOLO-based segmentation variants. Compared with the baseline MP-Former, MED-Former improved these metrics by 2.2%, 4.0%, and 3.0%, respectively.
Although the computational load (229.9 GFLOPs) and parameter count (49.9M) were moderately higher than those of lightweight models, the performance gain was verified, particularly for early and subtle lesions, where accurate segmentation is critical for early diagnosis and severity estimation. The generalization of the model was further evaluated on the public Plant Seg v30 dataset, which includes 1,188 images covering 10 plant species. The proposed model again performed better than the MP-Former, with an mAP@50-95 of 75.1%, an mAP@50 of 83.2%, and a recall of 83.3%, showing robust cross-domain capability and strong applicability to general plant organ segmentation. Visualizations confirmed that the predicted masks were well defined and adhered closely to object boundaries, even for structurally complex or fine-scale plant parts. In summary, a robust, accurate, and generalizable solution was presented for segmenting disease lesions under real-field conditions. MED-Former integrates wavelet-based feature enhancement, multidirectional context modelling, and hybrid decoding. It advances the technical frontier of agricultural imaging and provides a practical tool for tobacco disease grading and tobacco phenotyping. The approach also holds significant potential for crop monitoring and precision plant protection in sustainable agriculture.
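
For illustration only, the two signal-level operations named above, Haar wavelet decomposition and directional strip pooling, can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the WFE or MSA modules themselves: the abstract does not specify their layers, learned weights, or fusion scheme, and the function names (`haar_dwt2`, `haar_idwt2`, `strip_pool`) are hypothetical. The sub-band labels follow one common convention, and a single-channel feature map with even height and width is assumed.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar transform on an (H, W) array.

    Assumes H and W are even. Returns four half-resolution sub-bands:
    a low-frequency approximation and three high-frequency detail bands.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (low-frequency content)
    lh = (a + b - c - d) / 2.0  # top row minus bottom row (horizontal edges)
    hl = (a - b + c - d) / 2.0  # left column minus right column (vertical edges)
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, rebuilding the full-resolution array."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def strip_pool(x):
    """Average an (H, W) map along strips in four orientations.

    Returns one mean per row, per column, per diagonal, and per
    anti-diagonal. A learned attention module would turn such
    descriptors into weights; that step is not shown here.
    """
    n_rows, n_cols = x.shape
    offsets = range(-(n_rows - 1), n_cols)  # every non-empty diagonal
    horizontal = x.mean(axis=1)  # shape (H,)
    vertical = x.mean(axis=0)    # shape (W,)
    diagonal = np.array([np.diagonal(x, offset=k).mean() for k in offsets])
    flipped = np.fliplr(x)       # anti-diagonals become diagonals
    anti_diagonal = np.array(
        [np.diagonal(flipped, offset=k).mean() for k in offsets]
    )
    return horizontal, vertical, diagonal, anti_diagonal
```

Because this Haar transform is orthonormal, applying `haar_idwt2` to the output of `haar_dwt2` recovers the input, which is one reason frequency-domain enhancement can be done without discarding fine detail. A multiscale decomposition would apply `haar_dwt2` again to the low-frequency band.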