Tobacco leaf lesion segmentation and disease severity grading based on an improved MP-Former network

张文耀; 谢涛; 谢雨含; 王洪亮; 吕芬

doi:10.11975/j.issn.1002-6819.202509041


Segmenting tobacco leaf diseases to classify severity level using improved MP-Former network

Abstract

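Abstract: To address blurred boundaries, insufficient semantic consistency, and the lack of disease severity quantification in instance segmentation of tobacco leaf diseases against complex field backgrounds, this paper proposes a multidirectional wavelet-enhanced dual-branch decoding network (MED-Former) for lesion segmentation. First, a wavelet feature enhancer (WFE) module based on Haar wavelet multiscale decomposition is built; by combining low-frequency semantic preservation with high-frequency anisotropic features, it reconstructs multiscale features in the frequency domain and makes the model's perception of local structure more robust for blurred lesions under leaf-vein interference. Second, a multi-directional strip attention (MSA) mechanism is designed that uses horizontal, vertical, and diagonal pooling to capture contextual information in different directions, strengthening the model's perception of lesions with varied shapes and improving the semantic consistency of features. Finally, a hybrid parallel-serial Transformer decoder (PSTransformer) is built to enlarge the receptive field of the early decoding layers, suppress the propagation of mask errors, and improve the quality of the generated lesion masks. In experiments on a self-built field tobacco disease dataset, MED-Former reaches an mAP@50-95 of 72.6%, an mAP@50 of 88.6%, and a recall of 80.1%, which are 2.2, 4.0, and 3.0 percentage points higher than the original MP-Former, and its accuracy metrics exceed those of other current mainstream models. The resulting high-precision lesion segmentation and severity quantification model offers an effective solution for objective and accurate grading of tobacco leaf disease severity.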
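The Haar decomposition that the WFE module builds on can be illustrated with a single-level 2D transform. The sketch below is a generic textbook Haar transform in plain Python, not the authors' WFE implementation; the function names `haar_dwt2` and `haar_idwt2` are hypothetical, and the input is assumed to be a single-channel map with even height and width. It splits the map into one low-frequency band and three directional high-frequency bands, and shows that the split is lossless.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of a list-of-lists with even height and width.

    Returns four half-resolution bands: ll (local average), lh (top minus bottom,
    responds to horizontal edges), hl (left minus right, responds to vertical
    edges) and hh (diagonal differences).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)
            r_lh.append((a + b - c - d) / 2)
            r_hl.append((a - b + c - d) / 2)
            r_hh.append((a - b - c + d) / 2)
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution map from the four bands."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, r, c, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + r + c + d) / 2
            img[2 * i][2 * j + 1] = (s + r - c - d) / 2
            img[2 * i + 1][2 * j] = (s - r + c - d) / 2
            img[2 * i + 1][2 * j + 1] = (s - r - c + d) / 2
    return img
```

Because the transform is invertible, a module of this kind can process the low-frequency and high-frequency bands separately and still recover a full-resolution feature map. Applying `haar_dwt2` again to the `ll` band gives the next, coarser scale.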

Abstract: Timely disease control and optimized pesticide application in precision agriculture depend on high-precision segmentation of tobacco leaf diseases in complex field environments. However, existing segmentation methods face three limitations: (1) lesion boundaries are blurred by leaf veins and texture noise; (2) semantic consistency is insufficient, because directional and scale-specific lesion features are not captured uniformly; (3) quantitative assessment of disease severity, which requires complete and accurate segmentation masks, is still lacking. In this study, a Multidirectional Wavelet-Enhanced Dual-Branch Decoding Network (MED-Former) was proposed for high-precision lesion segmentation and subsequent severity quantification in real-world tobacco fields. MED-Former introduces three technical improvements. Firstly, a Wavelet Feature Enhancer (WFE) module was built on Haar wavelet multiscale decomposition and reconstructs multiscale features in the frequency domain. Unlike conventional spatial-domain enhancement, the WFE retains fine features: low-frequency semantic information is preserved and integrated with high-frequency anisotropic features. This improved the robustness with which the model perceives the local structure of ambiguous lesions and mitigated edge blurring, even under heavy interference from leaf veins or background noise. Secondly, a Multi-directional Strip Attention (MSA) mechanism was proposed to enhance semantic consistency. Unlike conventional attention mechanisms that focus on single-directional features, it uses horizontal, vertical, and diagonal strip pooling to capture contextual information along these orientations.
Horizontal pooling locates horizontally distributed lesions, vertical pooling captures vertically aligned lesion patterns, and diagonal pooling identifies obliquely extending lesion edges, all of which are common in tobacco diseases such as target spot and powdery mildew. The MSA thereby strengthens the recognition of directionally diverse lesions and keeps semantics consistent over the entire feature map. Thirdly, a Hybrid Parallel-Serial Transformer Decoder (PSTransformer) with a dual-branch architecture was incorporated. The parallel branch expands the receptive field in the early decoding stages to capture global lesion correlations. The serial branch, in contrast, performs layer-wise feature refinement to suppress error propagation during mask generation and corrects minor inaccuracies from the parallel branch. The two branches work together to refine the quality and completeness of the output lesion masks, particularly for small, scattered early-stage lesions that single-branch decoders easily miss. The tobacco disease data were collected in actual fields under complex environmental conditions and cover three common tobacco diseases. In extensive experiments on this dataset, MED-Former achieved a mean average precision (mAP@50-95) of 72.6%, an mAP@50 of 88.6%, and a recall of 80.1%, outperforming widely used segmentation models such as Mask R-CNN, Mask2Former, YOLACT, and multiple YOLO-based segmentation variants. Compared with the baseline MP-Former, these metrics improved by 2.2, 4.0, and 3.0 percentage points, respectively.
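The directional strip pooling behind the MSA can be sketched as average pooling along rows, columns, and diagonals of a feature map. The abstract does not say how the diagonal strips are formed or how the pooled strips are turned into attention weights, so the version below is an assumption: it averages each top-left to bottom-right diagonal, and the function name `strip_pool` is hypothetical.

```python
def strip_pool(feat):
    """Average-pool a 2D feature map (list-of-lists) along three orientations.

    Returns (rows, cols, diags):
      rows[i]  = mean of row i      (horizontal strips)
      cols[j]  = mean of column j   (vertical strips)
      diags[k] = mean of the k-th top-left to bottom-right diagonal, ordered
                 from the bottom-left corner to the top-right corner.
    """
    h, w = len(feat), len(feat[0])
    rows = [sum(row) / w for row in feat]
    cols = [sum(feat[i][j] for i in range(h)) / h for j in range(w)]

    # Cells on the same diagonal share the offset j - i, which runs from
    # -(h - 1) to w - 1; shift it by h - 1 to get a list index.
    diag_sum = [0.0] * (h + w - 1)
    diag_cnt = [0] * (h + w - 1)
    for i in range(h):
        for j in range(w):
            k = j - i + (h - 1)
            diag_sum[k] += feat[i][j]
            diag_cnt[k] += 1
    diags = [s / n for s, n in zip(diag_sum, diag_cnt)]
    return rows, cols, diags
```

For a 3x3 map with ones on the main diagonal and zeros elsewhere, every row and column mean is 1/3, while `diags` is `[0.0, 0.0, 1.0, 0.0, 0.0]`. An oblique structure that looks weak to horizontal and vertical pooling therefore produces a concentrated response in the diagonal strips, which is the motivation the abstract gives for adding the diagonal direction.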
Although the computational load (229.9 GFLOPs) and parameter count (49.9M) were moderately higher than those of lightweight models, the performance gain was confirmed, particularly for early and subtle lesions, where accurate segmentation is critical for early diagnosis and severity estimation. The generalization of the model was further evaluated on the public Plant Seg v30 dataset, which includes 1,188 images of 10 plant species. The model also performed well there, with an mAP@50-95 of 75.1%, an mAP@50 of 83.2%, and a recall of 83.3%, indicating robust cross-domain capability and strong applicability to general plant organ segmentation. Visualizations confirmed that the predicted masks were well defined and adhered closely to object boundaries, even for structurally complex or fine-scale plant parts. In summary, a robust, accurate, and generalizable solution was presented for segmenting disease lesions under real-field conditions. MED-Former integrates wavelet-based feature enhancement, multidirectional context modelling, and hybrid decoding. It provides a practical tool for tobacco disease grading and tobacco phenotyping, and holds significant potential for crop monitoring and precision plant protection in sustainable agriculture.
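The abstract does not describe how severity is computed from the segmentation output. A common approach, shown here only as an illustrative sketch, is to take the fraction of leaf pixels covered by lesion masks and map it to a grade. Everything specific in the code is an assumption: the availability of a separate leaf mask, the function names, and above all the grade boundaries in `HYPOTHETICAL_BOUNDS`, which are placeholders and not values from the paper or from any grading standard.

```python
from bisect import bisect_right

# Placeholder grade boundaries on the lesion-area ratio (NOT from the paper).
HYPOTHETICAL_BOUNDS = (0.05, 0.25, 0.50)


def lesion_area_ratio(leaf_mask, lesion_masks):
    """Fraction of leaf pixels covered by at least one lesion instance mask.

    leaf_mask and each entry of lesion_masks are same-sized list-of-lists of
    0/1 values. Overlapping lesion instances are counted once, and lesion
    pixels outside the leaf are ignored.
    """
    leaf_px = 0
    lesion_px = 0
    for i, row in enumerate(leaf_mask):
        for j, is_leaf in enumerate(row):
            if not is_leaf:
                continue
            leaf_px += 1
            if any(mask[i][j] for mask in lesion_masks):
                lesion_px += 1
    if leaf_px == 0:
        raise ValueError("leaf mask is empty")
    return lesion_px / leaf_px


def severity_grade(ratio, bounds=HYPOTHETICAL_BOUNDS):
    """Map a lesion-area ratio to an integer grade; 0 means no lesion pixels."""
    if ratio == 0:
        return 0
    # bisect_right counts how many boundaries are less than or equal to ratio.
    return 1 + bisect_right(bounds, ratio)
```

The sketch makes the dependency stated in the abstract concrete: a missed small lesion or a ragged mask boundary changes the pixel counts directly, so the grade is only as reliable as the segmentation masks it is computed from.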
