RT-DETR-based multi-phenotypic measurement and yield prediction of soybean plants

    • Abstract: To overcome the strong subjectivity and low efficiency of traditional soybean phenotyping methods, the limited accuracy of existing convolutional neural networks in pod detection and seed counting, and the susceptibility of main-stem and branch detection to occlusion, this study used deep learning to extract phenotypic features from images of individual soybean plants and to predict yield. An improved RT-DETR (real-time detection transformer) algorithm incorporating an ASF (attentional scale sequence fusion) module achieved an average precision of 91.1% in soybean pod detection; in addition, an RT-DETR model with an added WFU (wavelet feature upgrade) module detected main stems and branches with an average precision of 94.0%, and OpenCV was used to extract the main-stem and branch areas. Based on the extracted phenotypic features of pod number, seed number, and main-stem/branch area, a Voting Regressor ensemble model predicted per-plant weight with an R² of 0.90. The soybean phenotype extraction and yield prediction method in this study provides valuable insights for soybean breeding and cultivation optimization and opens a new technical avenue for crop phenomics research.
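As a concrete illustration of the OpenCV step mentioned above, here is a minimal sketch of measuring main-stem/branch area from a binary mask. It assumes the detection stage has already produced such a mask; the file name `stem_mask.png`, the morphological cleanup, and the 10-pixel speck threshold are illustrative assumptions, not details from the study.

```python
import cv2
import numpy as np

def region_area(mask: np.ndarray) -> float:
    """Sum the pixel area of stem/branch regions in a binary mask (255 = target)."""
    # Morphological opening removes isolated noise pixels before measuring.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Ignore tiny specks (< 10 px), which are usually segmentation noise.
    return sum(cv2.contourArea(c) for c in contours if cv2.contourArea(c) > 10.0)

# Hypothetical input: a mask exported by the detection/segmentation stage.
mask = cv2.imread("stem_mask.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
print("main-stem/branch area (px):", region_area(mask))
```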

       

      Abstract: Soybean is an important economic crop and a major source of plant protein for diverse populations worldwide, so screening high-yielding, high-quality varieties has long been a priority in crop breeding research. Phenotypic traits of soybean plants are closely related to yield, but traditional measurement methods suffer from strong subjectivity, heavy labor demands, and susceptibility to error. Existing convolutional neural network methods perform poorly in pod morphology analysis and seed counting in particular, and main-stem and branch detection is easily disturbed by occlusion and stem curvature, resulting in low measurement accuracy and hindering widespread application in production. To address these issues, this study introduces an enhanced real-time detection transformer (RT-DETR) algorithm to improve the detection accuracy of soybean phenotypic traits.

For pod detection, an attentional scale sequence fusion (ASF) module is integrated into the Transformer architecture of RT-DETR; through multi-scale feature fusion and a dual attention mechanism, it significantly improves target recognition in complex environments. The module comprises three core parts. The scale sequence feature fusion (SSFF) module uses 3D convolution and upsampling to fuse the multi-scale feature maps of layers P3, P4, and P5, extracting scale-invariant features so that the network can detect large, medium, and small pods simultaneously. The triple feature encoding (TFE) module rescales features from the three levels to a common resolution before concatenation, ensuring that the fused features carry both fine detail and contextual information and improving the representation of dense, overlapping, and small pods. The channel and position attention module (CPAM) applies channel attention to select highly discriminative feature channels and spatial attention to focus on the target region while suppressing background interference, yielding more accurate localization and classification. During training, the ASF module also provides richer gradient information, which speeds up convergence and stabilizes optimization, further strengthening recognition of pod phenotypic features against complex backgrounds.

For main-stem and branch detection, this study designed a wavelet feature upgrade (WFU) module. Built on the wavelet transform, it decomposes the image at multiple scales and feeds the high-frequency and low-frequency features into the decoder separately, making effective use of multi-scale information. This not only strengthens the network's ability to learn key features but also reduces distortion during image analysis, improving the model's sensitivity to target shapes and boundaries. Unlike ordinary convolution, the WFU module builds a feature-enhancement network in which a two-dimensional wavelet decomposes the image into a low-frequency (background) component and high-frequency (target-edge) components, which are routed to two dedicated branches: a MobileNet-style branch (large convolution kernels with LayerNorm for background suppression) and a ConvNeXt-style branch (small convolution kernels with ReLU activation for detail enhancement). An inverse wavelet transform then recombines the streams, decoupling the background from the target.
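To make the wavelet split-and-merge idea concrete, here is a minimal PyTorch sketch under our own assumptions: an orthonormal Haar DWT separates a feature map into one low-frequency and three high-frequency sub-bands, each stream is refined by its own branch (large-kernel depthwise convolution with normalization for the background, small kernels with ReLU for the edges, mirroring the description above), and an inverse transform recombines them. The class name `WFUSketch`, the exact kernel sizes, and GroupNorm standing in for LayerNorm are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_filters(channels: int) -> torch.Tensor:
    """Orthonormal 2x2 Haar basis (LL, LH, HL, HH), one set per channel."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    filt = torch.stack([ll, lh, hl, hh]).unsqueeze(1)  # (4, 1, 2, 2)
    return filt.repeat(channels, 1, 1, 1)              # (4C, 1, 2, 2)

class WFUSketch(nn.Module):
    """Hypothetical stand-in for the WFU module: Haar DWT -> two branches -> inverse DWT."""

    def __init__(self, channels: int):
        super().__init__()
        self.register_buffer("filt", haar_filters(channels))
        # Low-frequency (background) branch: large depthwise kernel + normalization.
        self.low = nn.Sequential(
            nn.Conv2d(channels, channels, 7, padding=3, groups=channels),
            nn.GroupNorm(1, channels),  # LayerNorm-like normalization, kept simple
            nn.Conv2d(channels, channels, 1),
        )
        # High-frequency (edge) branch: small kernels + ReLU on the 3 detail sub-bands.
        self.high = nn.Sequential(
            nn.Conv2d(3 * channels, 3 * channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(3 * channels, 3 * channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape  # h and w must be even for a single-level DWT
        coeffs = F.conv2d(x, self.filt, stride=2, groups=c)  # (B, 4C, H/2, W/2)
        coeffs = coeffs.view(b, c, 4, h // 2, w // 2)
        low = self.low(coeffs[:, :, 0])                       # LL sub-band
        high = self.high(coeffs[:, :, 1:].reshape(b, 3 * c, h // 2, w // 2))
        merged = torch.cat([low.unsqueeze(2),
                            high.view(b, c, 3, h // 2, w // 2)], dim=2)
        # Inverse Haar: transposed conv with the same orthonormal filters.
        return F.conv_transpose2d(merged.view(b, 4 * c, h // 2, w // 2),
                                  self.filt, stride=2, groups=c)

x = torch.randn(1, 32, 64, 64)
print(WFUSketch(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```

With both branches set to identity, the transposed convolution reconstructs the input exactly, since the four Haar filters form an orthonormal basis of each 2×2 block; the branches therefore only inject the learned background/edge refinement.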
In the decoder, the method first concatenates encoder-layer features with upsampled decoder features and then splits them into two paths: one extracts deep fine-grained semantic information through a lightweight inverted residual structure, while the other preserves spatial detail. After residual summation, a cascaded inverted residual structure markedly reduces the false-negative rate for fragmented and elongated targets. During upsampling, a dual-path architecture processes features in parallel: one path applies a 7×7 depthwise separable convolution with a two-layer FC-GELU block for long-range spatial compensation, while the other restores resolution with a transposed convolution followed by a 3×3 depthwise convolution (DWConv). The residual-fused output supplements high-frequency boundary information while mitigating the checkerboard artifacts that transposed convolution can introduce, producing clear, coherent target edges with sub-pixel accuracy at little extra parameter cost.

Experimental results show that the improved RT-DETR achieves an average precision of 91.1% in soybean pod detection and 94.0% in main-stem and branch detection. Morphological parameters of the main stem and branches were further extracted with OpenCV. Based on the obtained phenotypic features of pod number, seed number, and main-stem/branch area, a voting-regression ensemble model was constructed that accurately predicts per-plant weight (R² = 0.90), enabling yield estimation. The soybean phenotypic analysis and yield prediction method proposed in this study provides reliable technical support for soybean breeding and cultivation optimization and offers a new technical approach for crop phenomics research.
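To show the shape of the final prediction stage, below is a minimal scikit-learn sketch of a voting-regression ensemble over the three extracted features. The base estimators (random forest, gradient boosting, ridge) and the synthetic data are our assumptions; the abstract specifies only that a Voting Regressor was used and reports R² = 0.90 on the real data.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 80  # toy sample size; the study's real dataset is not reproduced here

# Synthetic stand-ins for the three phenotypic features per plant.
pods = rng.integers(20, 60, n).astype(float)      # pod count
seeds = pods * rng.uniform(2.0, 2.6, n)           # seed count
area = rng.normal(5000.0, 600.0, n)               # main-stem/branch area (px)
X = np.column_stack([pods, seeds, area])
y = 0.18 * seeds + 0.001 * area + rng.normal(0.0, 1.0, n)  # mock per-plant weight (g)

# Voting regressor: averages the predictions of its base estimators.
model = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("ridge", Ridge(alpha=1.0)),
])
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```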

       
