Abstract:
Soybean is an important economic crop and a major source of plant protein for diverse populations worldwide, so screening for high-yielding, high-quality varieties has long been a priority in crop breeding research. Phenotypic traits of soybean plants are closely related to yield, but traditional measurement methods suffer from inherent limitations such as high subjectivity, heavy labor demands, and susceptibility to error. Existing convolutional neural network methods perform poorly on pod morphology and seed counting in particular, while detection of the main stem and branches is easily disturbed by occlusion and stem curvature, resulting in low measurement accuracy and hindering widespread adoption in actual production. To address these issues, this study introduces an enhanced real-time detection transformer (RT-DETR) algorithm to improve the detection accuracy of soybean phenotypic traits. For pod detection, an attention-scale sequence fusion (ASF) module is integrated into the Transformer architecture of RT-DETR; through multi-scale feature fusion and a dual attention mechanism, the model's target recognition performance in complex environments is significantly enhanced.
This module comprises three core parts: the Scale Sequential Feature Fusion (SSFF) module uses 3D convolution and upsampling to fuse multi-scale feature maps from layers P3, P4, and P5, extracting scale-invariant features so that the network can simultaneously detect pods of different sizes (large, medium, and small); the Three-Scale Feature Encoding (TFE) module scales features from the three levels to a common resolution before concatenation, ensuring that the fused features carry both detail and context and improving the representation of dense, overlapping, and small pods; and the Channel and Position Attention (CPAM) module applies channel attention to select highly discriminative feature channels and spatial attention to focus on the target region while suppressing background interference, yielding more accurate localization and classification. During training, the ASF module provides richer gradient information, improving convergence speed and stability and thereby enhancing recognition of pod phenotypic features against complex backgrounds. For the task of detecting main stems and branches, this study designed a Wavelet Feature Upgrade (WFU) module. This module performs a wavelet-based multi-scale decomposition of the image and integrates the high-frequency and low-frequency features separately in the decoder, effectively exploiting multi-scale information. This not only strengthens the network's learning of key features but also reduces distortion during image analysis, improving the model's sensitivity to target shape and boundaries.
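The channel-then-position attention idea behind CPAM can be illustrated with a minimal numpy sketch: gate feature channels by their global response, then gate spatial positions by a channel-averaged saliency map. Function names, shapes, and the simple sigmoid gating are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def channel_attention(x):
    # x: (C, H, W). Global average pool over space, then gate each channel.
    w = x.mean(axis=(1, 2))            # per-channel response -> (C,)
    w = 1.0 / (1.0 + np.exp(-w))       # sigmoid gate in [0, 1]
    return x * w[:, None, None]

def position_attention(x):
    # Average over channels to get a spatial saliency map, then gate positions.
    m = x.mean(axis=0)                 # (H, W)
    m = 1.0 / (1.0 + np.exp(-m))       # sigmoid gate in [0, 1]
    return x * m[None, :, :]

def cpam_like(x):
    # Channel attention first, then spatial (position) attention, mirroring
    # "select discriminative channels, then focus on the target region".
    return position_attention(channel_attention(x))

feat = np.random.randn(8, 16, 16)
out = cpam_like(feat)
print(out.shape)   # (8, 16, 16)
```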
Compared with traditional convolution operations, the WFU module constructs a novel feature enhancement network: a two-dimensional wavelet transform decomposes the image into low-frequency (background) and high-frequency (target edge) components, which are routed to two dedicated branches, a MobileNet branch (large convolutional kernels and LayerNorm for background suppression) and a ConvNeXt branch (small convolutional kernels and ReLU activation to enhance details); an inverse wavelet transform then decouples background from target. The method first concatenates encoder-layer features with the decoder's upsampled features and splits them into two paths: one extracts deep fine-grained semantic information through a lightweight inverted residual structure, while the other preserves spatial details. After residual summation, a cascaded inverted residual structure significantly reduces the false negative rate for fragmented and elongated targets. During upsampling, the dual-path architecture processes in parallel: one path uses a 7×7 depthwise separable convolution with a two-layer FC-GELU activation for long-range spatial compensation, while the other uses transposed convolution with a 3×3 DWConv for resolution restoration. The residual-fused output supplements high-frequency boundary information while mitigating the mesh artifacts that transposed convolution may introduce, generating clear, coherent target edges with sub-pixel accuracy without significantly increasing the parameter count. Experimental results show that the improved RT-DETR algorithm achieves an accuracy of 91.1% in soybean pod detection and 94.0% in main stem and branch detection. Furthermore, morphological parameters of the main stem and branches were extracted using OpenCV.
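The wavelet split described above can be sketched with a single-level 2D Haar transform: the image is separated into a low-frequency approximation (background) and three high-frequency detail bands (edges), and an inverse transform recombines them exactly. This is a generic Haar example, not the WFU module itself.

```python
import numpy as np

def haar2d(img):
    # Single-level 2D Haar transform on an even-sized image: one
    # low-frequency band (LL) plus three high-frequency detail bands
    # (LH, HL, HH) carrying horizontal/vertical/diagonal edge energy.
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    # Inverse transform: recombine the four half-resolution bands
    # into the full-resolution image (lossless for Haar).
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

img = np.arange(64, dtype=float).reshape(8, 8)
ll, lh, hl, hh = haar2d(img)
rec = ihaar2d(ll, lh, hl, hh)
print(np.allclose(rec, img))   # True
```

In a WFU-style design, the LL band would feed the background-suppression branch and the detail bands the edge-enhancement branch before the inverse transform merges them.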
Based on the obtained phenotypic features such as pod number, seed number, and main stem/branch area, a voting regression ensemble model was constructed, which accurately predicted the weight per plant (R² = 0.90), thereby enabling yield estimation. The soybean phenotypic analysis and yield prediction method proposed in this study provides reliable technical support for soybean breeding and cultivation optimization, and also offers a new technical approach for crop phenomics research.
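A voting regression ensemble of the kind used for per-plant weight prediction averages the outputs of several base regressors. The sketch below uses two simple members (closed-form ridge regression and k-nearest-neighbour regression) with equal-weight averaging; the feature layout and example values are hypothetical, and the paper's actual ensemble members are not specified here.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    # Closed-form ridge regression on [X | 1] (bias column appended).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ w

def fit_knn(X, y, k=3):
    # k-nearest-neighbour regressor: average targets of the k closest rows.
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

def voting_predict(models, Xn):
    # Equal-weight vote: ensemble prediction is the mean of members' outputs.
    return np.mean([m(Xn) for m in models], axis=0)

# Hypothetical per-plant features: [pod count, seed count, stem+branch area].
X = np.array([[30., 72., 410.], [25., 60., 380.], [40., 95., 520.],
              [18., 40., 300.], [35., 85., 470.], [22., 50., 340.]])
y = np.array([14.2, 11.8, 19.0, 8.5, 17.1, 10.3])   # weight per plant (g)

models = [fit_ridge(X, y), fit_knn(X, y, k=2)]
print(voting_predict(models, X[:2]).shape)   # (2,)
```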