Fusing causality-based weighted auto-encoder networks with DSSAT-CROPGRO for predicting phenology in soybean breeding populations

    Abstract: To improve the accuracy and generalization of simulation-model-based phenology prediction for soybean breeding populations, this study combines feature disentanglement with the principle of "causal invariance" and proposes a method that fuses a causality-based weighted auto-encoder network (CWAE) with a phenology simulation model. The method takes as inputs period-level feature values of temperature and photoperiod data, together with the thermal and photoperiod effect variables and the simulated phenology dates generated by the DSSAT-CROPGRO model. Causal weights are computed from the Markov boundary and partial correlation coefficients, and a weighted auto-encoding structure reconstructs key latent features with low redundancy. These features are passed progressively from the flowering stage to the pod stage and the later stages, correcting the errors that arise when DSSAT-CROPGRO is applied to breeding materials. The fusion model was built from field trial data on 309 materials (genotypes) of the Yangtze-Huai soybean breeding line population (YHSBLP), recorded at the Yancheng ecological site in 2018–2020 for four phenological stages: beginning bloom (R1), beginning pod (R3), beginning seed (R5), and beginning maturity (R7). Cross-validation showed that the thermal and photoperiod effect variables improved feature discriminability, that the latent features extracted by CWAE were markedly less redundant than the raw inputs (a reduction of 70.59%), and that progressive feature transfer improved predictive performance, lowering the root mean square error (RMSE) at every stage (by 13% to 19%). Compared with predicting the phenology of the breeding population directly with the CROPGRO model, the fusion model was substantially more accurate: RMSE fell from 4.59–5.98 d to 3.13–4.09 d, a reduction of 23.37% to 31.81%. When tested on phenology observations from Yancheng in 2021, whose distribution is shifted relative to the training years, the fusion model generalized well: the population-level RMSE fell by 23.53% to 71.01%, the RMSE averaged over all materials fell from 5.45–12.41 d to 2.80–5.00 d, and prediction accuracy improved for more than 80% of the materials. The method extracts highly relevant, low-redundancy, robust features and improves both the accuracy and the generalization of phenology prediction for soybean breeding populations. It offers a growth-simulation-based approach for analyzing the phenotypes of breeding materials under different temperature and photoperiod conditions.
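
    The abstract states that causal weights come from the Markov boundary and partial correlation coefficients and that they weight the auto-encoder's reconstruction, but it gives no formulas. The following is only a minimal sketch of one way such weights could be formed, under stated assumptions: every input feature is treated as a Markov-boundary candidate (no boundary discovery is performed), weights are normalized absolute partial correlations with the target, and the function names and synthetic data are hypothetical rather than taken from the paper.

```python
import numpy as np


def partial_correlation_weights(X, y):
    """Weight each feature by |partial correlation| with y given the other features.

    Assumption: every column of X is kept as a Markov-boundary candidate;
    the abstract does not describe how the boundary is found.
    """
    data = np.column_stack([X, y])
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.pinv(corr)       # pseudo-inverse tolerates near-collinearity
    d = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(d, d)  # off-diagonals are partial correlations
    w = np.abs(partial[:-1, -1])           # feature-versus-target entries only
    return w / w.sum()


def weighted_reconstruction_error(X, X_hat, weights):
    """Mean squared reconstruction error with per-feature weights."""
    return float(np.mean(((X - X_hat) ** 2) @ weights))


# Synthetic illustration (not data from the study).
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)              # feature that actually drives the target
x1 = x0 + 0.5 * rng.normal(size=n)   # redundant, noisy copy of x0
x2 = rng.normal(size=n)              # feature unrelated to the target
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + 0.5 * rng.normal(size=n)

weights = partial_correlation_weights(X, y)
print(weights)  # the driving feature x0 should receive the largest weight
```

    In this sketch the redundant copy `x1` receives a small weight because, once `x0` is conditioned on, it carries little extra information about the target; this is the sense in which partial correlation favors relevant but non-redundant features. In a weighted auto-encoder, `weighted_reconstruction_error` would play the role of the training loss, so that high-weight features dominate what the latent code must preserve.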

       
