Winter wheat yield prediction at county level based on multimodal data fusion

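    • Abstract: To achieve large-scale, accurate, and efficient winter wheat yield prediction, this study proposes a deep learning model based on multimodal data fusion. The model exploits the wide spatial coverage of surface reflectance data and Global Land Data Assimilation System (GLDAS) data, fusing these multi-source data to improve prediction accuracy. Convolutional neural networks (CNN) extract spatial features from the remote sensing images, and a convolutional block attention module (CBAM) suppresses interference from background regions. A cross-branch cross-attention block (CCAB) captures the temporal dependencies across the key growth stages of winter wheat. To incorporate historical yield information, a gated recurrent unit (GRU) network extracts temporal features from historical yields, and a dynamic gated fusion module (DGFM) adaptively fuses the multi-source features, so that the model better reflects long-term trends and inter-annual fluctuations. The model achieved a coefficient of determination (R²) of 0.909, a root mean squared error (RMSE) of 512.97 kg/km², and a mean absolute error (MAE) of 377.08 kg/km², clearly outperforming conventional machine learning algorithms. Compared with the CNN-LSTM-Attention model, R² increased by 6.69%, while RMSE and MAE decreased by 18.28% and 15.51%, respectively. Ablation experiments show that CBAM effectively suppresses background noise and strengthens the model's focus on winter wheat planting areas. CCAB and DGFM improve prediction accuracy through early feature fusion and dynamic adjustment. The model also predicts well early in the season. Using data from sowing to the booting stage gives an RMSE of 531.95 kg/km², only slightly worse than with full-growth-period data (512.97 kg/km², which is 3.6% lower). This indicates that the model can provide accurate predictions about 50 days before winter wheat maturity. Data contribution analysis shows that surface reflectance data is the key factor affecting prediction accuracy. Year-by-year validation further demonstrates the model's robustness across years. This study offers a new approach to building accurate and reliable crop yield prediction systems.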
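The three accuracy measures quoted above can be sketched in plain Python, together with the relative change behind the early-prediction comparison. This is a minimal illustration. It assumes the standard definitions of R², RMSE, and MAE computed over pooled county-level observations; the paper's evaluation code is not given, and `relative_change` is a hypothetical helper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_change(new, reference):
    """Hypothetical helper: percentage change of `new` relative to `reference`."""
    return (new - reference) / reference * 100.0

# Early-season (sowing to booting) vs. full-growth-period RMSE from the abstract.
early_rmse, full_rmse = 531.95, 512.97
print(round(relative_change(full_rmse, early_rmse), 1))  # -3.6
print(round(relative_change(early_rmse, full_rmse), 1))  # 3.7
```

The percentage depends on the reference value. Relative to the early-prediction RMSE, the full-period RMSE is 3.6% lower. Relative to the full-period RMSE, the early-prediction RMSE is 3.7% higher.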

       

      Abstract: Modern agriculture requires accurate and efficient large-scale prediction of winter wheat yield. In this study, a deep learning model based on multimodal data fusion was proposed to predict winter wheat yield at the county level. The model exploits the wide spatial coverage of surface reflectance data and Global Land Data Assimilation System (GLDAS) data. Four kinds of data were integrated to improve prediction accuracy:
      - surface reflectance (seven bands, such as red, blue, and short-wave infrared);
      - meteorological data (eight bands, such as air temperature, near-surface wind speed, precipitation, and snow water equivalent);
      - soil data (four bands, such as soil temperature and soil moisture);
      - historical yield data.
      The model consists of four modules: a multimodal feature extraction encoder, a historical yield temporal feature extraction module, a dynamic gated fusion module (DGFM), and a yield prediction module. The encoder uses a dual-branch architecture to extract spatiotemporal feature representations from the surface reflectance data and the GLDAS data, one branch each. Encoding proceeds in two stages, spatial feature modeling and temporal feature modeling. In the spatial stage, ResNet serves as the backbone network, with its final fully connected layers removed so that it acts as a feature extractor. A convolutional block attention module (CBAM) is embedded after each residual block to suppress interference from background regions in the remote sensing images and to focus the network on key regions and channels. In the temporal stage, the feature vectors from the spatial stage are arranged in sequence by time step. Positional encoding is then added to preserve their temporal order.
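The following is a minimal NumPy sketch of a CBAM forward pass on a single (C, H, W) feature map, such as the output of one residual block. It assumes the commonly used CBAM formulation:
- channel attention from a shared two-layer MLP over average-pooled and max-pooled descriptors;
- then spatial attention from a 7×7 convolution over channel-wise average and max maps.

The reduction ratio, kernel size, and random weights are illustrative assumptions, not the paper's settings. The ResNet block itself is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Reweight channels of x (C, H, W) using a shared MLP (w1: (C//r, C),
    w2: (C, C//r)) applied to average- and max-pooled channel descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)            # ReLU hidden layer
    avg_desc = x.mean(axis=(1, 2))                     # (C,)
    max_desc = x.max(axis=(1, 2))                      # (C,)
    weights = sigmoid(mlp(avg_desc) + mlp(max_desc))   # (C,), each in (0, 1)
    return x * weights[:, None, None]

def spatial_attention(x, kernel):
    """Reweight positions of x using a k x k convolution (kernel: (2, k, k),
    odd k, zero 'same' padding) over channel-wise average and max maps."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = desc.shape
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]

def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4                  # hypothetical sizes; r = reduction ratio
x = rng.standard_normal((C, H, W))        # stand-in for a residual-block output
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
print(y.shape)  # (16, 8, 8)
```

Both attention maps pass through a sigmoid, so CBAM only rescales features. This lets it damp background activations while leaving the tensor shape unchanged for the next residual block.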
A five-layer cross-branch cross-attention block (CCAB) then captures the temporal dependencies across the key growth stages of winter wheat. A gated recurrent unit (GRU) network extracts temporal features from historical yields. The DGFM then adaptively fuses the multi-source features, including these historical-yield features, to better reflect long-term trends and inter-annual fluctuations.

To evaluate the model, a dataset was constructed for the main winter wheat production areas of northern China:
- Shanxi Province (Linfen, Yuncheng, Jincheng, Changzhi, and Jinzhong cities);
- Henan Province (all regions);
- Shaanxi Province (Guanzhong region);
- Gansu Province (Tianshui, Pingliang, Qingyang, and Longnan cities).

The model achieved a coefficient of determination (R²) of 0.909, with a root mean squared error (RMSE) of 512.97 kg/km² and a mean absolute error (MAE) of 377.08 kg/km². This is significantly better than conventional machine learning algorithms. Compared with the high-performing CNN-LSTM-Attention model, R² increased by 6.69%, while RMSE and MAE decreased by 18.28% and 15.51%, respectively.

In the ablation tests, removing the CBAM module increased the RMSE to 601.54 kg/km². Removing the DGFM and CCAB modules had the following effects, respectively:
- RMSE increased by 75.21 and 80.18 kg/km²;
- MAE increased by 9.41 and 14.37 kg/km²;
- R² decreased by 0.038 and 0.033.

Every module therefore contributed substantially to prediction accuracy.

For early prediction, using data from sowing to the heading stage gave an RMSE of 531.95 kg/km². This is only slightly higher than with full-growth-period data (512.97 kg/km², which is 3.6% lower). It indicates that the model can deliver accurate predictions approximately 50 days before the winter wheat harvest.
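The temporal and fusion stages can be illustrated with a NumPy sketch. It rests on several assumptions not stated in the abstract:
- sinusoidal positional encoding;
- single-head scaled dot-product cross-attention in which each branch queries the other, shown as one layer rather than five;
- mean pooling over time;
- a minimal GRU (biases omitted) over standardized historical yields;
- a per-feature sigmoid gate standing in for the DGFM.

All dimensions, weights, and input values are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_positional_encoding(t, d):
    """Sine/cosine encoding for t time steps and an even feature width d."""
    pos = np.arange(t)[:, None]
    two_i = np.arange(0, d, 2)[None, :]
    angles = pos / np.power(10000.0, two_i / d)
    pe = np.zeros((t, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def cross_attention(query_seq, context_seq, wq, wk, wv):
    """Single-head scaled dot-product attention: queries from one branch,
    keys and values from the other. Inputs (t, d); weights (d, d)."""
    q, k, v = query_seq @ wq, context_seq @ wk, context_seq @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gru_encode(series, wx, wh):
    """Minimal GRU (biases omitted) over a scalar series; wx: (3, H, 1) and
    wh: (3, H, H) hold update-gate, reset-gate and candidate weights."""
    h = np.zeros(wh.shape[-1])
    for value in series:
        x = np.array([value])
        z = sigmoid(wx[0] @ x + wh[0] @ h)            # update gate
        r = sigmoid(wx[1] @ x + wh[1] @ h)            # reset gate
        cand = np.tanh(wx[2] @ x + wh[2] @ (r * h))   # candidate state
        h = (1.0 - z) * h + z * cand
    return h

def gated_fusion(a, b, wg, bg):
    """Per-feature sigmoid gate g; output is g * a + (1 - g) * b."""
    g = sigmoid(np.concatenate([a, b]) @ wg + bg)
    return g * a + (1.0 - g) * b

rng = np.random.default_rng(0)
T, D = 8, 32                               # hypothetical: 8 time steps, 32 features
pe = sinusoidal_positional_encoding(T, D)
refl = rng.standard_normal((T, D)) + pe    # stand-in reflectance-branch features
gldas = rng.standard_normal((T, D)) + pe   # stand-in GLDAS-branch features
w = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(6)]

# One cross-branch layer: each branch attends to the other (with residuals).
refl_out = refl + cross_attention(refl, gldas, *w[:3])
gldas_out = gldas + cross_attention(gldas, refl, *w[3:])
remote = np.concatenate([refl_out.mean(axis=0), gldas_out.mean(axis=0)])

# Hypothetical standardized historical yields encoded by the GRU.
H = 2 * D
history = gru_encode([-0.5, 0.1, 0.3, 0.8],
                     rng.standard_normal((3, H, 1)) * 0.5,
                     rng.standard_normal((3, H, H)) / np.sqrt(H))

wg = rng.standard_normal((2 * H, H)) / np.sqrt(2 * H)
fused = gated_fusion(remote, history, wg, np.zeros(H))
print(fused.shape)  # (64,)
```

The gate output is a convex combination of its two inputs. Each fused feature therefore lies between the current-season value and the historical-yield value, and the learned gate decides per feature which source to weight more.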
The data contribution analysis showed that surface reflectance data was the key factor in winter wheat yield prediction, with corresponding RMSE, MAE, and R² values of 623.15 kg/km², 476.52 kg/km², and 0.778, respectively. A likely reason is that surface reflectance represents crop growth status more directly. In year-by-year validation from 2015 to 2021, the model achieved an average RMSE of 561.34 kg/km², an average MAE of 398.31 kg/km², and an average R² of 0.876. Prediction performance fluctuated little between years, indicating good inter-annual robustness. The proposed model thus achieves high accuracy and robustness in winter wheat yield prediction and provides a promising approach to accurate and reliable crop yield forecasting.
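The abstract does not detail the year-by-year validation protocol. One plausible reading is leave-one-year-out validation, sketched here in plain Python. Each year is held out in turn, a model is fitted on the remaining years, and the per-year metrics are averaged. The `fit_county_mean` baseline is a hypothetical stand-in for the trained network, and the toy records are synthetic.

```python
import math
from collections import defaultdict
from statistics import mean

def metrics(obs, pred):
    """Return (RMSE, MAE, R²) for paired observed/predicted yields."""
    n = len(obs)
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pred))
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    m = mean(obs)
    r2 = 1.0 - sq_err / sum((o - m) ** 2 for o in obs)
    return rmse, mae, r2

def fit_county_mean(train):
    """Hypothetical stand-in model: each county's mean yield in training years."""
    by_county = defaultdict(list)
    for _, county, y in train:
        by_county[county].append(y)
    return {c: mean(v) for c, v in by_county.items()}

def leave_one_year_out(records):
    """records: (year, county, yield) tuples. Holds out each year in turn,
    fits on the other years, and averages the per-year metrics."""
    per_year = {}
    for year in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != year]
        test = [r for r in records if r[0] == year]
        model = fit_county_mean(train)
        obs = [y for _, _, y in test]
        pred = [model[c] for _, c, _ in test]
        per_year[year] = metrics(obs, pred)
    averages = tuple(mean(m[k] for m in per_year.values()) for k in range(3))
    return per_year, averages

# Synthetic toy data: three counties, yields rising by 100 per year.
base = {"A": 5000, "B": 6000, "C": 7000}
records = [(year, c, b + 100 * (year - 2015))
           for year in (2015, 2016, 2017) for c, b in base.items()]
per_year, averages = leave_one_year_out(records)
print([round(v, 4) for v in averages])  # [100.0, 100.0, 0.9775]
```

Averaging per-year scores, rather than pooling all years, shows whether accuracy drops in particular years. This is the kind of inter-annual fluctuation the robustness analysis examines.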

       
