Partitioning evapotranspiration components in forest ecosystems by integrating machine learning and ecological constraints

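    • Abstract: Accurate partitioning of evapotranspiration (ET) components is needed to reveal how vegetation uses water, to improve carbon-water cycle simulation, and to respond to climate change. To systematically evaluate the effectiveness and accuracy of machine learning models for ET partitioning and to quantify the importance of key environmental drivers, this study combined machine learning with ecological research and constructed a dataset that satisfies multiple constraints, including gross primary productivity (GPP) and the conservative surface wetness index (CSWI). Water use efficiency (WUE) was predicted with the TEA algorithm (a random-forest-based algorithm), XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), and SVM (support vector machine); transpiration was then estimated at each time step to partition ET, and the results were cross-compared on three coupled carbon-water models, CASTANEA, JSBach, and MuSICA. The results show that, in all three models, XGBoost significantly outperformed the TEA algorithm: root mean square error (RMSE) fell by 18% on average (P<0.001), and the adjusted coefficient of determination rose, exceeding 0.85 at most sites (P<0.001). Overall statistical significance tests (P<0.001) confirmed its superiority in reducing prediction error and improving goodness of fit. Parameter optimization further indicated that a CSWI value of -0.5 mm ensured the completeness of the training dataset, allowing the XGBoost model to reach its best partitioning performance. By contrast, LightGBM and SVM partitioned ET less well than XGBoost and the TEA algorithm at most sites, with RMSE clearly higher than that of the TEA algorithm. XGBoost thus shows a significant advantage for ET partitioning in forest ecosystems: the hierarchical feature interactions of its tree structure, together with its regularization constraints, can effectively resolve the nonlinear dynamics of carbon-water coupling, whereas LightGBM and SVM, limited by their feature representation mechanisms and insufficient kernel-function adaptability, have limited applicability in complex ecological settings. The study provides methodological validation and a technical route for accurately partitioning and quantifying vegetation water-use dynamics.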

       

      Abstract: Accurate partitioning of evapotranspiration (ET) into its components is needed to understand vegetation water use and to improve simulation of the carbon-water cycle's response to climate change. However, conventional partitioning methods are difficult to apply at large scales because they rely on complex parameterization or costly isotope techniques, and they struggle to capture the highly nonlinear interactions between environmental drivers and physiological responses. This study therefore systematically evaluates the effectiveness and accuracy of machine learning (ML) models for ET partitioning and combines them with ecological research to identify the key environmental drivers. A dataset was constructed that satisfies multiple constraints, including gross primary productivity (GPP) and the conservative surface wetness index (CSWI). Water use efficiency (WUE) was then predicted with the classic TEA algorithm (based on random forest), XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), and SVM (support vector machine), and transpiration (T) was estimated at each time step. The data-driven estimates were evaluated against the T outputs of three coupled carbon-water models (CASTANEA, JSBach, and MuSICA). The results show that the ML models estimated T effectively, and that XGBoost outperformed the TEA algorithm for all three carbon-water models. Root mean square error (RMSE) fell by 18% on average (P<0.001), and the adjusted coefficient of determination increased, exceeding 0.85 at most sites (P<0.001). Statistical significance tests (P<0.001) confirmed the reduction in prediction error and the improvement in goodness of fit. Furthermore, after hyperparameter optimization, a CSWI threshold of -0.5 mm improved the completeness of the training data and the stability of the model, especially for complex, high-latitude, multi-layered canopy structures. XGBoost and the TEA algorithm also estimated T better than LightGBM and SVM at most sites. LightGBM and SVM were limited in complex ecological scenarios, such as high-dimensional feature spaces, by their feature representation and by the insufficient adaptability of the kernel function. The hierarchical feature interactions of XGBoost's tree structure, together with its regularization constraints, can effectively resolve the nonlinear dynamics inherent in carbon-water coupling, which indicates a significant advantage for XGBoost. These findings provide methodological validation and a technical route for the dynamic, accurate, and mechanistic quantification of vegetation water consumption. They can also inform the parameterization of regional and global climate models, improving predictions of ecosystem responses under future climate scenarios in support of sustainable forestry.
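
The partitioning and evaluation steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that WUE is defined as GPP divided by T, so that T = GPP / WUE at each time step and evaporation is the remainder of ET; the abstract does not state this definition. The function names, argument names, and units are hypothetical, and the WUE values would in practice come from a trained model such as XGBoost or the TEA random forest.

```python
import math


def partition_et(et, gpp, wue_pred):
    """Split ET into transpiration (T) and evaporation (E) per time step.

    Assumption (not stated in the abstract): WUE = GPP / T, so T = GPP / WUE.
    T is clipped to the range [0, ET], assuming ET is non-negative.
    """
    t_list, e_list = [], []
    for et_i, gpp_i, wue_i in zip(et, gpp, wue_pred):
        # Non-positive WUE predictions are treated as "no transpiration".
        t_i = gpp_i / wue_i if wue_i > 0 else 0.0
        t_i = min(max(t_i, 0.0), et_i)
        t_list.append(t_i)
        e_list.append(et_i - t_i)
    return t_list, e_list


def rmse(obs, pred):
    """Root mean square error between reference and estimated values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def adjusted_r2(obs, pred, n_features):
    """Coefficient of determination adjusted for the number of predictors."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


# Hypothetical values in mutually consistent units, for illustration only.
t_est, e_est = partition_et(
    et=[0.2, 0.4, 0.1], gpp=[4.0, 9.0, 0.0], wue_pred=[40.0, 30.0, 25.0]
)
```

In a comparison like the one described, `rmse` and `adjusted_r2` would be computed between the estimated T and the T output of a reference model such as CASTANEA, JSBach, or MuSICA.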

       
