Abstract:
Quantifying vegetation water use requires accurate partitioning of evapotranspiration (ET) into its components, which in turn supports a better understanding of how the carbon-water cycle responds to climate change. However, conventional partitioning methods have been unable to meet the demand for large-scale application in recent years, owing to complex parameterization or costly isotope techniques. They also struggle to capture the highly nonlinear interactions between environmental drivers and physiological responses. This study therefore systematically evaluates the effectiveness and accuracy of machine learning (ML) models for this partitioning task. Ecological knowledge was also incorporated to identify the key environmental drivers. A dataset was constructed under multiple constraints, including gross primary productivity (GPP) and the conservative surface wetness index (CSWI). Water use efficiency (WUE) was then predicted using the classic TEA algorithm (based on random forest), XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), and SVM (support vector machine), and transpiration (T) was estimated at each time step. The T outputs of three carbon-water coupled models (CASTANEA, JSBach, and MuSICA) were used as references to evaluate the data-driven models. The results show that the ML models estimated T effectively, and that XGBoost outperformed the TEA algorithm across all three carbon-water coupled models. The root mean square error (RMSE) decreased by 18% on average (P < 0.001), and the adjusted coefficient of determination increased, exceeding 0.85 at most sites (P < 0.001). Statistical significance tests (P < 0.001) confirmed both the reduction in prediction error and the improvement in goodness of fit. Furthermore, a CSWI threshold of -0.5 mm significantly improved the quality of the training data and, after hyperparameter optimization, the stability of the model, especially for complex, high-latitude, multi-layered vegetation canopy structures. XGBoost and TEA outperformed LightGBM and SVM in T estimation at most sites. In particular, LightGBM and SVM were limited in complex ecological scenarios, such as high-dimensional feature spaces, owing to limited feature expression and insufficient adaptability of the kernel function. Through boosted hierarchical feature interactions and robust constraints, XGBoost can effectively capture the inherent nonlinear dynamics of carbon-water coupling, which accounts for its significant advantage. These findings provide strong verification and a technical approach for the dynamic, precise, and mechanism-informed quantification of vegetation water consumption. They can also inform the parameterization of regional and global climate models, improving predictions of ecosystem responses under future climate scenarios in support of sustainable forestry.
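The workflow summarized above (filter the training data by CSWI, predict WUE, derive T at each time step, and compare RMSE against a reference) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that records with CSWI below the -0.5 mm threshold are the ones retained for training, and that WUE is defined as GPP divided by T, so that T = GPP / WUE. All function names, field names, and numeric values are hypothetical and chosen only for illustration; the ML regressor that would predict WUE is omitted.

```python
from math import sqrt

CSWI_THRESHOLD_MM = -0.5  # threshold reported in the abstract


def select_training_rows(records, threshold=CSWI_THRESHOLD_MM):
    """Keep records whose CSWI is below the threshold.

    Assumption: low CSWI marks periods suitable for training.
    """
    return [r for r in records if r["cswi"] < threshold]


def transpiration(gpp, wue):
    """Assumed relation: WUE = GPP / T, hence T = GPP / WUE."""
    if wue <= 0:
        raise ValueError("WUE must be positive")
    return gpp / wue


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sqrt(sum(squared) / len(squared))


def rmse_reduction(rmse_new, rmse_baseline):
    """Relative RMSE reduction of a new model over a baseline."""
    return (rmse_baseline - rmse_new) / rmse_baseline


# Hypothetical records; values are illustrative, not data from the study.
records = [
    {"cswi": -1.2, "gpp": 6.0, "wue": 3.0},
    {"cswi": 0.4, "gpp": 5.0, "wue": 2.5},
    {"cswi": -0.7, "gpp": 4.0, "wue": 2.0},
]
training = select_training_rows(records)
t_series = [transpiration(r["gpp"], r["wue"]) for r in records]
```

In a full pipeline, the `wue` values would come from a regressor (random forest, XGBoost, LightGBM, or SVM) trained on the filtered rows, and `rmse_reduction` would be computed per site against the T output of each carbon-water coupled model.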