Identification and optimization of local barrier factors for soil organic carbon in a typical black soil region using the XGBoost-Geoshapley model

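    • Summary: Identifying and optimizing the local barrier factors of soil organic carbon (SOC) in the black soil region of Northeast China is essential for promoting soil carbon sequestration and emission reduction and for protecting black soil resources. However, existing studies that use the Shapley algorithm to explore local influencing factors do not consider the spatial dependence of SOC and do not provide quantitative optimization strategies for the barrier factors, which reduces the precision and practical value of their results. This study therefore takes Kedong County, Heilongjiang Province, as a case study and combines the extreme gradient boosting (XGBoost) model with the Geoshapley algorithm to identify the local barrier factors of SOC, determine an optimization target value for each barrier factor, and estimate the potential increase in SOC after optimization. The results show that: 1) the fitting and prediction R² of the XGBoost model were 0.99 and 0.93, respectively, and Geoshapley ranked spatial location as the second most important factor, confirming the validity of the model; 2) in the west of the study area, SOC was constrained mainly by low alkali-hydrolyzable nitrogen and low annual rainfall and by excessive available potassium and available phosphorus, whereas in the east SOC was limited by strong topographic relief and low densities of roads and water bodies; 3) partial dependence plots showed threshold effects in the relationship between every barrier factor and SOC, with SOC peaking when alkali-hydrolyzable nitrogen, available potassium, available phosphorus, distance to roads, and distance to water bodies reached 300 mg/kg, 231 mg/kg, 52 mg/kg, 8 467 m, and 180 m, respectively; 4) with the optimal thresholds as optimization targets, the mean SOC of the study area is projected to rise from 32.52 g/kg to 41.62 g/kg, an increase of 27.98%. The study confirms the need to account for the spatial dependence of SOC and the advantages of the Geoshapley algorithm, provides data support for optimizing the local soil environment and agricultural practices, and offers a reference for identifying and optimizing barrier factors of soil properties in other regions.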

       

      Abstract: Identifying and optimizing the local barrier factors that limit soil organic carbon (SOC) in the black soil region of Northeast China is critical for enhancing soil carbon sequestration, reducing emissions, and protecting black soil resources. However, existing studies that use the Shapley algorithm to identify local influencing factors often neglect the spatial dependence of SOC and do not provide quantitative optimization strategies for these barriers, which limits the accuracy and practical applicability of their results. To address these gaps, this study takes Kedong County in Heilongjiang Province as a case study and integrates the Extreme Gradient Boosting (XGBoost) model with the Geoshapley algorithm. The approach identifies local barrier factors influencing SOC, determines an optimal target value for each factor, and estimates the potential improvement in SOC following optimization. Spatial data on SOC and 19 potential influencing factors were collected and preprocessed following the SCORPAN framework, covering soil properties, climate, organisms, topography, and spatial location. The XGBoost model was used to capture complex nonlinear relationships between SOC and the environmental variables, while the Geoshapley algorithm was applied to account for spatial dependence and interaction effects, providing more accurate estimates of factor importance and enabling local interpretation of model predictions. A genetic algorithm was used for variable selection to reduce dimensionality and limit overfitting. The XGBoost model achieved fitting and prediction R² values of 0.99 and 0.93, respectively, indicating strong explanatory power and generalization capability.
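The role of spatial location in a Shapley-type attribution can be illustrated with a small sketch. It assumes, for illustration only, that the two coordinates are treated as a single joint player and that absent features are filled in from a background sample; the toy model `toy_soc`, the feature names `east` and `north`, and all numbers are hypothetical and are not the study's fitted model or data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background, players):
    """Exact Shapley values for one sample.

    `players` maps a player name to the feature names it controls, so
    several features (here the two coordinates) can act as one player.
    Features of absent players are taken from each background row in turn.
    """
    names = list(players)
    n = len(names)

    def value(coalition):
        # Mean prediction with the coalition's features fixed to x
        # and every other feature drawn from the background rows.
        present = {f for p in coalition for f in players[p]}
        total = 0.0
        for row in background:
            mixed = {f: (x[f] if f in present else row[f]) for f in x}
            total += predict(mixed)
        return total / len(background)

    phi = {}
    for p in names:
        others = [q for q in names if q != p]
        contrib = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                contrib += weight * (value(subset + (p,)) - value(subset))
        phi[p] = contrib
    return phi, value(())


def toy_soc(s):
    # Hypothetical linear response, used only to make the sketch checkable.
    return 20 + 0.05 * s["AN"] - 0.02 * s["AK"] + 0.5 * s["east"] + 0.3 * s["north"]


# Hypothetical background sample and target site.
background = [
    {"AN": 100, "AK": 200, "east": 1, "north": 1},
    {"AN": 150, "AK": 300, "east": 2, "north": 3},
    {"AN": 200, "AK": 100, "east": 3, "north": 2},
]
x = {"AN": 200, "AK": 250, "east": 4, "north": 2}

# The two coordinates form one "location" player.
players = {"AN": ["AN"], "AK": ["AK"], "location": ["east", "north"]}

phi, base = shapley_values(toy_soc, x, background, players)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

Grouping the coordinates gives location one combined contribution instead of splitting it between easting and northing. The contributions plus the base value add up to the model prediction for the site.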
The Geoshapley analysis provided several key findings: 1) Spatial location was identified as the second most important factor, which confirms the necessity of incorporating spatial effects in SOC modeling; 2) The spatial distribution of Geoshapley values for key variables revealed distinct regional patterns: alkali-hydrolyzable nitrogen (AN) showed higher values in eastern areas, indicating a positive contribution to SOC accumulation there, while available potassium (AK) and available phosphorus (AP) showed more negative values in western areas, suggesting inhibitory effects on SOC in these locations; 3) In western regions, SOC was primarily constrained by low AN and low annual rainfall, combined with excessively high AK and AP; 4) In eastern regions, SOC was limited by strong topographic relief, coupled with low densities of roads and water bodies; 5) Partial dependence plots identified clear threshold effects: SOC peaked when AN, AK, AP, distance to roads, and distance to water bodies reached 300 mg/kg, 231 mg/kg, 52 mg/kg, 8 467 m, and 180 m, respectively. Applying the optimization strategy based on these thresholds is projected to increase mean SOC content across the study area from an initial 32.52 g/kg to 41.62 g/kg, an increase of 27.98%. The western regions showed the largest potential improvement, with some areas gaining over 15 g/kg, while eastern areas remained stable with only minimal adjustments, indicating region-specific responsiveness to management interventions. This study underscores the importance of integrating spatial dependence into SOC modeling and highlights the advantages of the Geoshapley algorithm in improving interpretation accuracy over conventional SHAP methods. The spatial patterns of Geoshapley values provide insight into the region-specific mechanisms governing SOC accumulation.
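The threshold step described above can be sketched as follows: compute a partial dependence curve for each factor, take the grid value at which the curve peaks as the optimization target, and compare mean predicted SOC before and after moving the factors to their targets. The response function `toy_soc`, the samples, and the grids are hypothetical. The peak positions in the toy function were chosen only to echo the reported AN and AK thresholds, and moving every site to the target is a simplification, not necessarily the study's procedure.

```python
def partial_dependence(predict, samples, feature, grid):
    """Mean prediction over all samples with `feature` forced to each grid value."""
    curve = []
    for g in grid:
        preds = [predict({**s, feature: g}) for s in samples]
        curve.append(sum(preds) / len(preds))
    return curve


def peak_value(grid, curve):
    """Grid value at which the partial dependence curve is highest."""
    best = max(range(len(grid)), key=lambda i: curve[i])
    return grid[best]


def relative_increase(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def toy_soc(s):
    # Hypothetical response, NOT the fitted XGBoost model:
    # SOC rises with AN up to 300 mg/kg and declines beyond it,
    # and is highest when AK equals 231 mg/kg.
    an_term = 0.04 * min(s["AN"], 300) - 0.02 * max(s["AN"] - 300, 0)
    ak_term = -0.0005 * (s["AK"] - 231) ** 2
    return 30.0 + an_term + ak_term


# Hypothetical sampling sites and candidate grids (mg/kg).
samples = [
    {"AN": 180, "AK": 260},
    {"AN": 220, "AK": 300},
    {"AN": 320, "AK": 231},
]
grids = {"AN": list(range(100, 401, 50)), "AK": [150, 200, 231, 260, 300]}

# Optimization target per factor: the peak of its partial dependence curve.
targets = {
    f: peak_value(g, partial_dependence(toy_soc, samples, f, g))
    for f, g in grids.items()
}

# Simplification: every site is moved to the target values.
optimized = [{**s, **targets} for s in samples]
mean_before = sum(toy_soc(s) for s in samples) / len(samples)
mean_after = sum(toy_soc(s) for s in optimized) / len(optimized)
toy_gain = relative_increase(mean_before, mean_after)

# Arithmetic check of the reported means (32.52 and 41.62 g/kg).
reported_gain = relative_increase(32.52, 41.62)
print(targets, round(toy_gain, 2), round(reported_gain, 2))
```

The last line recomputes the relative gain from the two reported means, which agrees with the stated 27.98% to two decimal places.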
The findings provide actionable insights for tailoring local soil management practices and agricultural strategies, such as site-specific fertilization and improved irrigation infrastructure. Furthermore, the methodology offers a scalable framework for identifying and optimizing barrier factors of soil attributes in other regions, supporting global efforts toward sustainable land use and climate change mitigation. The approach demonstrates how spatial machine learning techniques can bridge the gap between theoretical modeling and practical agricultural management, enabling more precise and effective soil conservation strategies in ecologically vulnerable regions.

       
