Abstract:
Identifying and optimizing the local barrier factors that affect soil organic carbon (SOC) in the black soil region of Northeast China is critical for enhancing soil carbon sequestration, reducing emissions, and protecting black soil resources. However, existing studies that use the Shapley algorithm to identify local influencing factors often neglect the spatial dependency of SOC and do not provide quantitative optimization strategies for these barriers, which limits the accuracy and practical applicability of their results. To address these gaps, this study takes Kedong County in Heilongjiang Province as a case study and integrates the Extreme Gradient Boosting (XGBoost) model with the Geoshapley algorithm. The approach identifies localized barrier factors influencing SOC, determines an optimal target value for each factor, and estimates the potential improvement in SOC following optimization. Spatial data on SOC and 19 potential influencing factors were collected and preprocessed following the Scorpan framework, covering soil properties, climate, organisms, topography, and spatial location. The XGBoost model was used to capture the complex nonlinear relationships between SOC and the environmental variables, while the Geoshapley algorithm was applied to account for spatial dependence and interaction effects, giving more accurate estimates of factor importance and enabling local interpretation of model predictions. A genetic algorithm was used for variable selection to reduce dimensionality and avoid overfitting. The XGBoost model achieved fitting and prediction R² values of 0.99 and 0.93, respectively, indicating strong explanatory power and generalization capability. The Geoshapley analysis yielded five key findings: 1) Spatial location was the second most important factor, accounting for substantial variation in SOC distribution, which confirms the need to incorporate spatial effects in SOC modeling; 2) The Geoshapley values of key variables showed distinct regional patterns: AN had higher values in eastern areas, indicating a positive contribution to SOC accumulation there, while AK and AP had more negative values in western areas, suggesting inhibitory effects on SOC in those locations; 3) In western regions, SOC was primarily constrained by low AN and reduced annual rainfall, combined with excessively high AK and elevated AP; 4) In eastern regions, SOC was limited by significant topographic relief, coupled with low road density and limited access to water bodies; 5) Partial dependence plots identified clear threshold effects: SOC peaked when AN, AK, AP, distance to roads, and distance to water bodies reached 300 mg/kg, 231 mg/kg, 52 mg/kg, 8,467 m, and 180 m, respectively. Implementing the optimization strategy based on these thresholds is projected to raise mean SOC content across the study area from 32.52 g/kg to 41.62 g/kg, an increase of 27.98%. The western regions showed the largest potential improvement, with some areas gaining over 15 g/kg, while eastern areas changed little, indicating region-specific responsiveness to management interventions. This study underscores the importance of integrating spatial dependency into SOC modeling and highlights the advantage of the Geoshapley algorithm over conventional SHAP methods in interpretation accuracy.
The spatial patterns of Geoshapley values clarify the region-specific mechanisms governing SOC accumulation and support the tailoring of local soil management practices and agricultural strategies, such as site-specific fertilization and improved irrigation infrastructure. The methodology also offers a scalable framework for identifying and optimizing barrier factors of soil attributes in other regions, supporting global efforts toward sustainable land use and climate change mitigation. More broadly, the approach shows how spatial machine learning techniques can connect theoretical modeling with practical agricultural management, enabling more precise soil conservation strategies in ecologically vulnerable regions.
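
The threshold step described above can be illustrated with a minimal sketch. It is not the study's implementation: the predictor `toy_predict`, the three sample rows, the `rain` field, and the grid are hypothetical stand-ins for the fitted XGBoost model and the Kedong County data. The sketch computes a one-dimensional partial dependence curve for one factor, takes the grid value where the averaged prediction peaks as the target value, and checks the reported relative increase in mean SOC.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve


def peak_target(predict, rows, feature, grid):
    """Return the grid value at which the partial dependence curve is highest."""
    curve = partial_dependence(predict, rows, feature, grid)
    best = max(range(len(grid)), key=lambda i: curve[i])
    return grid[best]


def relative_increase(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Hypothetical toy predictor standing in for the fitted model:
# SOC rises with rainfall and peaks when AN is 300 mg/kg.
def toy_predict(row):
    return 40.0 - 0.0005 * (row["AN"] - 300.0) ** 2 + 0.01 * row["rain"]


# Hypothetical sample points (AN in mg/kg, rain in mm).
rows = [
    {"AN": 180.0, "rain": 520.0},
    {"AN": 240.0, "rain": 540.0},
    {"AN": 310.0, "rain": 560.0},
]
grid = [150.0, 200.0, 250.0, 300.0, 350.0]

target = peak_target(toy_predict, rows, "AN", grid)
print("target AN:", target)

# Mean SOC values reported in the abstract (g/kg).
print("relative increase (%):", round(relative_increase(32.52, 41.62), 2))
```

With a real model, `predict` would wrap the trained regressor, and the same loop would be repeated for each barrier factor to obtain its own target value.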