Estimation method of tidal soil salinization using a three-dimensional feature space based on BO-XGBoost

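    • Summary: Soil salinization is relatively severe in the Yellow River Delta. Compared with a two-dimensional feature space, a three-dimensional feature space makes fuller use of the multi-band information in remote sensing imagery and has stronger monitoring capability. However, few studies have used an XGBoost (extreme gradient boosting) model tuned by Bayesian optimization (BO) to select features and then build three-dimensional feature spaces for salinization monitoring in the Yellow River Delta. This study therefore extracted 35 spectral indices from Landsat 9 imagery, covering vegetation indices, salinity indices, water indices, and others, and used the Bayesian-optimized XGBoost model to evaluate feature importance and select indices within each category. The selected representative indices were combined across categories to build multiple three-dimensional feature space models. These models were compared against field measurements using accuracy metrics to identify the optimal three-dimensional feature space salinization inversion model for the Yellow River Delta, which was then used for a spatial analysis of salinization in the region. The results show that: 1) the Bayesian-optimized XGBoost model can screen the indices; salinity index 7 (SI7) has the highest feature importance (0.341), and R², RMSE, and RPIQ reach 0.921, 0.964, and 8.422, respectively; eight indices were finally selected for constructing the three-dimensional feature spaces; 2) compared with a two-dimensional feature space, a three-dimensional feature space makes full use of spectral information: for the optimal model, R² and RPIQ increase by 0.059 and 1.191, and RMSE decreases by 0.069 g/kg, giving high-accuracy prediction of soil salinization; 3) the monitoring model built on the SI8-Albedo-WI feature space is the most accurate, with R², RMSE, and RPIQ of 0.922, 0.863 g/kg, and 7.645, and a classification Kappa coefficient of 86%, whereas ERVI-WI-Albedo performs worst, with R², RMSE, and RPIQ of 0.519, 3.464 g/kg, and 1.087; 4) in the Yellow River Delta, moderately salinized land has the largest area share (29.7%), distributed in Lijin County, the central-western part of Kenli District, and other areas, while severely salinized land has the smallest share (9.8%), mainly distributed in eastern Kenli District and other areas. The results can provide important decision support for the prevention, control, and amelioration of soil salinization in the Yellow River Delta.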

       

      Abstract: In the global context of environmental and climate change, soil salinization has become a critical issue threatening ecological stability and sustainable development. The Yellow River Delta region exhibits prominent soil salinization problems. Compared to arid areas, this region features complex ecological heterogeneity, and traditional two-dimensional feature spaces, which couple only two environmental variables, are inherently limited in characterizing such complex environments. Furthermore, conventional research often restricts itself to a few specific indices and does not systematically construct and evaluate a comprehensive index pool to identify the optimal parameters for representing salinization, which limits the full extraction of salinization information. In contrast, three-dimensional feature spaces can more effectively utilize multi-band remote sensing data and integrate multiple environmental variables, and therefore offer stronger monitoring capabilities. However, studies focusing on salinization monitoring in the Yellow River Delta by constructing three-dimensional feature spaces based on spectral indices, combined with feature selection and Bayesian-optimized XGBoost models, remain relatively scarce. To address this, this study employed Landsat 9 satellite imagery to build a spectral index pool, extracting a total of 35 spectral indices, including vegetation indices, salinity indices, water indices, and other relevant indices. To improve modeling efficiency and parameter screening, a Bayesian-optimized XGBoost model was used to evaluate and select features based on the built-in Gain metric, retaining the two most important indices from each category. Using these selected representative indices, multiple three-dimensional feature space models were constructed through cross-category combinations.
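The selection and combination step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the gain scores are made-up placeholders, the index names other than SI7, SI8, ERVI, WI, and Albedo are stand-ins, and the assignment of indices to four categories is an assumption. In practice the scores might come from a fitted XGBoost model, for example via `Booster.get_score(importance_type="gain")`.

```python
from itertools import combinations, product


def top_k_per_category(importance, category_of, k=2):
    """Keep the k highest-scoring indices within each category."""
    by_category = {}
    for name, score in importance.items():
        by_category.setdefault(category_of[name], []).append((score, name))
    return {
        cat: [name for _, name in sorted(items, reverse=True)[:k]]
        for cat, items in by_category.items()
    }


def cross_category_triplets(selected):
    """List index triplets whose members come from three different categories."""
    triplets = []
    for cats in combinations(sorted(selected), 3):
        triplets.extend(product(*(selected[c] for c in cats)))
    return triplets


# Hypothetical gain scores, for illustration only (not values from the study).
importance = {
    "SI7": 0.30, "SI8": 0.20, "SI1": 0.05,
    "ERVI": 0.10, "NDVI": 0.08, "EVI": 0.02,
    "WI": 0.09, "NDWI": 0.06,
    "Albedo": 0.07, "BI": 0.03,
}
# Assumed category membership; the study's grouping may differ.
category_of = {
    "SI7": "salinity", "SI8": "salinity", "SI1": "salinity",
    "ERVI": "vegetation", "NDVI": "vegetation", "EVI": "vegetation",
    "WI": "water", "NDWI": "water",
    "Albedo": "other", "BI": "other",
}

selected = top_k_per_category(importance, category_of, k=2)
triplets = cross_category_triplets(selected)
print(selected["salinity"])  # ['SI7', 'SI8']
print(len(triplets))         # 32
```

With four assumed categories and two indices kept per category, this enumeration gives 32 candidate triplets; the number of feature spaces actually built in the study is not stated in the abstract and may be smaller.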
Within these three-dimensional spaces, the three coordinate axes represent different index types, and any point (x, y, z) in the feature space corresponds to the values of three indices for a specific pixel in the remote sensing image. In parallel, multiple two-dimensional feature spaces were built using cross-category combinations of the single most important index from each category. By comparing accuracy evaluation metrics against field-measured data, the optimal three-dimensional and two-dimensional feature space models for soil salinization inversion in the Yellow River Delta were determined, and a regional spatial analysis of salinization was subsequently conducted. The results demonstrate that: 1) The Bayesian-optimized XGBoost model effectively screened the most relevant indices. Salinity indices achieved the highest modeling accuracy (R² = 0.921, RMSE = 0.964 g/kg, RPIQ = 8.422), with salinity index 7 (SI7) showing the highest feature importance (0.341). Ultimately, eight of the most informative indices were selected. 2) Compared to two-dimensional feature spaces, three-dimensional feature spaces more fully exploit spectral information. The optimal three-dimensional model showed improvements of 0.059 in R² and 1.191 in RPIQ, and a reduction of 0.069 g/kg in RMSE, confirming that the three-dimensional approach enables high-precision prediction of soil salinization. 3) Among the constructed three-dimensional feature space models, the model based on SI8-Albedo-WI achieved the highest accuracy (R² = 0.922, RMSE = 0.863 g/kg, RPIQ = 7.645, Kappa coefficient = 86%), whereas the ERVI-WI-Albedo model performed the worst (R² = 0.519, RMSE = 3.464 g/kg, RPIQ = 1.087).
4) In the Yellow River Delta region, moderately salinized areas account for the largest proportion (29.7%), primarily distributed in the central-western part of Kenli District and Lijin County; severely salinized areas constitute the smallest proportion (9.8%), mainly located in the eastern part of Kenli District. The findings of this study provide crucial references and decision-making support for the prevention and remediation of soil salinization in the Yellow River Delta.
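
The three accuracy metrics reported above can be computed as in the sketch below. It assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, and RPIQ as the interquartile range of the observed values divided by RMSE. The abstract does not state which variants the study used, and the sample salt contents (g/kg) are hypothetical.

```python
import math
from statistics import mean, quantiles


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: (Q3 - Q1) / RMSE."""
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


# Hypothetical soil salt contents in g/kg, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]

print(rmse(observed, predicted))       # 0.5
print(r_squared(observed, predicted))  # 0.875
print(rpiq(observed, predicted))       # 4.0
```

Because RPIQ scales the error by the spread of the observations, a lower RMSE on the same samples raises RPIQ, which matches the direction of the changes reported for the three-dimensional models.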

       
