Yu Lei, Hong Yongsheng, Zhou Yong, Zhu Qiang, Xu Liang, Li Jiyun, Nie Yan. Wavelength variable selection methods for estimation of soil organic matter content using hyperspectral technique[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(13): 95-102. DOI: 10.11975/j.issn.1002-6819.2016.13.014

    Wavelength variable selection methods for estimation of soil organic matter content using hyperspectral technique

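    • Summary: Soil hyperspectral data are voluminous and high-dimensional, and they contain uninformative, redundant, and overlapping spectral information, so soil organic matter (SOM) inversion models built on the full spectrum are unstable and their accuracy is hard to improve. Finding methods that select key wavelength variables, filtering out interfering, redundant, and collinear information to improve prediction performance, is therefore one of the current focuses of soil hyperspectral research. This paper obtained empirical data from soil samples collected in Gong'an County on the Jianghan Plain through laboratory physicochemical analysis and spectral measurement and processing. Uninformative variables elimination (UVE) was used to remove uninformative variables, competitive adaptive reweighted sampling (CARS) to filter out redundant variables, and the successive projections algorithm (SPA) to eliminate collinear variables; selection methods of different types were also coupled to select key wavelength variables. Partial least squares regression (PLSR) was applied to build an SOM estimation model for each set of selected variables, the strengths and weaknesses of the selection methods were compared, and a methodological framework for selecting key variables from soil hyperspectral data was finally constructed. The results show that, except for SPA, whose model accuracy was lower than that of the full spectrum, the other 6 variable selection methods all modeled better than the full spectrum. Among the 3 single variable selection methods, CARS outperformed UVE and SPA and effectively selected important wavelength variables, with a prediction-set relative percent deviation (RPD) of 2.84. Comparing all methods, the PLSR model built on 37 characteristic wavelengths selected by CARS-SPA from the 2 001 full-spectrum wavelengths performed best, with a prediction-set coefficient of determination R2 of 0.92 and an RPD of 3.60, using only 1.85% of the full-spectrum bands. The CARS-SPA-PLSR model is simple and predicts well; it can serve as an important method for estimating SOM content in this region and offers guidance for the future development of proximal soil sensing devices.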
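The statistics quoted above (R2, RPD, and the share of selected bands) can be reproduced in form with a short sketch. The abstract does not state the exact formulas, so the definitions here are assumptions: R2 is computed as 1 - SS_res/SS_tot, and RPD as the sample standard deviation of the measured values divided by the RMSE, a common convention in soil spectroscopy. The function names and the toy numbers are hypothetical and are not data from the study.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD) for measured vs. estimated values.

    Assumed definitions (not confirmed by the abstract):
      R2  = 1 - SS_res / SS_tot
      RPD = sample standard deviation of y_true / RMSE
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - float(np.sum(resid ** 2)) / ss_tot
    rpd = float(np.std(y_true, ddof=1)) / rmse
    return r2, rmse, rpd


def selected_share(n_selected, n_total):
    """Percentage of the full-spectrum wavelengths that were retained."""
    return 100.0 * n_selected / n_total


if __name__ == "__main__":
    # Toy values for illustration only, not measurements from the paper.
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]
    estimated = [1.1, 1.9, 3.2, 3.8, 5.0]
    print(regression_metrics(measured, estimated))
    # 37 of 2001 wavelengths, as reported in the abstract.
    print(f"{selected_share(37, 2001):.2f}%")  # prints 1.85%
```

Under this convention a larger RPD means the prediction error is small relative to the natural spread of the measured values, which is why it is reported alongside R2.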

       

      Abstract: Over the past decades, soil hyperspectral reflectance has been shown to be a rapid, convenient, and low-cost alternative method for estimating key soil properties. However, a hyperspectral dataset may contain thousands of variables because modern spectroscopy instruments usually have a high resolution. Moreover, the full spectrum includes many wavelengths that can contribute collinearity, redundancy, and noise to models. Key variable selection is therefore an important step in soil hyperspectral modeling. The main objective of this study was to compare the prediction accuracy of models developed with several variable selection algorithms for the estimation of soil organic matter (SOM), with full-spectrum partial least squares regression (PLSR) serving as the baseline for comparison. Fifty-six soil samples at 0-20 cm depth were collected from Gong'an County in the Jianghan Plain. The spectral reflectance of the soil samples was measured with an ASD FieldSpec3 instrument under laboratory conditions, and the physical and chemical properties of the samples were analyzed. The Kennard-Stone algorithm was used to divide the soil samples into a calibration set of 40 samples and a prediction set of 16 samples. Different spectral pre-processing methods were applied to the raw soil reflectance. Three variable selection methods, UVE (uninformative variables elimination), CARS (competitive adaptive reweighted sampling), and SPA (successive projections algorithm), were then used to select key variables. Finally, based on the variables selected by the different methods, partial least squares regression with full cross validation was used to build quantitative inversion models for SOM. The prediction accuracies of these models were assessed by comparing the determination coefficient (R2), root mean squared error (RMSE), and relative percent deviation (RPD) between the estimated and measured SOM.
The results showed that the prediction accuracies obtained with different spectral pre-processing methods differed significantly, and that Savitzky-Golay smoothing with nine points followed by the first derivative (SG + 1stD) was the best pre-processing method, improving RPD for SOM prediction by approximately 37.29%. Among the three single variable selection methods, CARS was superior to the other two: it retained good prediction accuracy, with an R2 of 0.88 and an RPD of 2.84, and extracted the key variables for SOM effectively. Comparing all variable selection methods, the PLSR model built with the CARS-SPA method on 37 characteristic wavelengths selected from the 2001 full-spectrum wavelengths achieved the best performance. Its determination coefficient (R2) and relative percent deviation (RPD) between the estimated and measured SOM for the prediction set were 0.92 and 3.60, respectively. With the CARS-SPA method, the number of selected variables was only 1.85% of the full spectrum. The CARS-SPA-PLSR model was feasible and reliable for estimating SOM from the hyperspectral reflectance of soil samples under laboratory conditions. Appropriate variable selection can improve model performance, simplify the regression model, and increase the accuracy of SOM estimation. In the future, the CARS-SPA-PLSR inversion model can serve as a reference for the development of proximal soil sensing devices for this region and for online monitoring of SOM.
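The 40/16 calibration/prediction split described in the abstract relies on the Kennard-Stone algorithm. The following is a minimal sketch of the classic max-min formulation of that algorithm using Euclidean distances, not the authors' implementation. The function name `kennard_stone`, the use of Euclidean distance on the raw feature matrix, and the random demo data are all assumptions made for illustration.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Split row indices of X into (selected, remaining) by Kennard-Stone.

    Sketch of the classic procedure (assumed Euclidean distance):
      1. start with the two samples that are farthest apart;
      2. repeatedly add the sample whose distance to its nearest
         already-selected sample is largest.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise squared distances via the Gram matrix (avoids an n*n*p array).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    dist = np.sqrt(d2)

    # Step 1: the farthest pair seeds the selected set.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    # Step 2: max-min selection until enough samples are chosen.
    while len(selected) < n_select:
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


if __name__ == "__main__":
    # Random stand-in for 56 spectra; the band count here is arbitrary.
    rng = np.random.default_rng(0)
    spectra = rng.random((56, 20))
    cal_idx, pred_idx = kennard_stone(spectra, 40)
    print(len(cal_idx), len(pred_idx))  # prints 40 16
```

Because the selection depends only on distances between spectra, the calibration set tends to cover the spectral space evenly, while the samples left over form the prediction set.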

       
