Abstract:
Over the past decades, soil hyperspectral reflectance has been shown to be a rapid, convenient, and low-cost alternative method for estimating key soil properties. However, a hyperspectral dataset may contain thousands of variables because modern spectroscopy instruments usually have high spectral resolution. Moreover, the full spectrum includes many wavelengths that can introduce collinearity, redundancy, and noise into models. Key variable selection is therefore an important step in soil hyperspectral modeling. The main objective of this study was to compare the prediction accuracy of models developed with several variable selection algorithms for estimating soil organic matter (SOM), using full-spectrum partial least squares regression (PLSR) as the benchmark. Fifty-six soil samples from the 0-20 cm depth were collected in Gong'an County in the Jianghan Plain. The spectral reflectance of the soil samples was measured with an ASD FieldSpec3 instrument under laboratory conditions, and the physical and chemical properties of the samples were analyzed. The Kennard-Stone algorithm was used to divide the soil samples into a calibration set of 40 samples and a prediction set of 16 samples. Different spectral pre-processing methods were applied to the raw soil reflectance. Three variable selection methods, UVE (uninformative variable elimination), CARS (competitive adaptive reweighted sampling), and SPA (successive projections algorithm), were then used to select key variables. Finally, based on the variables selected by the different methods, PLSR with full cross-validation was used to build quantitative inversion models for SOM. The prediction accuracies of the optimal models were assessed by comparing the coefficient of determination (R²), root mean squared error (RMSE), and relative percent deviation (RPD) between the estimated and measured SOM. The results showed that prediction accuracy differed markedly among the spectral pre-processing methods. Nine-point Savitzky-Golay smoothing combined with the first derivative (SG+1stD) was the best pre-processing method, improving RPD for SOM prediction by approximately 37.29%. Among the three single variable selection methods, CARS was superior to the other two while retaining good prediction accuracy, with an R² of 0.88 and an RPD of 2.84, and it extracted the key variables for SOM effectively. Comparing all variable selection methods, the PLSR model built with the CARS-SPA method on 37 characteristic wavelengths selected from the full spectrum of 2001 wavelengths achieved the best performance. The R² and RPD between the estimated and measured SOM for this prediction model were 0.92 and 3.60, respectively. With the CARS-SPA method, the number of selected variables was only 1.85% of the full spectrum. The CARS-SPA-PLSR model was feasible and reliable for estimating SOM from the hyperspectral reflectance of soil samples under laboratory conditions. Appropriate variable selection can improve model performance, simplify the regression model, and increase the accuracy of SOM estimation. In the future, the CARS-SPA-PLSR inversion model can serve as a reference for developing proximal soil sensing devices for this region and for online monitoring of SOM.
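
The three accuracy measures named above can be computed as in the following minimal sketch. It is illustrative only and is not the authors' code: the abstract does not state the formulas, so the RPD definition used here (sample standard deviation of the measured values divided by RMSE) is an assumption based on common usage, and the SOM values are made-up placeholders rather than data from the study. The last lines check the reported share of selected wavelengths (37 of 2001).

```python
import math
from statistics import mean, stdev


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - m) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(y - p) ** 2 for y, p in zip(measured, predicted)]
    return math.sqrt(mean(squared_errors))


def rpd(measured, predicted):
    """Assumed definition: sample SD of measured values divided by RMSE."""
    return stdev(measured) / rmse(measured, predicted)


# Hypothetical SOM values, for illustration only (not from the study).
measured = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]

print("R2   =", r_squared(measured, predicted))
print("RMSE =", rmse(measured, predicted))
print("RPD  =", rpd(measured, predicted))

# Share of the full spectrum retained by CARS-SPA: 37 of 2001 wavelengths.
selected, full = 37, 2001
print("Selected share (%) =", round(100 * selected / full, 2))
```

Under this assumed definition, RPD grows as RMSE shrinks relative to the spread of the measured values, which is why it is reported alongside R² and RMSE when comparing models.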