Random forest classification of land use in the hilly and mountainous areas of southern China based on multi-source data

李恒凯; 王利娟; 肖松松

doi:10.11975/j.issn.1002-6819.2021.07.030

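Summary: To address the low classification accuracy caused by fragmented terrain and mountain shadows in the hilly and mountainous areas of southern China, this study takes the Dongjiang River source region as an example. Combining multi-source data, it extracts 27 indicators from Sentinel-1 and Sentinel-2A satellite imagery and a DEM, constructs 6 feature variable sets, and designs 9 schemes to examine how adding red-edge, radar, and terrain features affects the extraction of land use classification information in this kind of terrain. The random forest algorithm is combined with recursive feature elimination to select feature variables and rank feature importance, and the classification result after random forest feature selection is compared with those of the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. The results show that, without feature selection, using only the Sentinel-2A spectral features gives the lowest overall accuracy and Kappa coefficient for land cover classification in the Dongjiang River source region. With spectral features, vegetation indices, and water indices as the baseline scheme, adding red-edge, radar, or terrain features each effectively improves the classification accuracy of the individual land cover classes, and the terrain features are especially helpful for extracting orchard and cultivated land. Feature selection combining random forest and recursive feature elimination reduces the feature variables from 21 to 13 while keeping classification accuracy at its best, with an overall accuracy of 0.9372 and a Kappa coefficient of 0.9234. This is better than SVM and KNN with the same features and gives the best extraction of land use information for the Dongjiang River source region. The proposed random forest method based on multi-source data can provide technical support and a theoretical reference for land use information extraction in the topographically complex hilly and mountainous areas of southern China.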
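The two accuracy measures reported here, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix. The following is a minimal sketch of that calculation, assuming the standard Cohen's Kappa definition; the 3-class matrix is made up for illustration and is not data from the study, and the function name is hypothetical.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's Kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose reference class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: sum over classes of (row total * column total) / n^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix (illustrative values only, not from the paper).
confusion = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(f"OA = {oa:.4f}, Kappa = {kappa:.4f}")  # OA = 0.8600, Kappa = 0.7900
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance given the class totals.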

Abstract: Land use is critical to global environmental change and structural adjustment, and particularly to the sustainable development of land resources. The hilly and mountainous areas of southern China, however, combine complex terrain, a fragmented distribution of ground objects, and cloudy, rainy weather, so high-resolution optical remote sensing data suitable for effective and accurate extraction of land use information remain scarce. Combining multi-source remote sensing data lets the sources complement one another and can improve classification accuracy. The Sentinel series of remote sensing satellites launched by the European Space Agency (ESA) provides new data sources for land use change research: Sentinel-2A offers red-edge bands, and Sentinel-1 radar imaging is largely unaffected by cloud and fog, so multi-dimensional features can be used for land use classification. Taking the source region of the Dongjiang River in Jiangxi Province, China, as the study area, 9 schemes were designed for Random Forest (RF) classification of land use to explore the effect of red-edge, radar, and terrain features on extraction accuracy in the hilly and mountainous areas of southern China. Satellite images from Sentinel-1 and Sentinel-2A and a digital elevation model (DEM) were combined to extract 27 feature indices, from which 6 feature variable sets were constructed. RF was coupled with Recursive Feature Elimination (RFE) to rank the importance of the feature variables and select the optimal subset. The classification result after RF feature selection was compared with those of the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. The results showed that, when the feature variables were not optimized, using only the Sentinel-2A spectral features gave the lowest overall accuracy and Kappa coefficient for land use classification in the study area.
With spectral features, vegetation indices, and water indices as the basic scheme, adding red-edge, radar, and topographic features each effectively improved the classification accuracy: overall accuracy increased by 0.77, 1.79, and 4.27 percentage points, respectively, and the Kappa coefficient by 0.94, 2.18, and 5.2 percentage points, respectively. Topographic features were especially helpful for extracting orchard and cultivated land in the study area. Combining RF with RFE reduced the feature variables from 21 to 13 while maintaining the optimal classification accuracy, with an overall accuracy of 0.9372 and a Kappa coefficient of 0.9234. Spectral and red-edge feature variables had the largest contribution rates, 26.09% and 23.55%, respectively, followed by the vegetation and topographic indices. The RF classification depended mainly on the shortwave infrared band B12, the Relative Normalized Difference Vegetation Index (RNDVI), and the Ratio Vegetation Index (RVI). The overall accuracy of RF, 0.9372, was 5.75% and 6.6% higher than that of SVM and KNN, respectively, and its Kappa coefficient, 0.9234, was 7.1% and 8.15% higher, respectively, indicating that RF classified more accurately than SVM and KNN with the same features. Therefore, RF classification using multi-source data can provide promising technical support and a theoretical reference for land use extraction in the hilly and mountainous regions of southern China.
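The feature-selection and comparison workflow described in the abstract can be sketched roughly as follows. This is an illustrative outline only: it assumes scikit-learn (the abstract does not name the software used), replaces the Sentinel-1, Sentinel-2A, and DEM features with synthetic data, and uses an arbitrary class count and arbitrary hyperparameters. Only the reduction from 21 candidate features to 13 is taken from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for labelled pixel samples: 21 candidate features.
# The 6 classes and all sample counts are assumptions, not values from the study.
X, y = make_classification(
    n_samples=1200, n_features=21, n_informative=10, n_redundant=4,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive feature elimination driven by random forest importances:
# refit, drop the least important feature, repeat until 13 remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=13,
    step=1,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)  # column indices of kept features
print("Selected feature columns:", selected.tolist())

X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Compare RF, SVM, and KNN on the same selected features.
# SVM and KNN are scaled first because both are sensitive to feature ranges.
classifiers = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
results = {}
for name, clf in classifiers.items():
    clf.fit(X_train_sel, y_train)
    pred = clf.predict(X_test_sel)
    oa = accuracy_score(y_test, pred)        # overall accuracy
    kappa = cohen_kappa_score(y_test, pred)  # Kappa coefficient
    results[name] = (oa, kappa)
    print(f"{name}: OA = {oa:.4f}, Kappa = {kappa:.4f}")
```

On synthetic data the ranking of the three classifiers carries no meaning for the study area; the sketch only shows how the selection step and the same-features comparison fit together.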

Random forest classification of land use in hilly and mountainous areas of southern China using multi-source remote sensing data