Zhou Yanan, Chen Hui, Liu Hongbin. Land cover classification in hilly and mountainous areas using multi-source data and Stacking-SHAP technique[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(23): 213-222. DOI: 10.11975/j.issn.1002-6819.2022.23.023

    Land cover classification in hilly and mountainous areas using multi-source data and Stacking-SHAP technique

    • Accurate land cover classification provides a basic dataset for regional ecological protection and environmental management. Remote sensing (RS) images are currently the main data source for land cover extraction. However, hilly and mountainous areas are characterized by complex landscapes, fragmented ground objects, frequent cloud cover, and serious radiometric distortion, so it is difficult to map ground objects accurately from satellite images alone. Combining multi-source heterogeneous data can compensate for the shortcomings of a single data source and supply additional information that improves the separability of ground objects, which makes it promising for land cover extraction in areas with complex surface landscapes. In addition, stacking, an ensemble machine learning algorithm, has shown superior and robust predictive performance in recent classification tasks. This study therefore explores the effectiveness of multi-source heterogeneous data and the stacking algorithm for land cover classification in hilly and mountainous areas, taking Qianjiang District, Chongqing Municipality, China, as the study area. Feature variables were extracted from multi-source heterogeneous data, including Sentinel-1/2 images, a Digital Elevation Model (DEM), and soil and climate data. The Boruta method and the Variance Inflation Factor (VIF) were applied to eliminate redundant features. Five schemes with different inputs were then built from subsets of the optimized variables: RS variables only, RS variables plus climate factors, RS variables plus terrain parameters, RS variables plus soil parameters, and all variables.
A stacking algorithm was used to construct the classification models and to assess the impact of the different types of variables on land cover classification accuracy. The best stacking result was then compared with Support Vector Machine (SVM), Random Forest (RF), and extreme gradient boosting (XGBoost) classifiers. Additionally, the SHapley Additive exPlanations (SHAP) method was introduced to quantify the importance of the variables in the model. The results showed that the overall accuracy, Kappa coefficient, and F1-score improved significantly after the climate, soil, and terrain variables were introduced, whereas the model using only remote sensing variables had the lowest classification accuracy. Among the auxiliary variables, the soil variables contributed the largest improvement, followed by the terrain and climate variables. The classification accuracy of agricultural land types (dry farmland, paddy field, and orchard) was greater than that of the other types. The best classification accuracy was achieved by the scheme with all feature variables, with an overall accuracy of 96.61%, a Kappa of 0.96, and an F1-score of 94.81%. Under the same variables, the stacking model was more accurate than SVM, RF, and XGBoost. The SHAP technique quantified the global importance of each variable and indicated that the traditional vegetation and water spectral indices were the most important feature variables. In addition, the local contribution of each variable to each land cover type can help optimize parameter selection for extracting ground object information in hilly and mountainous areas. These findings offer technical support and a theoretical reference for land cover mapping in areas with complex landscapes.
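
The three accuracy measures reported above (overall accuracy, Kappa coefficient, and F1-score) can all be derived from a confusion matrix. The following is a minimal sketch in plain Python, not the authors' code. The 3-class confusion matrix is a made-up toy example (the abstract does not publish one), the function names are hypothetical, and the F1-score is assumed here to be macro-averaged over classes, which the abstract does not specify.

```python
from typing import List

Matrix = List[List[int]]  # cm[i][j] = samples of true class i predicted as class j


def overall_accuracy(cm: Matrix) -> float:
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Matrix) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected chance agreement from the row and column marginals.
    p_e = sum(row_sums[k] * col_sums[k] for k in range(n)) / (total ** 2)
    return (p_o - p_e) / (1 - p_e)


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of per-class F1 (assumption: macro averaging)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy confusion matrix, invented for illustration only.
toy_cm = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]

if __name__ == "__main__":
    print(f"Overall accuracy: {overall_accuracy(toy_cm):.4f}")
    print(f"Kappa:            {cohen_kappa(toy_cm):.4f}")
    print(f"Macro F1:         {macro_f1(toy_cm):.4f}")
```

Because Kappa discounts agreement expected by chance, it is lower than the overall accuracy for the same matrix, which matches the pattern in the reported results (96.61% overall accuracy against a Kappa of 0.96).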