Guo Jing, Long Huiling, He Jin, Mei Xin, Yang Guijun. Predicting soil organic matter contents in cultivated land using Google Earth Engine and machine learning[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(18): 130-137. DOI: 10.11975/j.issn.1002-6819.2022.18.014

    Predicting soil organic matter contents in cultivated land using Google Earth Engine and machine learning

    • Abstract: Soil organic matter (SOM) is the carrier of soil organic carbon in cropping systems. As a component of the soil solid phase, it is one of the core indicators for evaluating soil fertility in agriculture and land management. Accurate and efficient acquisition of SOM content can contribute substantially to the quality grading of cultivated land. High-resolution remote sensing data, with Google Earth Engine (GEE) as the computing platform, enable efficient inversion of SOM. Much effort has gone into SOM prediction models and spatial distribution mapping. However, appropriate satellite data sources and prediction algorithms for accurately predicting SOM content in specific regions are still lacking. In this study, SOM content in cultivated land was predicted using GEE and machine learning. Sentinel-2A MSI and Landsat 8 OLI data were collected for Gaocheng District, Shijiazhuang City, Hebei Province, China. These main data sources were combined with Sentinel-1 SAR, ECMWF/ERA5 meteorological, and USGS/SRTMGL1_003 elevation data. Variable feature sets were constructed from the spectral bands, including vegetation indices (Normalized Difference Vegetation Index (NDVI); Red Index (RI); Enhanced Vegetation Index (EVI); Soil-Adjusted Total Vegetation Index (SATVI); Brightness Index (BI)), radar features (Sentinel-1 VV and Sentinel-1 VH), terrain features (slope, aspect, and elevation), and climate features (annual precipitation and average annual temperature). Six and five models were constructed using the Sentinel-2 and Landsat 8 variable datasets, respectively. Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Support Vector Machine (SVM) were used to predict SOM on the GEE platform. The predictive performance of the three machine learning methods was compared in order to produce a high-precision spatial distribution map of SOM. 
The accuracy of the prediction models was then evaluated using the coefficient of determination (R²) and the root mean square error (RMSE). The results show that: 1) The models using Sentinel-2A achieved better R² and RMSE values than those using Landsat 8 in predicting SOM content. The best performance (R²=0.759, RMSE=2.852 g/kg) was achieved by the all-variable Sentinel-2A model with GBDT. 2) Compared with model A-0, model A-1 with the red edge bands obtained the maximum improvement of 9.752%. This difference was attributed to the inclusion of four red edge bands (B5, B6, B7, and B8A) in model A-1, which supplied effective spectral information for SOM inversion and greatly improved prediction accuracy. 3) Across the different variable feature combinations, the red edge bands, vegetation indices, Sentinel-1A radar features, terrain factors, and climate variables all contributed to the prediction accuracy of SOM. 4) GBDT was the best-suited algorithm for SOM prediction in the study area. The resulting SOM map accurately characterized the spatial distribution of SOM. Verification against the test data showed high accuracy, with each group of test samples highly consistent with the mapped values, indicating a reliable SOM inversion. Therefore, the Sentinel-2A MSI data showed clear advantages over the Landsat 8 OLI data, owing to their higher spectral and spatial resolutions. The combination of GBDT, Sentinel-2A, and GEE can be an effective way to map SOM. Each prediction factor also provides valuable information for predicting SOM content.
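
The two accuracy measures named in the abstract, R² and RMSE, can be sketched as below. This is a minimal illustration, not the authors' code. It assumes the common definition R² = 1 − SS_res/SS_tot, although the abstract does not state which definition was used (some studies report the squared Pearson correlation instead). The function names and the sample values in the usage note are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs (e.g. g/kg)."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

With hypothetical measured values `[1.0, 2.0, 3.0]` and a model that always predicts the mean `2.0`, `r_squared` returns 0 and `rmse` returns about 0.816, which shows why a useful model needs R² well above 0 and an RMSE small relative to the spread of the samples.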