Abstract:
Soil moisture is one of the most critical hydrologic indicators in land-atmosphere heat exchange and global climate dynamics. High-resolution soil moisture products contribute greatly to the precise monitoring of agricultural droughts. However, existing soil moisture datasets are limited by coarse spatial resolution (typically >9 km) and temporal discontinuity. In this study, a high-resolution soil moisture simulation (HRSMS) framework was developed that incorporates an ensemble learning approach for multisource data fusion. The framework produces spatially continuous, temporally consistent soil moisture estimates at 1 km resolution, with significantly better accuracy than conventional approaches. The framework comprises three computational procedures. Firstly, the high-resolution ancillary datasets (e.g., vegetation indices and land surface temperature) were spatiotemporally reconstructed using Savitzky-Golay filtering with multivariate regression, and data gaps were filled so that the temporal dynamics were preserved. Secondly, the Soil Moisture Active Passive (SMAP) observations (2017-2022, 0-5 cm depth) were spatially downscaled from 9 km to 1 km resolution. The synergistic relationships among vegetation indices, land surface temperature, soil properties, and topographic parameters were systematically investigated, and in situ measurements were incorporated using ensemble machine learning, including random forest (RF) and gradient boosting machine (GBM). Thirdly, multi-scale assessments were carried out: the reconstructed land surface temperature was compared with the original Moderate Resolution Imaging Spectroradiometer land surface temperature (MODIS LST) products, and a point-scale evaluation against in situ networks was conducted in Jilin Province, China.
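The reconstruction step for the ancillary time series can be illustrated with a minimal sketch. It is not the procedure used in the study: gaps are filled here by simple linear interpolation in place of the multivariate regression, and the smoothing uses the standard 5-point quadratic Savitzky-Golay weights. The function names `fill_gaps` and `savgol_smooth` and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence

# Standard 5-point, second-order Savitzky-Golay smoothing weights.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def fill_gaps(series: Sequence[Optional[float]]) -> List[float]:
    """Fill missing values (None) by linear interpolation in time.

    Assumption: linear interpolation stands in for the multivariate
    regression described in the text. Leading and trailing gaps take
    the nearest valid value.
    """
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("series has no valid observations")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            w = (i - left) / (right - left)
            filled.append((1.0 - w) * series[left] + w * series[right])
    return filled


def savgol_smooth(series: Sequence[float]) -> List[float]:
    """Smooth a gap-free series with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are returned unchanged, because the
    symmetric window does not fit there.
    """
    out = [float(v) for v in series]
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        out[i] = sum(c * y for c, y in zip(SG_COEFFS, window)) / SG_NORM
    return out


# Hypothetical land surface temperature series (K) with two cloud gaps.
lst_series = [288.1, 289.4, None, 291.0, 290.2, None, 292.5, 293.1]
reconstructed = savgol_smooth(fill_gaps(lst_series))
```

Filling the gaps before smoothing keeps the filter window complete at every interior time step, which is one simple way to preserve the temporal dynamics of the series.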
A systematic quantification was then performed on computational efficiency and accuracy metrics, including the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). Finally, polynomial regression fitting (PRF) was used to validate the HRSMS model on three critical maize growth days (16 June 2018, 15 August 2019, and 11 July 2020). The results showed that: 1) High performance was achieved in reconstructing the land surface temperature, with RMSE, MAE, and R² values of 0.526 K, 0.338 K, and 0.986, respectively, compared with the original MODIS LST. Three sites were randomly selected to evaluate the performance of the HRSMS model in both temporal and spatial dimensions. The GBM algorithm marginally outperformed the RF, with an average R² improvement of 1.3%, together with superior MAE (0.447) and RMSE (0.400). 2) The RF algorithm achieved MAE, RMSE, and R² values of 0.033 m³/m³, 0.049 m³/m³, and 0.574, respectively, over the three days. The GBM algorithm yielded comparable metrics (MAE: 0.033 m³/m³; RMSE: 0.050 m³/m³; R²: 0.556). Performance was also better than with discontinuous inputs, with an average R² improvement of 0.34 across the RF and GBM algorithms. 3) The HRSMS model significantly improved the accuracy of soil moisture simulation compared with the PRF. Specifically, the PRF exhibited inferior performance, with MAE and RMSE exceeding 0.047 m³/m³ and R² below 0.35. The improved model resolved the PRF overestimation of soil moisture in northwest Jilin Province. Notably, the PRF severely overestimated soil moisture in the northwestern grain-producing regions, by 36.4% (RF) and 36.0% (GBM), respectively, compared with the ground measurements. By contrast, the HRSMS reduced the RMSE and MAE by 22.2% and 44.0%, respectively, and improved R² by 0.27, compared with the PRF, indicating a 33.2% error reduction in the critical agricultural zones. 4) The RF and GBM demonstrated similar efficacy, with the RF marginally outperforming the GBM in R² (4.9% higher), RMSE (0.7% lower), and MAE (2.3% lower). Because the two algorithms performed comparably, both can be deployed equivalently for regional-scale simulation with operational flexibility. The HRSMS framework successfully enhanced the spatial resolution and accuracy of soil moisture products while maintaining temporal continuity. Integrating multisource data with ensemble learning resolved the overestimation of the traditional model, making the framework suitable for agriculturally vital regions. The operational adaptability of the RF and GBM algorithms allows applications to be tailored to diverse data environments. The improved model also shows significant potential for regional scalability, particularly in areas that require high-resolution soil moisture monitoring. Robustness and generalizability can be further enhanced by validation across diverse geographical regions and climatic conditions. Complementary environmental variables (e.g., evapotranspiration) can also be integrated in future research. The findings can substantially contribute to precision agriculture practices and climate resilience
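
The three accuracy metrics reported above can be computed as in the following sketch. It assumes that the coefficient of determination is defined as one minus the ratio of the residual to the total sum of squares; the study may instead use the squared correlation coefficient. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def mae(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute error between observations and simulations."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error between observations and simulations."""
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    return math.sqrt(mse)


def r2(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical in situ and simulated soil moisture values (m³/m³).
in_situ = [0.21, 0.25, 0.30, 0.18, 0.27]
simulated = [0.23, 0.24, 0.28, 0.20, 0.29]
scores = {
    "MAE": mae(in_situ, simulated),
    "RMSE": rmse(in_situ, simulated),
    "R2": r2(in_situ, simulated),
}
```

MAE and RMSE carry the units of the variable being evaluated (m³/m³ for soil moisture, K for land surface temperature), whereas the coefficient of determination is dimensionless.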