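Estimating potential evapotranspiration in areas with limited meteorological data using a self-optimizing nearest neighbor algorithm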

Feng Kepeng; Tian Juncang; Hong Yang

doi:10.11975/j.issn.1002-6819.2019.20.010

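Summary: The FAO-56 Penman-Monteith method for estimating potential evapotranspiration (ET0) is widely used, but it requires several meteorological inputs. Developing an alternative that uses as few meteorological data as possible while still giving ET0 estimates that are accurate, or at least close to those of FAO-56 Penman-Monteith, is one of the active research topics in this field. This paper combines canonical correlation analysis (CCA) with the k-nearest neighbor algorithm (k-NN) to propose a self-optimizing nearest neighbor method for calculating potential evapotranspiration (CCA-k-NN), which estimates potential evapotranspiration from a reduced set of meteorological data. The core idea is to use CCA to identify the meteorological variables most strongly correlated with potential evapotranspiration, which reduces the dimensionality of the meteorological inputs needed for the subsequent ET0 estimation, and then to estimate ET0 with k-NN. Taking Northwest China as the study area, the region's meteorological data were divided, on both the temporal and the spatial scale, into training, validation and test datasets. ET0 was estimated with the proposed method on each of the three datasets, and the accuracy and applicability of CCA-k-NN were evaluated against FAO-56 Penman-Monteith as the reference. The results show that CCA-k-NN remains highly correlated with FAO-56 Penman-Monteith (correlation coefficient greater than 0.9) and estimates ET0 accurately: the root mean square error and the mean absolute error are both below 1 mm/d, and the Nash-Sutcliffe efficiency coefficient exceeds 0.5 on the spatial scale and 0.8 on the temporal scale, so the method is applicable on both scales. In addition, the algorithm has lower time complexity than other alternative methods, which effectively reduces the time cost of processing large volumes of data.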
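The accuracy measures quoted here (root mean square error, mean absolute error, correlation coefficient and Nash-Sutcliffe efficiency) have standard definitions, which the following plain-Python sketch writes out. The function names and the four sample values are illustrative assumptions and do not come from the paper; `obs` stands for the FAO-56 Penman-Monteith reference series and `est` for an estimate of it.

```python
import math

def rmse(obs, est):
    """Root mean square error, in the units of the inputs (mm/d for ET0)."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))

def mae(obs, est):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)

def pearson_r(obs, est):
    """Pearson correlation coefficient between reference and estimate."""
    mean_o = sum(obs) / len(obs)
    mean_e = sum(est) / len(est)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)

def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better
    than always predicting the mean of the reference series."""
    mean_o = sum(obs) / len(obs)
    residual = sum((o - e) ** 2 for o, e in zip(obs, est))
    total = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - residual / total

# Made-up daily ET0 values in mm/d, for illustration only.
et0_reference = [2.0, 3.0, 4.0, 5.0]
et0_estimate = [2.5, 2.5, 4.5, 4.5]

print(rmse(et0_reference, et0_estimate))       # 0.5
print(mae(et0_reference, et0_estimate))        # 0.5
print(nse(et0_reference, et0_estimate))        # 0.8
print(round(pearson_r(et0_reference, et0_estimate), 4))  # 0.8944
```

On this toy series every estimate is off by 0.5 mm/d, so the errors would fall under the 1 mm/d bound and the efficiency would sit exactly at the 0.8 level reported for the temporal scale.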

Abstract: The FAO-56 Penman-Monteith method for estimating potential evapotranspiration is widely used, but it requires multiple meteorological inputs. This study proposed a self-optimizing nearest neighbor method for calculating potential evapotranspiration (CCA-k-NN), which combines canonical correlation analysis with the k-nearest neighbor algorithm so that potential evapotranspiration can be estimated from fewer meteorological data. Northwest China was chosen as the case study. In this region arid, semi-arid and semi-humid climates coexist; mountains, Gobi, oases and deserts are interwoven; and the environment is ecologically fragile and highly sensitive to climate change. The meteorological data comprised daily mean wind speed, daily maximum temperature, daily minimum temperature, daily mean temperature, sunshine hours and daily mean relative humidity from 148 meteorological stations. The data were divided into training, validation and test datasets. On the spatial scale, 60% of the 148 stations (89 stations) formed the training dataset, 30% (44 stations) the validation dataset and the remaining 10% (15 stations) the test dataset. On the time scale, the 1960-2018 record was split so that the first 60% of the period (1960-1994) formed the training dataset, the next 30% (1995-2012) the validation dataset and the final 10% (2013-2018) the test dataset. For the training dataset, canonical correlation analysis identified daily maximum temperature and relative humidity as the meteorological variables most strongly correlated with potential evapotranspiration in Northwest China. These two variables were then used as the model inputs. The optimal k value was selected by iteration, and the results showed that a k value between 15 and 32 was suitable for each weather station in Northwest China.
Then, daily maximum temperature and relative humidity from the validation and test datasets were fed to the k-nearest neighbor algorithm to estimate potential evapotranspiration. The models were evaluated using relative deviation, root mean square error, mean absolute error, correlation coefficient and the Nash-Sutcliffe efficiency coefficient. The results showed that the CCA-k-NN method maintained a high correlation with FAO-56 Penman-Monteith (correlation coefficient greater than 0.9) and had good estimation accuracy, with the root mean square error and the mean absolute error both below 1 mm/d. The Nash-Sutcliffe efficiency coefficient was greater than 0.5 on the spatial scale and greater than 0.8 on the time scale, indicating that the method was applicable at both spatial and temporal scales. The algorithm also had lower time complexity than other alternative methods and could effectively reduce the time cost of processing large amounts of data.
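The split and the iterative choice of k described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the 60/30/10 boundaries were rounded, which score the iteration minimized, or how neighbors were weighted, so the sketch assumes rounding to the nearest whole unit, validation RMSE as the selection criterion, an unweighted mean of the k nearest targets, and unscaled Euclidean distance. All function names and the synthetic data are hypothetical, and the canonical correlation step that selects the two inputs is not reproduced.

```python
import math
import random

def chronological_split(records, train_frac=0.6, valid_frac=0.3):
    """Split ordered records into training, validation and test parts.
    Assumption: part sizes are rounded to the nearest whole record."""
    n_train = round(len(records) * train_frac)
    n_valid = round(len(records) * valid_frac)
    return (records[:n_train],
            records[n_train:n_train + n_valid],
            records[n_train + n_valid:])

def knn_predict(train_x, train_y, query, k):
    """Unweighted mean target of the k training points nearest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def select_k(train_x, train_y, valid_x, valid_y, k_values):
    """Try each candidate k and keep the one with the lowest validation
    RMSE (an assumed criterion; the abstract only says 'by iteration')."""
    best_k, best_rmse = None, math.inf
    for k in k_values:
        squared = sum((knn_predict(train_x, train_y, x, k) - y) ** 2
                      for x, y in zip(valid_x, valid_y))
        score = math.sqrt(squared / len(valid_y))
        if score < best_rmse:
            best_k, best_rmse = k, score
    return best_k, best_rmse

# The rounding assumption reproduces the periods stated in the abstract.
years = list(range(1960, 2019))
train_years, valid_years, test_years = chronological_split(years)
print(train_years[0], train_years[-1])  # 1960 1994
print(valid_years[0], valid_years[-1])  # 1995 2012
print(test_years[0], test_years[-1])    # 2013 2018

# Synthetic (Tmax in deg C, RH in %) inputs with a made-up ET0 response.
random.seed(0)
inputs = [(random.uniform(0, 35), random.uniform(20, 90)) for _ in range(300)]
targets = [0.15 * t - 0.02 * h + 2.0 + random.gauss(0, 0.3) for t, h in inputs]
train, valid, test = chronological_split(list(zip(inputs, targets)))
train_x, train_y = [x for x, _ in train], [y for _, y in train]
valid_x, valid_y = [x for x, _ in valid], [y for _, y in valid]

best_k, valid_rmse = select_k(train_x, train_y, valid_x, valid_y, range(1, 41))
print(best_k, round(valid_rmse, 3))
```

With the same rounding, 148 stations would split into 89, 44 and 15, matching the spatial partition in the abstract. A real application would probably standardize temperature and humidity before computing distances, since they are measured in different units.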

Method for estimating potential evapotranspiration by self-optimizing nearest neighbor algorithm