Impact of sample selection and application strategies on crop identification accuracy using machine learning

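    • Abstract: Sample quantity and quality are among the key factors affecting the accuracy of remote-sensing crop classification, and an appropriate sample quantity together with cross-regional sample transfer determines the field sampling workload. To explore the minimum number of training samples required and the effectiveness of sample transfer, this study took the Ruoqiang River, Washixia River, and Tashisayi River irrigation areas in Xinjiang as the study area and proposed the concept of sample pixel density (the proportion of pixels covered by samples to all pixels in the classification area). Three application strategies were constructed: local application of single-irrigation-area samples, transfer of single-irrigation-area samples, and transfer of double-irrigation-area samples. Sentinel-2 imagery was obtained through Google Earth Engine (GEE), and two machine learning methods, support vector machine (SVM) and random forest (RF), were used to extract the planting structure of the irrigation areas, to systematically analyze the effect of sample pixel density on classification accuracy, and to evaluate the applicability of cross-irrigation-area sample transfer. The results showed that the variation of classification accuracy with sample pixel density followed a Logistic model, with all fits highly significant (P<0.01): accuracy first rose rapidly and then stabilized once the sample density reached a certain threshold. The pixel density threshold was 0.030 pixels/hm² for local application of single-irrigation-area samples and rose to 0.045 pixels/hm² for cross-regional transfer to compensate for feature heterogeneity. Double-irrigation-area sample transfer improved classification accuracy markedly, with overall accuracy up to 17.67 percentage points higher than single-irrigation-area transfer. RF performed better in local application of single-irrigation-area samples, whereas SVM showed greater stability and adaptability in cross-regional sample transfer. The study provides a theoretical basis and practical reference for sample optimization in crop classification and for the cross-regional transfer of machine learning models.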

       

      Abstract: Sample quantity and quality are among the critical factors affecting the accuracy of crop classification using remote sensing. An appropriate sample quantity and the cross-regional transfer of samples both determine the field sampling workload. To explore the minimum number of training samples and the effectiveness of sample transfer, this study took the Ruoqiang, Washixia, and Tashisayi irrigation areas in Xinjiang as study areas and proposed the concept of sample pixel density (the proportion of sample pixels to all pixels in the classification area). Three application strategies were constructed: local application of single-irrigation-area samples, transfer of single-irrigation-area samples, and transfer of double-irrigation-area samples. Sentinel-2 images were obtained from Google Earth Engine (GEE), and two machine learning methods, support vector machine (SVM) and random forest (RF), were employed to extract the planting structure of the irrigation areas. The impact of sample pixel density on classification accuracy was systematically analyzed, and the applicability of cross-irrigation-area sample transfer was evaluated. The results showed that: 1) With increasing pixel density of the training samples, classification accuracy rose rapidly at first and then plateaued. This pattern conformed to a logistic curve, and the fits were statistically significant (P<0.01). The pixel density threshold for local application of single-irrigation-area samples was 0.030 pixels/hm²; for cross-regional sample transfer the threshold rose to 0.045 pixels/hm² to compensate for the classification error caused by heterogeneity of the feature space.
2) In sample transfer from a single irrigation area, classification accuracy was affected by geographical proximity: samples from adjacent irrigation areas achieved higher classification accuracy because their surface-feature spectral characteristics were similar to those of the target irrigation area, whereas accuracy decreased significantly for samples from distant irrigation areas because of regional heterogeneity. In contrast, sample transfer from double irrigation areas enhanced the model’s adaptability to heterogeneous environments by fusing multi-regional spectral characteristics, which significantly improved classification accuracy. Overall accuracy improved by up to 17.67 percentage points compared with single-irrigation-area transfer. 3) In terms of classification algorithms, RF performed better in local application of single-irrigation-area samples, with overall accuracy up to 2.82 percentage points higher than that of SVM, whereas SVM was more stable in cross-regional sample transfer. When samples were transferred from a single irrigation area, the overall accuracy of SVM exceeded that of RF by up to 14.78, 4.30, and 0.84 percentage points in the Ruoqiang, Washixia, and Tashisayi irrigation areas, respectively. When samples were transferred from double irrigation areas, SVM also improved overall accuracy, by up to 3.01 percentage points. These findings provide a theoretical basis and practical reference for sample optimization strategies in regional crop classification and for the cross-regional transfer of machine learning models.
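
The density thresholds and the logistic accuracy curve reported above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the three-parameter form acc(d) = k / (1 + a·exp(-r·d)), a plateau k treated as known, a "99% of plateau" rule for reading off a threshold, a hypothetical 10,000 hm² irrigation area, and synthetic accuracy values. The abstract does not state how the authors derived their thresholds, so this shows only one plausible way to work with such a curve.

```python
import math


def required_sample_pixels(area_hm2, density_px_per_hm2):
    """Sample pixels needed to reach a target sample pixel density."""
    # The small epsilon guards against floating-point noise before rounding up.
    return math.ceil(area_hm2 * density_px_per_hm2 - 1e-9)


def logistic(d, k, a, r):
    """Accuracy at sample pixel density d: k / (1 + a * exp(-r * d))."""
    return k / (1.0 + a * math.exp(-r * d))


def fit_logistic(densities, accuracies, k):
    """Estimate a and r by least squares on the linearised form
    ln(k / acc - 1) = ln(a) - r * d, with the plateau k assumed known."""
    ys = [math.log(k / acc - 1.0) for acc in accuracies]
    n = len(densities)
    mean_d = sum(densities) / n
    mean_y = sum(ys) / n
    sxx = sum((d - mean_d) ** 2 for d in densities)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(densities, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    return math.exp(intercept), -slope  # (a, r)


def density_at_fraction(a, r, fraction):
    """Density at which the curve reaches `fraction` of its plateau k."""
    return math.log(a * fraction / (1.0 - fraction)) / r


# Hypothetical 10,000 hm² irrigation area at the two reported thresholds.
local_pixels = required_sample_pixels(10_000, 0.030)
transfer_pixels = required_sample_pixels(10_000, 0.045)

# Synthetic accuracies generated from assumed parameters (not the paper's data).
K_TRUE, A_TRUE, R_TRUE = 0.92, 4.0, 150.0
densities = [0.005 * i for i in range(1, 13)]
accuracies = [logistic(d, K_TRUE, A_TRUE, R_TRUE) for d in densities]

a_hat, r_hat = fit_logistic(densities, accuracies, K_TRUE)
threshold = density_at_fraction(a_hat, r_hat, 0.99)
print(local_pixels, transfer_pixels, round(threshold, 4))
```

With real data the plateau k would also have to be estimated, for example with a nonlinear least-squares routine, and the significance of the fit would need to be tested separately.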

       
