Abstract:
Sample quantity and quality are critical factors affecting the accuracy of crop classification using remote sensing. The appropriate sample quantity and the cross-regional transferability of samples directly affect the field sampling workload. To explore the minimum number of training samples required and the effectiveness of sample transfer, this study took the Ruoqiang, Washixia, and Tashisayi irrigation areas in Xinjiang as study areas and proposed the concept of sample pixel density (the proportion of sample pixels to all pixels in the classification area). Three application strategies were constructed: local application of samples from a single irrigation area, sample transfer from a single irrigation area, and sample transfer from double irrigation areas. Sentinel-2 images were obtained through Google Earth Engine (GEE), and two machine learning methods, support vector machine (SVM) and random forest (RF), were employed to extract the planting structure of the irrigation areas. The impact of sample pixel density on classification accuracy was systematically analyzed, and the applicability of cross-irrigation-area sample transfer was evaluated. The results showed that: 1) With increasing pixel density of the training samples, classification accuracy rose rapidly at first and then plateaued. This pattern conformed to a logistic curve, and the fit was statistically significant (P<0.01). The sample pixel density threshold for local application of irrigation area samples was 0.030 pixels/hm²; for cross-regional sample transfer, the threshold rose to 0.045 pixels/hm² to compensate for the classification error caused by heterogeneity of the feature space. 2) In sample transfer from a single irrigation area, classification accuracy was affected by geographical proximity: samples from adjacent irrigation areas achieved higher classification accuracy because their surface feature spectral characteristics were similar to those of the target irrigation area, whereas accuracy decreased significantly for samples from distant irrigation areas because of regional heterogeneity. In contrast, sample transfer from double irrigation areas enhanced the model's adaptability to heterogeneous environments by fusing multi-regional spectral characteristics, which significantly improved classification accuracy. The overall accuracy improved by up to 17.67 percentage points compared with single-irrigation-area transfer. 3) In terms of classification algorithms, RF performed better when samples from a single irrigation area were applied locally, with an overall accuracy up to 2.82 percentage points higher than that of SVM, whereas SVM was more stable in cross-regional sample transfer. When samples were transferred from a single irrigation area, the overall accuracy of SVM was up to 14.78, 4.30, and 0.84 percentage points higher than that of RF in the Ruoqiang, Washixia, and Tashisayi irrigation areas, respectively. When samples were transferred from double irrigation areas, SVM also improved overall accuracy, by up to 3.01 percentage points. These findings provide a theoretical basis and practical reference for sample optimization strategies in regional crop classification and for the cross-regional transfer of machine learning models.
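
The relationship between sample pixel density and accuracy described in result 1) can be illustrated with a minimal sketch. It assumes a three-parameter logistic form and defines the plateau threshold as the density at which the curve reaches a chosen fraction of its asymptote; the abstract states neither the exact functional form nor the threshold criterion, so both are assumptions. The function names and the parameter values in the usage lines are hypothetical placeholders, not fitted values from the study.

```python
import math


def sample_pixel_density(n_sample_pixels: int, area_hm2: float) -> float:
    """Sample pixels per hectare (hm^2) of the classification area."""
    if area_hm2 <= 0:
        raise ValueError("area_hm2 must be positive")
    return n_sample_pixels / area_hm2


def logistic(d: float, a_max: float, k: float, d0: float) -> float:
    """Assumed three-parameter logistic: accuracy as a function of density d.

    a_max is the asymptotic accuracy, k the growth rate, d0 the inflection point.
    """
    return a_max / (1.0 + math.exp(-k * (d - d0)))


def plateau_density(k: float, d0: float, fraction: float = 0.95) -> float:
    """Density at which the logistic reaches `fraction` of its asymptote.

    Solving a_max / (1 + exp(-k (d - d0))) = fraction * a_max for d gives
    d = d0 + ln(fraction / (1 - fraction)) / k.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return d0 + math.log(fraction / (1.0 - fraction)) / k


# Hypothetical placeholder parameters, for illustration only.
A_MAX, K, D0 = 0.90, 100.0, 0.02
threshold = plateau_density(K, D0, fraction=0.95)
accuracy_at_threshold = logistic(threshold, A_MAX, K, D0)
```

In practice the parameters would be estimated by fitting the curve to observed (density, accuracy) pairs, for example with a nonlinear least-squares routine; the closed-form threshold then follows directly from the fitted k and d0.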