Abstract:
Accurately extracting cultivated land from high-resolution remote sensing (HRRS) imagery is critical for national food security, agricultural planning, and ecological management. However, existing deep learning models struggle with boundary ambiguity, information loss, and limited adaptability across diverse terrains, particularly in areas with fragmented parcels, spectral heterogeneity (e.g., varying crop types, soil moisture, and phenology), and complex mixtures with spectrally similar non-cropland covers. To overcome these limitations, this study develops a terrain-adaptive segmentation model for robust cultivated land extraction. SE-VUNet, an enhanced U-Net architecture, is proposed with three key innovations. 1) VGG-Enhanced Encoder: the standard encoder was replaced with a VGG-based deep feature extractor to capture richer multi-scale contextual information, improving the representation of local textures (e.g., field ridges and ditches) and global patterns (e.g., plain vs. terrace distributions). 2) Terrain-Adaptive Squeeze-and-Excitation (SE) Attention: SE modules were embedded to dynamically recalibrate channel-wise feature importance, enhancing vegetation-relevant channels while suppressing noise. Five variants, SE-VUNet(1) to SE-VUNet(5), embed the SE modules at the Shallow Feature Layer (1), Pre-Downsampling (2), Skip-Connection (3), Decoder Fusion (4), and Feature Learning Module (5), respectively. 3) Batch Normalization (BN) Optimization: BN layers were added after each convolutional block to mitigate internal covariate shift, which accelerates convergence and reduces overfitting (crucial given the limited labeled data), thereby improving model generalization. Experiments were conducted on the Gaofen Image Dataset (GID), which is derived from Gaofen-2 (GF-2) satellite RGB imagery.
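The channel-wise recalibration performed by an SE module can be illustrated with a minimal sketch. It follows the generic squeeze-and-excitation pattern (global average pooling, two fully connected layers, sigmoid gating) in plain Python. The function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the omission of bias terms are assumptions made for illustration; this is not the SE-VUNet implementation, whose layer sizes and placement details are not given here.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """Generic SE recalibration on a tiny feature map (illustrative only).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weight matrix of the first fully connected layer (hypothetical).
    w2: C x (C // r) weight matrix of the second fully connected layer (hypothetical).
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: dimensionality-reducing FC layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: FC layer followed by a sigmoid, giving one gate per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    recalibrated = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return gates, recalibrated


# Two 2x2 channels; the weights are chosen by hand so channel 0 is emphasized.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
gates, out = squeeze_excite(features, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
```

In this toy input the first gate is close to 1 and the second close to 0, so the first channel passes almost unchanged while the second is suppressed. In a trained network the weights are learned, so which channels are amplified depends on the data.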
Two terrain types were evaluated: (i) Flat Homogeneous Land (large, contiguous fields, uniform spectra, and low interference) and (ii) Complex Heterogeneous Land (small, irregular fields, blurred boundaries, high spectral variability, and significant non-cropland mixing). The SE-VUNet variants were benchmarked against PSPNet, HRNet, DeepLabv3+, and the baseline U-Net. All five SE-VUNet variants outperformed the baselines on both terrains, which validates the integration of VGG feature extraction and SE attention. Crucially, the optimal SE placement was terrain-dependent. On the flat homogeneous terrain, SE-VUNet(2) (SE Pre-Downsampling) performed best, with a mean intersection over union (mIoU) of 96.66% and an F1-score of 97.57%; amplifying the high-resolution shallow features early preserved critical fine linear details such as field boundaries and irrigation canals. On the complex heterogeneous terrain, SE-VUNet(5) (SE in Feature Learning Module) performed best, with an mIoU of 94.40% and an F1-score of 97.11%; its adaptive multi-scale feature fusion and deep feature refinement improved the discrimination of spectrally ambiguous classes (e.g., crops vs. grasslands) and the resolution of fragmented boundaries. Quantitative analysis confirmed that SE-VUNet significantly reduced boundary localization errors and captured small-field details better than all baselines. These results show that explicitly optimizing module placement according to terrain is highly effective, and that the terrain-aware model is suited to high-precision remote sensing in modern agriculture. SE-VUNet provides a robust framework that combines deep VGG feature extraction, channel-wise SE attention recalibration, and BN-stabilized training. The findings highlight that tailoring the placement of the attention mechanism to landscape heterogeneity can overcome boundary blurring and detail loss. The terrain-adaptive architecture significantly enhanced the mapping accuracy of cultivated land under diverse topographic conditions. In the future, the framework can be extended to multi-temporal and multi-spectral data to further support dynamic agricultural monitoring and precision farming.
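For reference, the mIoU and F1-score reported above can be computed from per-pixel labels as in the following minimal sketch. It assumes a two-class (cultivated vs. non-cultivated) labeling with classes `0` and `1`, and reports F1 for the positive class only. The function names and the class encoding are hypothetical, and the averaging convention actually used in the study may differ.

```python
def iou_and_f1(y_true, y_pred, positive=1):
    """Return (IoU, F1) for one class from flat sequences of per-pixel labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # IoU = TP / (TP + FP + FN); F1 = 2 TP / (2 TP + FP + FN).
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return iou, f1


def mean_iou(y_true, y_pred, classes=(0, 1)):
    """Average the per-class IoU over the given classes (assumed convention)."""
    return sum(iou_and_f1(y_true, y_pred, c)[0] for c in classes) / len(classes)


# Toy example: four pixels, one cultivated pixel is missed by the prediction.
truth = [1, 1, 0, 0]
prediction = [1, 0, 0, 0]
iou_pos, f1_pos = iou_and_f1(truth, prediction, positive=1)
miou = mean_iou(truth, prediction)
```

Because IoU penalizes false positives and false negatives more heavily than F1 for the same confusion counts, the F1-score of a class is never lower than its IoU, which is consistent with the F1-scores above exceeding the corresponding mIoU values.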