Extracting cultivated land from high-resolution remote sensing images using the SE-VUNet model

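    • Summary: To address blurred boundaries and missing detail in cultivated land segmentation across diverse terrain, an improved U-Net method for cultivated land extraction is proposed. The method incorporates a VGG network to deepen backbone feature extraction (V-UNet), embeds the Squeeze-and-Excitation (SE) attention mechanism to improve feature localization and edge detail, and uses Batch Normalization (BN) layers to suppress overfitting. Embedding the SE module at five key positions in the V-UNet network yields five SE-VUNet models. Using GID Gaofen-2 RGB data, comparison experiments against PSPNet, HrNet, Deeplabv3+, and U-Net were run on two types of cultivated land terrain: flat and concentrated, and complex and redundant. On both terrain types, all five SE-VUNet models outperformed the comparison networks. The SE-VUNet with the SE module placed before downsampling segmented flat, concentrated cultivated land best, with a mean intersection over union (mIoU) of 96.66% and an F1-score of 97.57%; the SE-VUNet with the SE module placed in the feature learning part performed best on complex, redundant cultivated land (mIoU = 94.40%, F1-score = 97.11%). The model can serve as a technical reference for addressing blurred boundaries and missing detail in cultivated land segmentation across diverse terrain.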

       

      Abstract: Accurate extraction of cultivated land from high-resolution remote sensing (HRRS) imagery is critical for national food security, agricultural planning, and ecological management. However, existing deep learning methods struggle with boundary ambiguity, information loss, and poor adaptability across diverse terrains, particularly in areas with fragmented parcels, spectral heterogeneity (e.g., varying crop types, soil moisture, and phenology), and complex mixtures with spectrally similar non-cropland covers. To overcome these limitations, this study develops a terrain-adaptive segmentation model for robust cultivated land extraction. SE-VUNet, an enhanced U-Net architecture, is proposed with three key innovations: 1) VGG-enhanced encoder. The standard encoder was replaced with a VGG-based deep feature extractor to capture richer multi-scale contextual information, improving the representation of both local textures (e.g., field ridges and ditches) and global patterns (e.g., plain vs. terrace distributions). 2) Terrain-adaptive Squeeze-and-Excitation (SE) attention. SE modules were embedded to dynamically recalibrate channel-wise feature importance, enhancing vegetation-relevant channels while suppressing noise. Five variants, SE-VUNet(1) to SE-VUNet(5), were created by embedding the SE module at the shallow feature layer (1), before downsampling (2), at the skip connection (3), at the decoder fusion stage (4), and in the feature learning module (5). 3) Batch Normalization (BN) optimization. BN layers were added after each convolutional block to mitigate internal covariate shift, which accelerates convergence and helps limit overfitting (crucial given the limited labeled data), improving model generalization. Experiments were carried out on the Gaofen Image Dataset (GID), which is derived from Gaofen-2 (GF-2) satellite RGB imagery.
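The channel-wise recalibration performed by an SE module can be illustrated with a minimal NumPy sketch. It follows the generic squeeze, excitation, and scale steps of SE attention; the reduction ratio `r`, the weight shapes, and the random weights are assumptions made for illustration, since the abstract does not specify the authors' implementation or hyperparameters.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Recalibrate a feature map x of shape (C, H, W) channel by channel."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                    # shape (C,)
    # Excitation: bottleneck C -> C/r -> C with ReLU, then sigmoid gating.
    h = np.maximum(w1 @ z + b1, 0.0)           # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # shape (C,), values in (0, 1)
    # Scale: multiply every channel by its learned importance weight.
    return x * s[:, None, None], s

# Hypothetical sizes and untrained random weights (illustration only).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.normal(size=(C, C // r)) * 0.1
b2 = np.zeros(C)

y, s = se_block(x, w1, b1, w2, b2)
print(y.shape, s.round(3))
```

Because the output has the same shape as the input, such a block can in principle be dropped in at any of the five positions listed above without changing the surrounding layers.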
Two terrain types were evaluated: (i) flat homogeneous land (large, contiguous fields, uniform spectra, and low interference) and (ii) complex heterogeneous land (small, irregular fields, blurred boundaries, high spectral variability, and significant mixing with non-cropland). The SE-VUNet variants were benchmarked against PSPNet, HrNet, Deeplabv3+, and the baseline U-Net. All five SE-VUNet variants outperformed the baselines on both terrains, which supports the combined use of VGG feature extraction and SE attention. Crucially, the optimal SE placement was terrain-dependent. On flat homogeneous terrain, SE-VUNet(2) (SE before downsampling) performed best, with a mean intersection over union (mIoU) of 96.66% and an F1-score of 97.57%; amplifying the high-resolution shallow features early preserved critical fine linear details such as field boundaries and irrigation canals. On complex heterogeneous terrain, SE-VUNet(5) (SE in the feature learning module) performed best, with an mIoU of 94.40% and an F1-score of 97.11%; its adaptive multi-scale feature fusion and deep feature refinement improved the separation of spectrally ambiguous classes (e.g., crops vs. grasslands) and the resolution of fragmented boundaries. Quantitative analysis confirmed that SE-VUNet significantly reduced boundary localization errors and captured small-field details better than all baselines. Choosing the module placement according to terrain proved highly effective, yielding a terrain-aware model suited to high-precision remote sensing in modern agriculture. SE-VUNet provides a robust framework that combines deep VGG feature extraction, channel-wise SE attention recalibration, and BN-stabilized training.
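For reference, mIoU and F1-score can both be computed from a pixel-level confusion matrix, as in the sketch below. It assumes a binary labeling (0 = background, 1 = cultivated land), mIoU averaged over both classes, and a per-class F1; the abstract does not state the averaging convention used, and the tiny label arrays are invented for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    idx = num_classes * y_true.ravel() + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def miou_and_f1(cm):
    """Return (mean IoU over classes, per-class F1) from a confusion matrix.

    A class absent from both labels and predictions would give 0/0 (NaN);
    this sketch assumes every class appears at least once.
    """
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as the class but actually another
    fn = cm.sum(axis=1) - tp   # actually the class but predicted as another
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou.mean(), f1

# Made-up 2 x 4 label maps (1 = cultivated land, 0 = background).
y_true = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0]])
y_pred = np.array([[1, 1, 1, 0],
                   [1, 0, 0, 0]])

cm = confusion_matrix(y_true, y_pred)
miou, f1 = miou_and_f1(cm)
print(cm.tolist(), miou, f1[1])   # [[3, 1], [1, 3]] 0.6 0.75
```

Both metrics penalize false positives and false negatives, but IoU counts each error against a smaller denominator, which is why a model's mIoU is typically lower than its F1-score, as in the results reported above.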
The findings indicate that placing the attention mechanism according to landscape heterogeneity can mitigate boundary blurring and detail loss. The terrain-adaptive architecture significantly improved the mapping accuracy of cultivated land under diverse topographic conditions. In future work, the framework could be extended to multi-temporal and multi-spectral data to further support dynamic agricultural monitoring and precision farming.

       
