Extracting fish-scale pits from remote sensing images via an improved U-Net integrating multi-source features and attention mechanisms

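    • Summary: Fish-scale pits are a typical small-scale soil and water conservation measure on the Loess Plateau. Because they are small and unevenly distributed, traditional satellite remote sensing methods struggle to identify them with high accuracy. This study therefore proposes a deep learning method for extracting fish-scale pits from remote sensing imagery that integrates multi-source features and attention mechanisms, built on a technical framework of "feature importance analysis + attention-enhanced U-Net structure design". Using high-resolution multispectral imagery and a digital elevation model (DEM) acquired by unmanned aerial vehicle, the study combined the Spearman correlation coefficient with SHAP (Shapley additive explanations) interpretability analysis to assess the importance of spectral and topographic features and eliminate redundancy, ultimately selecting 4 key feature types and designing 9 feature combination schemes from them. On this basis, comparative experiments were conducted with four semantic segmentation models (U-Net, DeepLabV3+, SegNet, and FCN); the results showed that the RGB + Slope feature combination achieved the best recognition performance in the U-Net model. In terms of model structure, the study took U-Net as the base and integrated a pyramid squeeze attention module (PSAM) and a multi-scale feature attention upsampling module (MFAU) to strengthen the model's perception of the edges and spatial structure of fish-scale pits, and designed ablation experiments to verify the improvements. With the optimal feature combination as input, the improved model raised the intersection over union in the test area by 2.47 percentage points, the F1 score by 1.34 percentage points, recall by 2.72 percentage points, and precision by 1.02 percentage points, showing good extraction accuracy and regional generalization ability. The study indicates that combining feature importance analysis with attention-enhanced structure design can effectively improve model recognition of small-scale landform targets, providing technical support for high-accuracy remote sensing extraction of micro-topographic structures such as fish-scale pits and a theoretical reference for multi-source information fusion and deep learning model construction.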

       

      Abstract: Fish-scale pits are a typical small-scale soil and water conservation measure on the Loess Plateau and are effective at retaining water and trapping sediment. They are semicircular or elliptical pits built along contour lines, with a regular, man-made layout that adapts to the local terrain. However, they remain difficult to identify in conventional satellite imagery because of their small spatial scale, variable morphology, and boundaries that change over time. This study developed a method for extracting these micro-topographic features from remote sensing images with high precision. A deep learning framework integrating multi-source features and attention mechanisms was proposed, following a "feature importance analysis + attention-enhanced U-Net" design. High-resolution multispectral imagery and a Digital Elevation Model (DEM) were acquired by Unmanned Aerial Vehicle (UAV) surveys. An initial feature dataset was constructed from spectral features (the Red, Green, and Blue (RGB) bands, Near-Infrared (NIR), and Red Edge) and topographic features (DEM, Slope, Aspect, Curvature, and Relief). Spearman's rank correlation coefficient and SHapley Additive exPlanations (SHAP) were combined to quantify feature importance and identify redundant features. This evaluation selected four key features: RGB, NIR, DEM, and Slope. Nine feature combinations were then designed, ranging from dual-feature sets to the full feature set. 
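
The redundancy screening described above can be illustrated with a minimal sketch of Spearman's rank correlation in plain Python. The study's actual procedure is not given in the abstract, so the 0.9 threshold, the feature names, the sample values, and the helper functions below are all assumptions made for illustration. The SHAP importance step is not shown.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed samples."""
    return pearson(average_ranks(x), average_ranks(y))


def redundant_pairs(features, threshold=0.9):
    """List feature pairs whose |rho| reaches the (hypothetical) threshold."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        rho = spearman(features[a], features[b])
        if abs(rho) >= threshold:
            flagged.append((a, b, round(rho, 3)))
    return flagged


# Made-up per-pixel samples, for illustration only (not data from the study).
samples = {
    "band_a": [10.0, 12.0, 15.0, 11.0, 18.0, 20.0],
    "band_b": [1.0, 1.5, 2.4, 1.2, 3.9, 5.0],    # same ordering as band_a
    "band_c": [5.0, 30.0, 12.0, 25.0, 8.0, 18.0],
}
print(redundant_pairs(samples))
```

In this toy input only the band_a/band_b pair is flagged, because the two share the same rank ordering. In a real workflow one member of each flagged pair would be a candidate for removal, with the choice informed by an importance measure such as SHAP.
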
These combinations were compared using four semantic segmentation models: U-Net, DeepLabV3+, SegNet, and the Fully Convolutional Network (FCN). In parallel, the U-Net architecture was modified. A Pyramid Squeeze Attention Module (PSAM) was integrated into the encoder path; it couples multi-scale convolutional layers with channel and spatial attention to capture the subtle features of fish-scale pits. A Multi-scale Feature Attention Upsampling module (MFAU) was incorporated into the decoder; it uses cross-layer feature fusion and gated attention to improve the reconstruction of complex and faint boundaries during upsampling. Ablation tests were carried out to determine the contribution of each architectural modification. The results showed that the RGB + Slope combination performed best with the U-Net model, reaching a peak Intersection over Union (IoU) of 90.79%. This combination outperformed all other feature sets, and U-Net outperformed the other three benchmark models. The fully enhanced model, which incorporates both PSAM and MFAU and takes the optimal RGB + Slope input, achieved an IoU of 93.26% on the independent test dataset, an improvement of 2.47 percentage points over the baseline U-Net. Correspondingly, the F1-score increased by 1.34 percentage points, recall by 2.72 percentage points, and precision by 1.02 percentage points. The model also remained robust and stable under varied topographic conditions, including different slope gradients and aspects. The ablation tests confirmed the effectiveness of both the PSAM and MFAU modules. 
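
For readers less familiar with the reported metrics, the sketch below computes IoU, precision, recall, and F1 from a pair of binary masks, using their standard definitions. The function names and the toy masks are assumptions made for illustration and are not the study's evaluation code. The last lines reproduce the percentage-point arithmetic stated above (93.26 versus 90.79).

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, false negatives for 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def segmentation_metrics(pred, truth):
    """Return IoU, precision, recall and F1 for the positive (pit) class."""
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


def percentage_point_gain(new_percent, old_percent):
    """Absolute difference between two percentages, in percentage points."""
    return round(new_percent - old_percent, 2)


# Toy flattened masks (1 = pit pixel), made up for illustration.
pred = [1, 1, 1, 0, 0, 1, 0, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
print(segmentation_metrics(pred, truth))

# IoU values reported in the abstract: improved model vs. baseline U-Net.
print(percentage_point_gain(93.26, 90.79))
```

A gain stated in percentage points is an absolute difference between two percentages, not a relative change, which is why the 2.47-point IoU gain follows directly from subtracting the two reported values.
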
Notably, the PSAM contributed to boundary integrity, while the MFAU contributed to accurate reconstruction of fine spatial details. The high-resolution UAV imagery provided the granularity needed to support deep learning, at a level of detail for such targets that conventional satellite platforms do not provide. Combining feature importance analysis with an attention-enhanced architecture yielded a technical framework that accurately identifies small-scale geomorphological targets. It offers a reliable and effective solution for the intelligent monitoring and assessment of soil and water conservation works such as fish-scale pits. The study also advances feature selection and network design for minute and complex features. The findings offer a conceptual and practical reference for micro-topography identification from high-resolution remote sensing imagery and for the use of deep learning models in geospatial applications.
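
The Slope layer that is paired with RGB in the best-performing input is a derivative of the DEM. The abstract does not say how it was computed, so the following is only a sketch of one common approach, central differences on a regular grid, with a hypothetical function name and a synthetic DEM. GIS packages often use other kernels (for example Horn's method), which give slightly different values on rough terrain.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells via central differences.

    dem is a list of equal-length rows of elevations; cell_size is the
    grid spacing in the same units as the elevations. Border cells are
    left as None because they lack neighbours on one side.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            gradient = math.hypot(dz_dx, dz_dy)
            slope[r][c] = math.degrees(math.atan(gradient))
    return slope


# Synthetic DEM: a plane rising 1 unit per cell in the x direction,
# so interior cells should have a slope of about 45 degrees.
dem = [[float(c) for c in range(4)] for _ in range(4)]
slope = slope_degrees(dem, cell_size=1.0)
print(slope[1][1])
```

The resulting grid could then be stacked with the three RGB bands as a fourth input channel, which is one plausible way to form an RGB + Slope input for a segmentation network.
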

       
