WEI Jingzhi, HUANG Xiaoli, JIANG Ling, et al. Extracting fish-scale pits from remote sensing images via an improved U-Net integrating multi-source features and attention mechanisms[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), xxxx, x(x): 1-11. DOI: 10.11975/j.issn.1002-6819.202507137

    Extracting fish-scale pits from remote sensing images via an improved U-Net integrating multi-source features and attention mechanisms

    Fish-scale pits are a typical small-scale soil and water conservation engineering measure on the Loess Plateau and are effective in retaining water and controlling sediment. These structures are typically semicircular or elliptical, are built along contour lines, and show distinct artificial regularity and strong adaptation to the local terrain. However, their automated identification in conventional satellite imagery remains challenging because of their small spatial scale, morphological variability, and boundaries that change over time. This research aimed to develop a robust identification method that addresses these limitations and achieves high-precision extraction of these micro-topographic features. A deep learning framework integrating multi-source feature optimization and attention mechanisms was proposed, structured as "feature importance analysis + attention-enhanced U-Net". The method began with the acquisition of high-resolution multispectral imagery and a digital elevation model (DEM) from unmanned aerial vehicle (UAV) surveys. An initial feature set was constructed from spectral characteristics (the red, green, and blue (RGB) bands, near-infrared (NIR), and red edge) and topographic variables (the DEM, slope, aspect, curvature, and relief). Spearman's rank correlation coefficient and SHapley Additive exPlanations (SHAP) were combined to quantify feature importance and identify redundancies. This evaluation led to the selection of four optimal features: RGB, NIR, DEM, and slope. Nine feature combination schemes, ranging from dual-feature to full-feature sets, were then designed for experimentation.
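The redundancy screening described above can be illustrated with a minimal sketch of Spearman's rank correlation between candidate features. This is not the authors' implementation: the function names, the 0.9 threshold, and the toy sample values are all assumptions made for illustration, and the SHAP importance step is not shown.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one feature is constant
    return cov / (var_x * var_y) ** 0.5


def redundant_pairs(features, threshold=0.9):
    """List feature pairs whose |rho| meets the (assumed) threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = spearman(features[a], features[b])
            if abs(rho) >= threshold:
                pairs.append((a, b, rho))
    return pairs


# Hypothetical per-pixel samples, invented purely for illustration.
samples = {
    "dem": [1, 2, 3, 4, 5],
    "relief": [2, 4, 6, 8, 11],
    "slope": [3, 1, 4, 1, 5],
}
pairs = redundant_pairs(samples)
```

In this toy data only the `dem` and `relief` columns are flagged, because they are perfectly monotonically related. In practice one feature of each flagged pair would be a candidate for removal, with an importance measure such as SHAP deciding which to keep.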
Four semantic segmentation models, U-Net, DeepLabV3+, SegNet, and the fully convolutional network (FCN), were then compared across these combinations. In parallel, the U-Net architecture was enhanced. A pyramid squeeze attention module (PSAM) was integrated into the encoder path. This module combined multi-scale convolutional layers with dual attention mechanisms in the channel and spatial domains to strengthen discrimination of the subtle features characteristic of fish-scale pits. In addition, a multi-scale feature attention upsampling module (MFAU) was incorporated into the decoder. The MFAU used cross-layer feature fusion and a gated attention strategy during upsampling to improve the reconstruction of complex and faint boundaries. Ablation studies were designed to isolate and quantify the contribution of each architectural modification. The results showed that the combination of RGB and slope performed best with the U-Net model, achieving a peak intersection over union (IoU) of 90.79%. This combination outperformed all other feature sets, and it was more effective with U-Net than with the other three benchmark models. The proposed model, which incorporated both the PSAM and MFAU modules and used the optimal RGB and slope input, achieved a final IoU of 93.26% on the independent test dataset, an improvement of 2.47 percentage points over the baseline U-Net. Other key metrics improved as well: the F1-score increased by 1.34%, recall by 2.72%, and precision by 1.02%. The model also performed stably across varied topographic conditions, including different slope gradients and aspects.
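For reference, the evaluation metrics named above (IoU, precision, recall, and F1-score) can be computed from a predicted binary mask and a ground-truth mask as sketched below. The function name and the toy masks are assumptions for illustration, and returning 0.0 for an empty denominator is a convention chosen here, not something stated in the source.

```python
def segmentation_metrics(pred, truth):
    """Compute IoU, precision, recall, and F1 for two binary masks.

    `pred` and `truth` are equally sized 2-D lists of 0/1 values,
    where 1 marks a fish-scale pit pixel.
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # pit pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as pit
            elif t and not p:
                fn += 1  # pit pixel missed

    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy masks, invented for illustration: TP = 2, FP = 1, FN = 1.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
scores = segmentation_metrics(pred, truth)
```

For these toy masks the IoU is 2 / (2 + 1 + 1) = 0.5, while precision, recall, and F1 are each 2/3. True negatives do not enter any of the four metrics, which makes them suitable for targets that occupy only a small share of the image.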
The ablation studies confirmed the efficacy of both modules, with the PSAM contributing notably to boundary integrity and the MFAU to the accurate reconstruction of fine spatial details. The high-resolution UAV imagery was essential, providing a level of detail for targets of this scale that conventional satellite platforms typically cannot offer. This study validated the combination of feature importance analysis with an attention-enhanced deep learning architecture for accurately identifying small-scale geomorphological targets. The proposed framework provides a reliable and effective solution for the intelligent monitoring and assessment of soil and water conservation engineering structures such as fish-scale pits. The approach to feature selection and network design for small and complex features also offers a reference for future research on micro-topography identification from high-resolution remote sensing imagery and for the development of deep learning models in geospatial applications.