WEI Jingzhi, HUANG Xiaoli, JIANG Ling, et al. Extracting fish-scale pits from remote sensing images via an improved U-Net integrating multi-source features and attention mechanisms[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2026, 42(2): 506-516. DOI: 10.11975/j.issn.1002-6819.202507137

    Extracting fish-scale pits from remote sensing images via an improved U-Net integrating multi-source features and attention mechanisms

    • Fish-scale pits are a typical small-scale soil and water conservation structure on the Loess Plateau. They are highly effective at retaining water and trapping sediment. The pits are semicircular or elliptical and are built along contour lines, so they combine artificial regularity with strong adaptation to the local landform. However, they remain difficult to identify in conventional satellite imagery because of their small spatial scale, variable morphology, and boundaries that change over time. This study aimed to develop a robust, high-precision method for extracting these micro-topographic features from remote sensing images. A deep learning framework was proposed that integrates multi-source features with attention mechanisms, structured as "feature importance analysis + attention-enhanced U-Net". High-resolution multispectral imagery and a Digital Elevation Model (DEM) were acquired from Unmanned Aerial Vehicle (UAV) surveys. An initial feature dataset was built from two groups of features. The spectral features were the Red, Green, and Blue (RGB) bands, Near-Infrared (NIR), and Red Edge. The topographic derivatives were the DEM, Slope, Aspect, Curvature, and Relief. Spearman's rank correlation coefficient and SHapley Additive exPlanations (SHAP) were combined to quantify feature importance and to identify redundant features. This systematic evaluation selected four optimal features: RGB, NIR, DEM, and Slope. Nine feature combinations were then designed, ranging from dual-feature sets to the full feature set.
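The redundancy part of the feature screening can be illustrated with Spearman's rank correlation, which is the Pearson correlation of the two features' rank vectors. Below is a minimal standard-library sketch. It assumes each feature has been flattened to one value per sampled pixel. The helper names, the toy values, and the 0.8 redundancy threshold are hypothetical and do not come from the study. The SHAP importance step is not shown, because it requires a trained model.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors.

    Undefined (division by zero) if either feature is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


def redundant_pairs(features, threshold=0.8):
    """List feature pairs with |rho| >= threshold (hypothetical cut-off)."""
    pairs = []
    for a, b in combinations(sorted(features), 2):
        rho = spearman(features[a], features[b])
        if abs(rho) >= threshold:
            pairs.append((a, b, round(rho, 3)))
    return pairs


if __name__ == "__main__":
    # Toy per-pixel samples (hypothetical, not data from the study).
    features = {
        "DEM": [10, 12, 15, 20, 26],
        "Relief": [1, 2, 4, 5, 9],
        "Slope": [5, 3, 8, 2, 7],
    }
    print(redundant_pairs(features))  # [('DEM', 'Relief', 1.0)]
```

In these toy values DEM and Relief increase together, so the pair is flagged with rho = 1.0. Slope correlates only weakly with either of them (rho = 0.1). A screen like this only detects redundancy. Deciding which member of a redundant pair to keep is where an importance measure such as SHAP comes in.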
A comparative analysis then evaluated these combinations with four semantic segmentation models: U-Net, DeepLabV3+, SegNet, and the Fully Convolutional Network (FCN). In parallel, the U-Net architecture was substantially enhanced. First, a Pyramid Squeeze Attention Module (PSAM) was integrated into the encoder path. It couples multi-scale convolutional layers with dual attention in the channel and spatial domains to capture the subtle features of fish-scale pits. Second, a Multi-scale Feature Attention Upsampling module (MFAU) was incorporated into the decoder. It uses cross-layer feature fusion and gated attention to improve the reconstruction of complex, faint boundaries during upsampling. Ablation tests were conducted to quantify the contribution of each architectural modification. The RGB + Slope combination performed best within the U-Net framework, reaching a peak Intersection over Union (IoU) of 90.79%, and it consistently outperformed all other feature sets. U-Net also outperformed the other three benchmark models. The fully enhanced model incorporated both the PSAM and MFAU modules and used the optimal RGB-Slope input. On the independent test dataset, it achieved a final IoU of 93.26%, an improvement of 2.47 percentage points over the baseline U-Net. Correspondingly, the F1-score increased by 1.34%, recall by 2.72%, and precision by 1.02%. The enhanced model also remained robust and stable under various topographic conditions, including different slope gradients and aspects. The ablation tests confirmed the efficacy of both the PSAM and MFAU modules.
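All four reported metrics derive from pixel-level counts for the pit class: true positives (TP), false positives (FP), and false negatives (FN). Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 is the harmonic mean of precision and recall, and IoU = TP/(TP+FP+FN). Below is a minimal sketch, assuming binary masks stored as nested lists. The function name and toy masks are illustrative only.

```python
def segmentation_metrics(pred, truth):
    """Pixel-level precision, recall, F1, and IoU for the positive class.

    pred, truth: equally sized 2-D lists of 0/1 labels (1 = fish-scale pit).
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # pit predicted where a pit exists
            elif p:
                fp += 1  # pit predicted on background
            elif t:
                fn += 1  # pit missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    truth = [[1, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
    pred = [[1, 1, 1],
            [0, 0, 0],
            [0, 0, 0]]
    print(segmentation_metrics(pred, truth))
```

For the toy masks, TP = 2, FP = 1, and FN = 1. Precision, recall, and F1 are therefore all 2/3, while IoU = 2/4 = 0.5. This shows why IoU is the strictest of the four scores. The 2.47-point IoU gain quoted in the abstract equals the absolute difference 93.26 - 90.79, so it is expressed in percentage points rather than as a relative change.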
Notably, the PSAM contributed mainly to boundary integrity, whereas the MFAU contributed to the accurate reconstruction of fine spatial details. The high-resolution UAV imagery provided the granularity that deep learning requires, capturing these small targets in more detail than conventional satellite platforms can. By combining feature importance analysis with an attention-enhanced deep learning architecture, the framework accurately identified small-scale geomorphological targets. It provides a reliable and effective solution for the intelligent monitoring and assessment of soil and water conservation works such as fish-scale pits. It also advances feature selection and network design for minute, complex features. These findings can serve as a conceptual and practical reference for identifying micro-topography in high-resolution remote sensing imagery and for applying advanced deep learning models in geospatial applications.
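The abstract does not describe the internals of the MFAU's gated attention. As an assumption, the sketch below uses the widely known additive attention gate from Attention U-Net. In that design, decoder (gating) features decide, pixel by pixel, how much of each skip-connection feature to pass on: alpha = sigmoid(psi · ReLU(W_x x + W_g g) + b). The code is plain Python, with 1x1 projections represented as small weight matrices. The weights are hypothetical stand-ins for learned parameters, and this is not the authors' module.

```python
import math


def project(weights, vector):
    """Multiply an (F x C) weight matrix by a C-vector: a per-pixel 1x1 conv."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def attention_gate(skip, gate, w_x, w_g, psi, bias=0.0):
    """Additive attention gate in the style of Attention U-Net.

    skip: H x W x Cx skip-connection (encoder) features.
    gate: H x W x Cg decoder features, assumed already resized to H x W.
    w_x (F x Cx), w_g (F x Cg), psi (length F), bias: hypothetical weights.
    Returns (gated skip features, H x W map of coefficients alpha).
    """
    gated, alphas = [], []
    for skip_row, gate_row in zip(skip, gate):
        out_row, alpha_row = [], []
        for x, g in zip(skip_row, gate_row):
            # Project both inputs to F channels, add them, then apply ReLU.
            hidden = [max(0.0, a + b)
                      for a, b in zip(project(w_x, x), project(w_g, g))]
            # Reduce to one logit and squash it to a coefficient in (0, 1).
            logit = sum(p * h for p, h in zip(psi, hidden)) + bias
            alpha = 1.0 / (1.0 + math.exp(-logit))
            out_row.append([alpha * c for c in x])  # rescale every channel
            alpha_row.append(alpha)
        gated.append(out_row)
        alphas.append(alpha_row)
    return gated, alphas


if __name__ == "__main__":
    # A 1 x 2 image with one skip channel, one gating channel, and F = 1.
    skip = [[[2.0], [2.0]]]
    gate = [[[0.0], [-5.0]]]
    gated, alphas = attention_gate(skip, gate, w_x=[[1.0]], w_g=[[1.0]], psi=[1.0])
    print(alphas)
```

In the toy call, the first pixel receives a neutral gating signal and keeps most of its skip feature (alpha ≈ 0.88). The second pixel's negative gating signal drives the hidden activation to zero, so alpha = sigmoid(0) = 0.5 and the feature is halved. In a trained network, this kind of per-pixel reweighting is one way a decoder can emphasize faint boundary pixels during upsampling.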