Extracting apple planting area from GF-2 satellite imagery using an improved UNet++
Abstract
Accurate extraction of apple planting areas from high-resolution remote sensing images is often required to optimize apple production and industrial layout. This study addresses three technical problems in extracting apple planting areas from such images: spatial fragmentation, spectral confusion, and boundary blurring. To this end, an enhanced model, MSDAW-UNet++, was proposed on the basis of the UNet++ architecture. A multi-scale dual attention (MSDA) module was incorporated at the key feature fusion nodes to enhance the multi-scale contextual and spatial information of apple planting areas, and a wavelet nested fusion (WNFB) module was embedded at the first feature fusion node of each layer in UNet++. During data preparation, the GF-2 satellite images were systematically preprocessed, and a dataset for apple planting area extraction was constructed. The MSDA module integrated multi-scale feature extraction with the multi-head self-attention (MHSA) and positional attention (PSA) mechanisms. Feature representations at four scales were first obtained using depthwise separable convolutions, and this multi-scale information was then fed into the MHSA and the PSA separately. The MHSA mechanism constructed long-range dependencies between apple planting areas in different regions and combined local and global information through the correlations among input sequence elements, so that the overall structure of the apple orchard could be captured effectively. The conventional feed-forward network (FFN) was replaced with an enhanced E-FFN, which achieved more efficient feature interaction and multi-scale learning at a lower computational cost.
Furthermore, the local perception of a convolutional neural network (CNN) was integrated with the global modelling strength of transformers to enhance the accuracy and efficiency of apple planting area extraction. The PSA generated location-aware attention maps through feature interaction and explicitly modeled the geometric constraints among pixels. Continuous energy responses were obtained in the edge regions, and spatial continuity was captured by reinforcing the correlation between distant pixels, which gave smooth transitions between adjacent pixels. Local features were preserved to prevent the spatial disconnection caused by environmental complexity, which ultimately improved the boundary consistency and regional integrity of the semantic segmentation. Finally, the outputs of the PSA and MHSA mechanisms were combined to produce features carrying multi-scale contextual and spatial information, so that apple planting areas could be extracted in a complex planting environment. In addition to the MSDA module, the wavelet nested fusion (WNFB) module was designed around wavelet transform convolution (WTConv). The conventional convolution kernels were replaced with WTConv, so that features were extracted jointly in the frequency and spatial domains. This improved the ability to differentiate between spectrally similar features. Experiments showed that the MSDAW-UNet++ model performed best in extracting apple planting areas, with an F1-score of 96.63% and an IoU of 90.46%. Compared with the UNet++ baseline model, the improved model achieved absolute gains of 3.87 percentage points in the F1-score and 10.07 percentage points in the IoU.
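The frequency-space idea behind wavelet transform convolution can be illustrated with a small sketch. The abstract does not state which wavelet basis WTConv uses here or how the sub-bands are processed, so the Haar basis, the function name `haar_dwt2`, and the toy patch below are assumptions made only for illustration. The sketch splits an image patch into one low-frequency sub-band and three high-frequency sub-bands at half resolution, which is the kind of decomposition a wavelet-based convolution would then filter.

```python
def haar_dwt2(img):
    """Single-level 2D orthonormal Haar transform of an even-sized 2D list.

    Returns four half-resolution sub-bands:
    (approximation, horizontal difference, vertical difference, diagonal).
    Illustrative only; not the WTConv implementation from the paper.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, rows, 2):
        r_a, r_h, r_v, r_d = [], [], [], []
        for j in range(0, cols, 2):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2)  # local average (low frequency)
            r_h.append((a - b + c - d) / 2)  # left-minus-right differences
            r_v.append((a + b - c - d) / 2)  # top-minus-bottom differences
            r_d.append((a - b - c + d) / 2)  # diagonal differences
        approx.append(r_a)
        horiz.append(r_h)
        vert.append(r_v)
        diag.append(r_d)
    return approx, horiz, vert, diag


# Hypothetical 4x4 patch with a vertical edge between the first two columns.
patch = [[1, 5, 5, 5] for _ in range(4)]
approx, horiz, vert, diag = haar_dwt2(patch)
print(approx)  # smooth content of the patch
print(horiz)   # non-zero only in the blocks that contain the vertical edge
```

In this toy patch the edge shows up only in the horizontal-difference sub-band, while the approximation sub-band keeps the smooth content. Separating the two is what lets a network treat boundary detail and regional texture with different filters.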
Compared with classic semantic segmentation models (FCN, UNet, and DeepLab v3+), current mainstream remote sensing semantic segmentation models (MCSNet, CMLFormer, and CMTFNet), and UNet-derived models (MAResU-Net, CM-UNet, and UNet3+), the F1-score was improved by 2.35-9.55 percentage points and the IoU by 6.67-19.54 percentage points. Ablation experiments were used to analyze the effectiveness of the MSDA and WNFB modules. The two modules effectively extracted the multi-scale contextual and spatial information, as well as the frequency-domain features, of apple planting areas, and thus provided a more comprehensive feature representation for the fine extraction of apple planting areas in a complex planting environment. The findings can offer a valuable reference for the fine extraction of orchards from high-resolution remote sensing images.
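For readers less familiar with the two metrics quoted above, the following minimal sketch shows how the F1-score and the IoU of a single class are commonly computed from a predicted mask and a reference mask. The function name `f1_and_iou`, the toy masks, and the choice to return 0.0 when neither mask contains the class are assumptions for illustration. The abstract does not say how the reported values were averaged over classes or images, so this is not necessarily the exact evaluation protocol of the study.

```python
def f1_and_iou(pred, truth):
    """Return (F1, IoU) for the positive class of two binary masks.

    Both masks are equal-sized 2D lists of 0/1 values.
    Illustrative helper; not taken from the paper.
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1  # apple pixel correctly predicted
            elif p == 1 and t == 0:
                fp += 1  # background predicted as apple
            elif p == 0 and t == 1:
                fn += 1  # apple pixel missed
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: no positive pixels in either mask gives 0.0.
        return 0.0, 0.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / union
    return f1, iou


# Hypothetical 3x3 masks: 1 marks an apple planting pixel.
truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
pred = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 0]]
f1, iou = f1_and_iou(pred, truth)
print(f"F1 = {f1:.2%}, IoU = {iou:.2%}")
```

A gain expressed in percentage points is the plain difference between two such percentages, not a relative increase.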