Abstract:
The ecological function of cultivated land is declining worldwide. Non-cultivated habitats are a key factor in enhancing cultivated land and its ecosystem services, so it is important to identify their types and boundaries accurately. However, high-quality labelled datasets for habitat types in complex agricultural landscapes are still lacking. Conventional remote sensing often struggles to extract micro-scale terrain transition zones accurately, and the limited resolution of remote sensing data leads to low classification accuracy. Existing deep learning models improve the performance of semantic segmentation, but problems remain, such as incomplete parcel boundaries, fuzzy edges, and fractured slender features. Applying an edge perception module to the extraction of cultivated land parcels and ridges has significantly improved recognition accuracy. In this study, a high-resolution remote sensing image dataset was constructed for cultivated land habitats, and an edge perception DeepLabV3+ model was developed to recognize the habitat types of the cultivated land system accurately at a low computational cost. Firstly, a Pegasus V500 vertical takeoff and landing fixed-wing UAV was used to collect ultra-high-resolution remote sensing images. After the UAV images were assembled, Rasterio was used to cut them into tiles. A combination of SAM-Labelme automatic labeling and manual correction was used to generate the VOC dataset labels from the cropped images of the study area. The label scheme followed the classification of non-cultivated habitats used in the European QuESSA project.
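The image-cutting step can be illustrated with a minimal sketch. It assumes fixed-size square tiles, with the last tile in each row and column shifted inward so that the image edge is still covered; the function names `tile_offsets` and `tile_windows` and the 512-pixel tile size are hypothetical and are not taken from the study. Each returned tuple has the form (col_off, row_off, width, height), which is the information a windowed raster read would need.

```python
def tile_offsets(length, tile, stride):
    """Start offsets of tiles of size `tile` that cover the range [0, length)."""
    if length <= tile:
        return [0]
    offsets = list(range(0, length - tile + 1, stride))
    # Shift one final tile inward so the trailing edge is not lost.
    if offsets[-1] + tile < length:
        offsets.append(length - tile)
    return offsets


def tile_windows(width, height, tile=512, stride=512):
    """Return (col_off, row_off, w, h) tuples covering a width x height raster.

    The tile size and stride are illustrative assumptions only.
    """
    windows = []
    for row_off in tile_offsets(height, tile, stride):
        for col_off in tile_offsets(width, tile, stride):
            w = min(tile, width - col_off)   # clipped only if the raster is smaller than a tile
            h = min(tile, height - row_off)
            windows.append((col_off, row_off, w, h))
    return windows


if __name__ == "__main__":
    for win in tile_windows(1000, 600):
        print(win)
```

Shifting the edge tiles inward keeps every tile the same size, at the cost of some overlap between neighbouring tiles; padding the image instead would be an equally plausible choice.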
Fifteen habitat types were defined for the cultivated land system, based on the cultivated and non-cultivated habitats of Hailun, one of the key areas of the black soil region in Northeast China. Secondly, the edge perception DeepLabV3+ network was constructed and optimized, and the VOC dataset labels were then used to train it, yielding a model that identifies the habitat types of cultivated land. The improved network adopted DeepLabV3+ as the baseline model. A hierarchical deformable convolution was employed in the encoder, which maintained high accuracy while reducing the number of training parameters by 88.85%. The decoder integrated multi-scale features and dual-modal edge perception in order to fuse the semantic features. A channel attention mechanism was added to the low-level features to enhance key information and suppress noise. A mixed loss function and a layered differential learning rate were then integrated step by step during optimization. In experiments on the VOC dataset of 15 farmland habitat types, the proposed edge perception DeepLabV3+ model achieved a mean Intersection over Union (IoU) of 66.55% and an accuracy of 80.31%, improvements of 9.74% and 4.05%, respectively, over the baseline network (DeepLabV3+). Ablation experiments verified that the explicit and implicit modalities of the edge perception module raised the IoU of micro-linear habitats, such as field ridges and production roads, by 6.99%-36.56%. The visualization results indicate that the edge perception DeepLabV3+ model refined the minimum effective resolution unit from 10-30 m to 1-3 m. Compared with the baseline model, the improved model required only 5.5% more training time while delivering the 9.74% gain in mean IoU.
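For reference, the two reported metrics can be computed from a confusion matrix as in the following minimal sketch. It assumes that "accuracy" means overall pixel accuracy and that classes absent from both the labels and the predictions are excluded from the mean; the abstract does not state either convention, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    y_true = np.asarray(y_true).ravel().astype(np.int64)
    y_pred = np.asarray(y_pred).ravel().astype(np.int64)
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_scores(cm):
    """Return (per-class IoU, mean IoU, overall pixel accuracy)."""
    cm = cm.astype(np.float64)
    tp = np.diag(cm)
    # Union = predicted pixels + true pixels - correctly predicted pixels.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = np.full(cm.shape[0], np.nan)
    present = union > 0
    iou[present] = tp[present] / union[present]
    mean_iou = float(np.nanmean(iou))  # assumption: absent classes are skipped
    accuracy = float(tp.sum() / cm.sum())
    return iou, mean_iou, accuracy


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(truth, pred, num_classes=3)
    per_class, mean_iou, accuracy = segmentation_scores(cm)
    print(per_class, mean_iou, accuracy)
```

Because IoU penalizes both missed and spurious pixels of a class, thin classes such as field ridges tend to score much lower on IoU than on pixel accuracy, which is consistent with the gap between the two reported figures.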
These findings provide an edge perception semantic segmentation approach for habitat identification in farmland. Meter-level precision in habitat identification was achieved at a lower cost. The findings can also provide a technical basis for interpreting farmland habitats at the micro scale.