Yu Guo, Liu Qiubin, Chen Fangyuan, Liu Dazhao. Semantic segmentation method for rubber satellite images based on improved residual networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(15): 204-211. DOI: 10.11975/j.issn.1002-6819.2022.15.022

    Semantic segmentation method for rubber satellite images based on improved residual networks

    • Rubber has been one of the most important cash crops in recent years, and segmenting satellite images of rubber plantations with deep learning is of great practical value for precision agricultural management. In this study, a strategy was proposed to improve the residual network, yielding the variant ResNet_ve, for this segmentation task. The study area was the rubber plantation in Xuwen County, Zhanjiang City, Guangdong Province, China. The dataset was constructed from Sentinel-2 multispectral satellite images. OCRNet was used as the segmentation network into which the improved residual network was incorporated. Inspired by the Deeper Bottleneck Architectures proposed by Kaiming He, the strategy modifies path B in the downsampling module of each stage in the middle layers of ResNet_vd. Specifically, the 2×2 average pooling module with a stride of 1 was replaced with a 2×2 max pooling module with a stride of 1, and a 1×1 convolution was added before it (a structure referred to as Deeper Bottleneck Pooling Architectures-like). The same modification was applied to the other residual modules of the same stage, and these modules were then cascaded sequentially to form the improved stage. The activation function was then changed to PReLU so that network performance with the improved ResNet_ve backbone could be compared. The improved ResNet50_ve and the basic ResNet50_vd were used as the backbone networks of the four models. For transfer learning, a ResNet50_vd student model obtained by knowledge distillation on the ImageNet1k classification dataset served as the pre-trained model. Its weights were loaded into both the modified ResNet_ve backbone and the baseline ResNet_vd backbone to initialize training of the four networks.
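The modified shortcut described above (a 1×1 convolution followed by 2×2 max pooling with a stride of 1) and the PReLU activation can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names, the absence of padding, the channels × height × width list layout, and the PReLU slope of 0.25 are all assumptions made for illustration.

```python
from typing import List

Tensor = List[List[List[float]]]  # assumed layout: channels x height x width


def conv1x1(x: Tensor, weights: List[List[float]]) -> Tensor:
    """1x1 convolution: a per-pixel linear mix of channels.

    weights[o][i] is the weight from input channel i to output channel o.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weights[o][i] * x[i][r][c] for i in range(c_in)) for c in range(w)]
         for r in range(h)]
        for o in range(len(weights))
    ]


def max_pool_2x2_stride1(x: Tensor) -> Tensor:
    """2x2 max pooling with stride 1 and no padding (assumed): HxW -> (H-1)x(W-1)."""
    return [
        [[max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
          for c in range(len(ch[0]) - 1)]
         for r in range(len(ch) - 1)]
        for ch in x
    ]


def prelu(x: Tensor, alpha: float = 0.25) -> Tensor:
    """PReLU: identity for positive values, slope alpha for the rest."""
    return [[[v if v > 0 else alpha * v for v in row] for row in ch] for ch in x]


def shortcut_path(x: Tensor, weights: List[List[float]]) -> Tensor:
    """Hypothetical path B: 1x1 convolution first, then 2x2 max pooling (stride 1)."""
    return max_pool_2x2_stride1(conv1x1(x, weights))


# One 3x3 input channel mapped to two output channels.
x = [[[1.0, -2.0, 3.0], [0.0, 5.0, -1.0], [2.0, 2.0, 4.0]]]
weights = [[1.0], [-1.0]]
out = shortcut_path(x, weights)
```

In a real network `alpha` would be a learned parameter and the weights would come from training; the sketch only shows the order of operations and the resulting change in feature-map size.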
The results show that the ResNet50_vd network, with a medium number of layers, converged better than the deeper ResNet101_vd network on the small-scale satellite image training set, and that OCRNet with ResNet50_vd outperformed the DeepLabV3, DeepLabV3+, and PSPNet networks in all aspects. OCRNet with ResNet50_vd was therefore used as the baseline for the subsequent experiments. OCRNet with ResNet50_ve as the backbone network achieved an mIoU of 0.85, a pixel accuracy of 97.87%, and a Kappa coefficient of 0.90 on the validation set. It also produced the finest internal boundaries in the predicted maps, consumed the least time, and had the fewest parameters among the four networks. Compared with OCRNet using ResNet_vd as the backbone network, OCRNet using ResNet_ve improved the mIoU by 0.01 and the Kappa coefficient by 0.01. By contrast, using ResNet_ve as the backbone network improved the accuracy metrics of the other three networks only slightly, with gains limited to the Kappa coefficient and the mIoU; among them, DeepLabV3+ showed the most obvious improvement. The OCRNet model with the improved residual network combines contextual features and the deepest pixel features through weighted concatenation without losing contextual information, while explicitly enhancing the contributions of pixels belonging to the same object class. As a result, no background noise is introduced when multi-scale information is extracted, and better performance was achieved in the accurate extraction of rubber distribution.
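The three metrics reported above (mIoU, pixel accuracy, and the Kappa coefficient) can all be derived from a confusion matrix. The following is a minimal sketch using their standard definitions; the function names and the toy labels are illustrative assumptions and are not taken from the paper or its data.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of pixels on the diagonal (correctly classified)."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN); absent classes are skipped."""
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


def kappa(cm: List[List[int]]) -> float:
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e); undefined when p_e equals 1."""
    n = len(cm)
    total = sum(map(sum, cm))
    p_o = pixel_accuracy(cm)
    p_e = sum(sum(cm[i]) * sum(cm[r][i] for r in range(n))
              for i in range(n)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Toy example: 8 pixels, class 1 = rubber, class 0 = background (made-up labels).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=2)
```

Because Kappa discounts agreement expected by chance, it can differ noticeably from pixel accuracy on imbalanced scenes, which is one reason to report both.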