Citation: Yu Guo, Liu Qiubin, Chen Fangyuan, Liu Dazhao. Semantic segmentation method for rubber satellite images based on improved residual networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(15): 204-211. DOI: 10.11975/j.issn.1002-6819.2022.15.022

    Semantic segmentation method for rubber satellite images based on improved residual networks

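    • Summary: To further improve the information extraction capability of existing residual-based segmentation models on the test set, to verify the generality of the improved residual optimization strategy, and to achieve better segmentation of rubber plantation satellite images, this study proposes a general improved residual strategy. A dataset was built from Sentinel-2 multispectral satellite imagery, and the improved residual network ResNet50_ve was used as the backbone of the OCRNet model, giving an OCRNet model based on a variant residual network (ResNet-ve-OCRNet). A student model distilled on the ImageNet1k classification dataset served as the pre-trained model for training ResNet-ve-OCRNet. The results show that, on the small-scale satellite image training set, the medium-depth 50-layer residual network converged better on all metrics than the deeper 101-layer residual network. Compared with the DeeplabV3, DeeplabV3+, and PSPNet models, OCRNet with ResNet50_ve as the backbone reached a mean intersection over union of 0.85, a pixel accuracy of 97.87%, and a Kappa coefficient of 0.90 on the validation set. The proposed improved residual strategy has a degree of generality: it can be applied to many mainstream segmentation models and yields gains in the evaluation metrics. Judging from the prediction maps, models based on the improved residual network (ResNet-ve) suppressed the loss of contextual information in the test-set prediction maps and achieved more accurate segmentation of rubber plantation satellite images.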
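The three validation metrics quoted above (mean intersection over union, pixel accuracy, and Kappa coefficient) can all be derived from a single confusion matrix. The sketch below uses the standard textbook definitions and is not taken from the paper's code. The function names and the toy labels are hypothetical, and it assumes that classes absent from both the labels and the predictions are skipped when averaging the IoU.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that occur at all."""
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes with no pixels
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    p_chance = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / (total * total)
    return (p_observed - p_chance) / (1 - p_chance)


# Toy example with flattened pixel labels: 0 = background, 1 = rubber.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm), kappa(cm))
```

On this toy input the confusion matrix is [[3, 1], [1, 3]], so the pixel accuracy is 0.75, the mean IoU is 0.6, and Kappa is 0.5. Kappa is lower than pixel accuracy because it discounts the agreement expected by chance.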

       

      Abstract: Rubber has been one of the most important cash crops in recent years, and segmenting satellite images of rubber plantations with deep learning is of great practical value for refined agricultural management. In this study, a strategy was proposed to improve the residual network, yielding a variant (ResNet-ve) for segmentation. The study area was a rubber plantation in Xuwen County, Zhanjiang City, Guangdong Province, China. The dataset was constructed from Sentinel-2 multispectral satellite images, and OCRNet was used as the segmentation model incorporating the improved residual network. Inspired by the Deeper Bottleneck Architectures proposed by Kaiming He, the strategy modifies path B of the downsampling module in each stage of the ResNet_vd middle layers. Specifically, the 2×2 average pooling module with a stride of 1 was replaced with a 2×2 max pooling module with a stride of 1, and a 1×1 convolution was added before it (a design referred to as Deeper Bottleneck Pooling Architectures-like). The same modification was applied to the other residual modules of the same stage, and these modules were then cascaded sequentially to form the improved stage. The activation function was then changed to PReLU, and network performance was compared with the improved ResNet_ve as the backbone. The improved ResNet50_ve and the basic ResNet50_vd were each used as the backbone of four models. Following a transfer learning approach, a student model obtained by distilling ResNet50_vd on the ImageNet1k classification dataset served as the pre-trained model, and its weights were loaded into both the modified ResNet_ve backbone and the baseline ResNet_vd backbone to initialize training of the four networks.
The results show that ResNet50_vd, a network of medium depth, converged better than the deeper ResNet101_vd on the training set of small-scale satellite images, and that OCRNet with ResNet50_vd outperformed DeeplabV3, DeeplabV3+, and PSPNet in all respects. OCRNet with ResNet50_vd was therefore used as the baseline for the subsequent experiments. OCRNet with ResNet50_ve as the backbone achieved an mIoU of 0.85, a pixel accuracy of 97.87%, and a Kappa coefficient of 0.90 on the validation set. It also produced the finest internal boundaries in the prediction maps among the four networks, while requiring the least time and the fewest parameters. Compared with OCRNet using ResNet_vd as the backbone, OCRNet using ResNet_ve gained 0.01 in mIoU and 0.01 in the Kappa coefficient. By contrast, the accuracy of the other three networks improved little with ResNet_ve as the backbone; their gains appeared only in the Kappa coefficient and mIoU, with the most obvious improvement in DeepLabV3p. The OCRNet model with the improved residual network weights and concatenates the contextual features with the deepest pixel features without losing contextual information, while explicitly enhancing the contribution of pixels belonging to the same object class. As a result, background noise is not introduced when multi-scale information is extracted, and better performance was achieved in accurately extracting the rubber distribution.
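The shortcut-path modification described in the abstract (a 1×1 convolution placed before a 2×2 max pooling that replaces average pooling, plus a PReLU activation) can be illustrated with a small dependency-free sketch. This is not the authors' implementation. The function names are hypothetical, no padding is assumed (so a stride of 1 shrinks each spatial dimension by one), the PReLU slope of 0.25 is an assumed initial value, and the abstract does not state where in the block PReLU is applied.

```python
def conv1x1(x, weight):
    """1x1 convolution: mixes channels independently at each pixel.

    x is a feature map indexed [channel][row][col];
    weight is indexed [out_channel][in_channel].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][c] * x[c][i][j] for c in range(c_in))
          for j in range(w)]
         for i in range(h)]
        for o in range(len(weight))
    ]


def pool2x2(x, op, stride=1):
    """Apply `op` to every 2x2 window of each channel (no padding)."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[op([ch[i][j], ch[i][j + 1], ch[i + 1][j], ch[i + 1][j + 1]])
          for j in range(0, w - 1, stride)]
         for i in range(0, h - 1, stride)]
        for ch in x
    ]


def mean(values):
    return sum(values) / len(values)


def prelu(x, alpha=0.25):
    """PReLU: identity for positive inputs, slope `alpha` for the rest."""
    return [[[v if v > 0 else alpha * v for v in row] for row in ch]
            for ch in x]


# One-channel 3x3 toy feature map and a 1x1 kernel that doubles it.
x = [[[1, -2, 3],
      [4, 5, -6],
      [-7, 8, 9]]]
features = conv1x1(x, [[2.0]])

avg_path = pool2x2(features, mean)  # average pooling (the replaced op)
max_path = pool2x2(features, max)   # max pooling (the replacement)
print(avg_path)
print(max_path)
print(prelu([[[-4.0, 2.0]]]))
```

On this toy input the max-pooled map keeps the strongest response in each window ([[10, 10], [16, 18]]), whereas average pooling blends positive and negative responses together. In a real network the convolution weights and the PReLU slope would be learned rather than fixed.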

       
