    Guo Yi, Huang Jiaxin, Deng Boqi, Liu Yangcheng. Semantic segmentation of the fish bodies in real environment using improved Mask-RCNN model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(23): 162-169. DOI: 10.11975/j.issn.1002-6819.2022.23.017


    Semantic segmentation of the fish bodies in real environment using improved Mask-RCNN model

    • Abstract: Semantic segmentation of fish bodies is the basis for three-dimensional fish modeling, semantic point clouds, and the computation of fish growth information. To improve the accuracy of fish body semantic segmentation in real, complex environments, this study proposes the SA-Mask R-CNN model, i.e., Mask R-CNN fused with the SimAM attention mechanism. The attention mechanism is introduced at each layer of the residual network, using an energy function to assign a three-dimensional weight to each neuron and thereby strengthen the extraction of key fish-body features. The model is trained with a twice-transfer learning method: the COCO-pretrained model first completes transfer learning on the fish images of Open Images Dataset V6, and then a second transfer learning on a self-built dataset. Transferring between two datasets with similar feature spaces alleviates, to some extent, the low segmentation accuracy caused by poor image quality. Performance tests on the self-built dataset, which reflects real aquaculture conditions, show that SA-Mask R-CNN combined with twice-transfer learning achieves an Intersection over Union (IoU) of 93.82% and a comprehensive evaluation index (F1) of 96.04%, outperforming SegNet and U-Net++. Compared with Mask R-CNN variants using the SENet and CBAM (Convolutional Block Attention Module) attention modules, the IoU is improved by 2.46 and 1.0 percentage points, the F1 score by 2.57 and 0.92 percentage points, and the number of model parameters is reduced by 4.7 and 5 MB, respectively. The results can provide a reference for fish body point cloud computing.
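The energy-function weighting described above follows the published, parameter-free SimAM formulation, which assigns each neuron a weight of sigmoid(1/e*) where e* is its minimal energy. A minimal NumPy sketch of that weighting (function names are illustrative, not the authors' code):

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a (C, H, W) feature map.

    lam is the regularization coefficient lambda from the SimAM
    energy function; 1e-4 is the value commonly used in practice.
    """
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)          # per-channel mean
    d = (x - mu) ** 2                                # squared deviation per neuron
    v = d.sum(axis=(1, 2), keepdims=True) / n        # per-channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5                # inverse minimal energy 1/e*
    return x * sigmoid(e_inv)                        # reweight each neuron
```

Because the weights come from sigmoid(·) ∈ (0, 1), the module rescales activations without adding any learnable parameters, which is consistent with the lightweight property claimed in the abstract.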

       

      Abstract: The semantic segmentation of fish bodies is the basis for three-dimensional modeling and semantic point clouds, as well as for the calculation of fish growth information. The accuracy of point cloud computing depends mainly on the precision of fish body segmentation. However, the long feature-fusion path in the traditional Mask R-CNN network means that the low-level information containing the accurate location of the target cannot be fully used. In addition, noise (such as light and water quality) strongly degrades the images collected in a real breeding environment, so the traditional network cannot fully extract fish features for accurate edge segmentation. In this study, an improved Mask R-CNN model combining the SimAM attention mechanism was proposed to improve the precision of fish semantic segmentation in complex environments, and twice-transfer learning was adopted during training. An attention mechanism was added at each layer of the residual network in the backbone, and the extracted features were dynamically assigned weights, so that the improved network focused on information related to the fish body while keeping the model lightweight. In the first transfer learning, the COCO-pretrained model was trained on the fish images of Open Images Dataset V6; the second transfer learning was then performed on the self-built dataset. The self-built dataset was obtained by splitting frames from video captured with a ZED binocular camera in a real culturing environment; its images feature heavy noise and complex backgrounds. The fish images in the self-built dataset and Open Images Dataset V6 share similar feature spaces.
As such, features with high clarity and less noise helped the network learn the texture and detail information of the fish body, and twice-transfer learning between the two datasets with similar feature spaces alleviated the impact of image noise. Experiments on the test set of the self-built dataset show that the IoU, F1 score, precision, and recall of the improved model were 93.82%, 96.04%, 96.98%, and 95.12%, respectively. A series of comparative experiments verified the effectiveness of the improved model. The results show that the segmentation performance of SA1-Mask R-CNN was better than that of SegNet and U-Net++. In contrast to Mask R-CNN1, the IoU improved by 8.51 percentage points, precision by 8.8 percentage points, recall by 9.18 percentage points, and F1 by 8.99 percentage points. Compared with SE-Mask R-CNN and CBAM-Mask R-CNN, the IoU increased by 1.79 and 0.33 percentage points, precision by 1.44 and 0.25 percentage points, recall by 2.59 and 0.51 percentage points, and F1 by 2.03 and 0.38 percentage points, respectively; meanwhile, the number of model parameters decreased by 4.7 and 5 MB, respectively. Furthermore, two training methods were compared to verify the effectiveness of twice-transfer learning: SA2-Mask R-CNN improved the IoU, precision, recall, and F1 by 0.67, 0.82, 0.27, and 0.54 percentage points over SA1-Mask R-CNN. In summary, the improved model can improve the precision of fish semantic segmentation without increasing the number of model parameters, which facilitates deployment and porting, and twice-transfer learning further improved the precision of fish body semantic segmentation. The findings can provide a strong reference for fish body point cloud computing.
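The twice-transfer schedule described above (COCO pretraining, then Open Images V6 fish images, then the self-built dataset) can be sketched as follows; `twice_transfer`, `train`, and the dataset handles are hypothetical placeholders standing in for a real fine-tuning pipeline, not the authors' code:

```python
def twice_transfer(model, fish_open_images, self_built, train):
    """Two-stage transfer learning over datasets with similar feature spaces.

    `model` is assumed to arrive already pre-trained on COCO;
    `train(model, dataset)` is a placeholder fine-tuning step.
    """
    # Stage 1: fine-tune on the Open Images Dataset V6 fish subset,
    # whose clearer, less noisy images help learn fish texture/detail.
    model = train(model, fish_open_images)
    # Stage 2: fine-tune again on the noisy, complex-background
    # self-built dataset from the real culturing environment.
    model = train(model, self_built)
    return model
```

The point of the intermediate stage is that both fish datasets share a similar feature space, so stage 1 narrows the domain gap before the model faces the degraded real-environment images in stage 2.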

       

