Guo Yi, Huang Jiaxin, Deng Boqi, Liu Yangcheng. Semantic segmentation of the fish bodies in real environment using improved Mask-RCNN model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(23): 162-169. DOI: 10.11975/j.issn.1002-6819.2022.23.017

    Semantic segmentation of the fish bodies in real environment using improved Mask-RCNN model

    • Semantic segmentation of fish bodies underpins three-dimensional modeling, semantic point clouds, and the calculation of fish growth information, and the accuracy of point cloud computation depends mainly on the precision of fish body segmentation. However, the long feature-fusion path in the traditional Mask R-CNN network means that low-level information, which carries the accurate location of the target, is not fully used. In addition, noise sources such as lighting and water quality degrade the images collected in a real breeding environment, so the traditional network cannot extract fish features fully enough for good edge segmentation. In this study, an improved Mask R-CNN model incorporating the SimAM attention mechanism was proposed to improve the precision of fish semantic segmentation in complex environments, and twice-transfer learning was applied during training. An attention mechanism was added at each layer of the residual network in the backbone network. The extracted features were weighted dynamically, so that the improved network focused on information related to the fish body while the model remained lightweight. In the first transfer, the model pre-trained on the COCO dataset was trained on the fish images of Open Images Dataset V6; in the second, it was trained on the self-built dataset. The self-built dataset consisted of frames extracted from video captured with a ZED binocular camera in a real culturing environment, and its images were noisy with complex backgrounds. The fish images in the self-built dataset and in Open Images Dataset V6 had similar feature spaces. As such, features with high clarity and little noise helped the network learn the texture and detail of the fish body, and twice-transfer learning was used to alleviate the effect of image noise across the two datasets. On the test set of the self-built dataset, the IoU, F1, precision, and recall of the improved model were 93.82%, 96.04%, 96.98%, and 95.12%, respectively. A series of comparative experiments was conducted to verify the effectiveness of the improved model. The segmentation performance of SA1-Mask R-CNN was better than that of SegNet and U-Net++. Relative to Mask R-CNN1, the IoU improved by 8.51 percentage points, the precision by 8.8 percentage points, the recall by 9.18 percentage points, and F1 by 8.99 percentage points. Compared with SE-Mask R-CNN and CBAM-Mask R-CNN, the IoU increased by 1.79 and 0.33 percentage points, the precision by 1.44 and 0.25 percentage points, the recall by 2.59 and 0.51 percentage points, and F1 by 2.03 and 0.38 percentage points, respectively. Meanwhile, the number of model parameters decreased by 4.7 and 5 MB, respectively. Furthermore, two training methods were compared to verify the effectiveness of twice-transfer learning. SA2-Mask R-CNN improved the IoU, precision, recall, and F1 by 0.67, 0.82, 0.27, and 0.54 percentage points, respectively, compared with SA1-Mask R-CNN. In summary, the improved model raised the precision of fish semantic segmentation without increasing the number of model parameters, which favors deployment and porting of the model, and twice-transfer learning further improved the segmentation precision. The findings can provide a strong reference for the point cloud computation of fish bodies.
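
The dynamic weighting of extracted features described above can be illustrated with a minimal sketch of SimAM-style attention on a single feature channel. It follows the commonly published SimAM formulation (a per-position inverse energy passed through a sigmoid) and is an assumption about the general mechanism, not the authors' implementation. The function names `simam_weights` and `simam_reweight`, the plain nested-list representation, and the default `e_lambda=1e-4` are illustrative choices that do not come from the paper.

```python
import math


def simam_weights(channel, e_lambda=1e-4):
    """Return SimAM-style attention weights for one channel.

    `channel` is a 2-D list of floats (H x W). `e_lambda` is a small
    regularizer; its default here is an assumed value for illustration.
    """
    flat = [v for row in channel for v in row]
    if len(flat) < 2:
        raise ValueError("need at least two spatial positions")
    n = len(flat) - 1                      # H*W - 1, as in the SimAM formulation
    mean = sum(flat) / len(flat)
    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    weights = []
    for dev_row in sq_dev:
        w_row = []
        for d in dev_row:
            # Inverse energy: positions that stand out from the mean score higher.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            w_row.append(1.0 / (1.0 + math.exp(-inv_energy)))  # sigmoid
        weights.append(w_row)
    return weights


def simam_reweight(channel, e_lambda=1e-4):
    """Scale each activation by its attention weight (no learned parameters)."""
    weights = simam_weights(channel, e_lambda)
    return [[v * w for v, w in zip(row, w_row)]
            for row, w_row in zip(channel, weights)]


if __name__ == "__main__":
    # Toy 3x3 channel with one distinctive activation in the center.
    toy = [[0.1, 0.1, 0.1],
           [0.1, 0.9, 0.1],
           [0.1, 0.1, 0.1]]
    for row in simam_weights(toy):
        print([round(w, 3) for w in row])
```

Because the weights are computed directly from the statistics of each channel, this kind of attention adds no trainable parameters, which is consistent with the lightweight property noted in the abstract.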