Abstract:
Phalaenopsis is renowned for its butterfly-like flowers, is highly favored by consumers, and currently holds promising market prospects. Its rapid propagation relies primarily on tissue culture, a process that depends heavily on manual labor, involves highly repetitive operations, and suffers from low production efficiency. Automating phalaenopsis tissue culture is therefore imperative, and accurate detection of the gripping point is crucial for the transplantation operation; in particular, the mask centroid represents the gripping point better than the bounding box center. In this study, an improved segmentation model based on YOLOv11n-seg was proposed for tissue-cultured seedlings, in order to improve the segmentation accuracy of the gripping area and to facilitate deployment by reducing computational complexity. Firstly, the backbone feature extraction network of the original model was replaced with the RepViT network, whose reparameterization structure and modular decoupling enhanced feature extraction while lowering complexity. Secondly, a lightweight cross-scale feature fusion module (CCFM) was introduced into the neck network of the original model; progressive fusion of adjacent scales was achieved through layer-by-layer stacked fusion blocks, so the CCFM avoided high computational overhead while improving the model's detection of small and medium-sized targets. Meanwhile, to improve annotation efficiency on the tissue-cultured seedling segmentation dataset, an automatic annotation method based on Grounded SAM 2 was proposed: the outputs of Grounded SAM 2 were processed with the AddSub algorithm, which comprises dynamic area threshold denoising, morphological operations, and mask difference fusion. A series of experiments was conducted on a phalaenopsis tissue-cultured seedling image dataset consisting of
1000 training images and 172 validation images. Experimental results showed that the improved model can accurately locate the gripping point of the tissue-cultured seedling: the average Euclidean distance between the mask centroid of the gripping region predicted by the model and that of the manual annotation was only 1.95 mm. The improved model achieved precision, recall, mAP50, and mAP50:95 of 96.0%, 81.8%, 87.7%, and 67.2%, respectively, improvements of 0.6, 2.8, 3.3, and 8.5 percentage points over the original YOLOv11n-seg. The parameter count and FLOPs were reduced by 46.7% and 20.6%, respectively, and the improved model was only 3.8 MB in size, achieving model lightweighting while enhancing segmentation accuracy. Automatic annotation was successfully performed on 1172 tissue-cultured seedling images, showing high consistency with manual annotation from Labelme and ISAT, with an annotation success rate of 84.5%, which fully meets practical requirements. Additionally, the average annotation time per image was 6.6 s, which was 176.6 s and 50.0 s faster than manual annotation with Labelme and ISAT, respectively. The efficiency of building the training dataset for phalaenopsis tissue-cultured seedlings was thus significantly improved. The gripping point was effectively extracted for the tissue-cultured seedling, and tissue-cultured seedling images can be annotated automatically. These findings provide a valuable technical reference for the automation of tissue culture.
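The gripping-point evaluation described above (taking the mask centroid of the gripping region and comparing it with the manual annotation by Euclidean distance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the millimeter-per-pixel scale factor are assumptions.

```python
import numpy as np

def mask_centroid(mask: np.ndarray) -> tuple[float, float]:
    """Return the (x, y) centroid of a binary mask in pixel coordinates."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("cannot take the centroid of an empty mask")
    return float(xs.mean()), float(ys.mean())

def gripping_point_error_mm(pred_mask: np.ndarray,
                            gt_mask: np.ndarray,
                            mm_per_px: float) -> float:
    """Euclidean distance (in mm) between the predicted and the manually
    annotated gripping-region centroids; mm_per_px is the camera's assumed
    spatial calibration factor."""
    px, py = mask_centroid(pred_mask)
    gx, gy = mask_centroid(gt_mask)
    return float(np.hypot(px - gx, py - gy)) * mm_per_px
```

Using the centroid rather than the bounding-box center makes the estimate robust to asymmetric mask shapes, which is the motivation stated in the abstract.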
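The three post-processing steps named for the AddSub algorithm (dynamic area threshold denoising, morphological operations, and mask difference fusion) could be sketched along the following lines. This is a simplified illustration under assumptions: the `keep_ratio` threshold, the closing kernel size, and the idea that the gripping region is obtained by subtracting a part mask (e.g. leaves) from the whole-plant mask are all hypothetical choices, not details taken from the paper.

```python
import numpy as np
from scipy import ndimage

def denoise_by_dynamic_area(mask: np.ndarray, keep_ratio: float = 0.05) -> np.ndarray:
    """Dynamic area threshold denoising: drop connected components whose
    area is below keep_ratio * (largest component area). keep_ratio is an
    assumed parameter."""
    labeled, n = ndimage.label(mask)
    if n == 0:
        return mask.astype(bool)
    areas = ndimage.sum(mask.astype(np.float64), labeled, index=np.arange(1, n + 1))
    thresh = keep_ratio * areas.max()
    keep = [i + 1 for i, a in enumerate(areas) if a >= thresh]
    return np.isin(labeled, keep)

def fuse_masks(plant_mask: np.ndarray, leaf_mask: np.ndarray,
               closing_size: int = 3) -> np.ndarray:
    """Sketch of the pipeline: denoise both masks, smooth them with a
    morphological closing, then take the set difference plant minus leaves
    as a candidate gripping (stem) region."""
    plant = denoise_by_dynamic_area(plant_mask)
    leaf = denoise_by_dynamic_area(leaf_mask)
    structure = np.ones((closing_size, closing_size), dtype=bool)
    plant = ndimage.binary_closing(plant, structure=structure)
    leaf = ndimage.binary_closing(leaf, structure=structure)
    return plant & ~leaf
```

In practice the raw Grounded SAM 2 outputs would be thresholded to binary masks first; the denoising step then removes speckle detections before the difference fusion isolates the gripping area.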