Grasping point analysis of phalaenopsis tissue-cultured seedlings using Grounded SAM 2 and improved YOLOv11n-seg

• Abstract: To address the problems that conventional object detection methods face in the automated transplantation stage of phalaenopsis tissue culture, namely inaccurate gripping-point localization, high model complexity, and difficulty of deployment on resource-constrained devices, this study proposed a lightweight segmentation model for tissue-cultured seedlings based on an improved YOLOv11n-seg. First, the backbone of the original model was replaced with the RepViT architecture, which improved feature extraction capability while reducing computational demand. Second, a lightweight cross-scale feature fusion module (CCFM) was introduced into the neck network, further cutting computational overhead and enhancing the model's ability to segment and detect small targets. Meanwhile, to overcome the low efficiency of manually annotating segmentation datasets, an automatic annotation method for phalaenopsis tissue-cultured seedlings was proposed: based on Grounded SAM 2, a post-processing algorithm named AddSub was designed, which processes the Grounded SAM 2 outputs through mask difference fusion, dynamic area-threshold denoising, and morphological operations. Experimental results showed that the improved model accurately located the gripping point of tissue-cultured seedlings, with an average Euclidean distance of only 1.95 mm between the centroid of the generated gripping-region mask and the manually annotated centroid; its precision, recall, mAP50, and mAP50:95 reached 96.0%, 81.8%, 87.7%, and 67.2%, improvements of 0.6, 2.8, 3.3, and 8.5 percentage points over the baseline YOLOv11n-seg; the model size was only 3.8 MB, with 1.32 M fewer parameters and 2.1 G fewer floating-point operations than the original model. The proposed automatic annotation method achieved an annotation success rate of 84.5% and an average annotation time of 6.6 s per image, which is 176.6 s and 50.0 s less than manual annotation with Labelme and ISAT (image segmentation annotation tool), respectively, greatly reducing the cost of building the training dataset. The results can serve as a reference for automating the phalaenopsis tissue culture process.
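The abstract names the three AddSub post-processing steps but not their exact implementation. The following Python sketch illustrates one plausible reading of those steps, assuming Grounded SAM 2 has been prompted to return a binary mask for the whole seedling and a binary mask for its leaves (as NumPy arrays), so that their difference isolates the stem-base gripping region. The function name, the area-ratio threshold, and the kernel size are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of AddSub-style post-processing on Grounded SAM 2 masks.
# Assumptions (not from the paper): the whole-plant and leaf masks are binary
# uint8 arrays of the same shape; min_area_ratio and kernel_size are made up.
import cv2
import numpy as np

def addsub_postprocess(plant_mask: np.ndarray,
                       leaf_mask: np.ndarray,
                       min_area_ratio: float = 0.01,
                       kernel_size: int = 5) -> np.ndarray:
    """Derive a gripping-region mask via mask difference fusion,
    dynamic area-threshold denoising, and morphological operations."""
    # 1) Mask difference fusion: subtract the leaf mask from the whole-plant
    #    mask so that only the stem/base (candidate gripping area) remains.
    grip = cv2.subtract(plant_mask.astype(np.uint8), leaf_mask.astype(np.uint8))

    # 2) Dynamic area-threshold denoising: discard connected components whose
    #    area falls below a fraction of the plant area (threshold adapts per image).
    num, labels, stats, _ = cv2.connectedComponentsWithStats(grip, connectivity=8)
    min_area = min_area_ratio * float(np.count_nonzero(plant_mask))
    cleaned = np.zeros_like(grip)
    for i in range(1, num):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            cleaned[labels == i] = 1

    # 3) Morphological closing then opening to fill small holes and smooth edges.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_OPEN, kernel)
    return cleaned
```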

       

Abstract: Phalaenopsis is renowned for its butterfly-like flowers, is highly favored, and holds promising market prospects. Its rapid propagation relies primarily on tissue culture technology. However, this process is characterized by strong dependence on manual labor, highly repetitive operations, and low production efficiency, so it is imperative to automate phalaenopsis tissue culture. Accurately detecting the gripping point is crucial for the transplantation operation in tissue culture; in particular, the mask centroid represents the gripping point better than the bounding-box center. In this study, an improved YOLOv11n-seg-based segmentation model was proposed for tissue-cultured seedlings, in order to improve the segmentation accuracy of the gripping area and to facilitate deployment by reducing computational complexity. Firstly, the backbone feature extraction network of the original model was replaced with the RepViT network; its reparameterized structure and modular decoupling enhanced feature extraction while lowering complexity. Secondly, a lightweight cross-scale feature fusion module (CCFM) was introduced into the neck network of the original model. Progressive fusion of adjacent scales was achieved through layer-by-layer stacked fusion blocks, so the CCFM avoided high computational overhead while improving the model's detection capability for small and medium-sized targets. Meanwhile, to improve annotation efficiency on the tissue-cultured seedling segmentation dataset, an automatic annotation method based on Grounded SAM 2 was proposed. Specifically, the outputs of Grounded SAM 2 were processed with the AddSub algorithm, including dynamic area-threshold denoising, morphological operations, and mask difference fusion. A series of experiments was conducted on a phalaenopsis tissue-cultured seedling image dataset consisting of 1000 training images and 172 validation images. Experimental results showed that the improved model accurately located the gripping point of tissue-cultured seedlings, with an average Euclidean distance of only 1.95 mm between the centroid of the predicted gripping-region mask and the manually annotated centroid. The improved model achieved precision, recall, mAP50, and mAP50:95 of 96.0%, 81.8%, 87.7%, and 67.2%, respectively, improvements of 0.6, 2.8, 3.3, and 8.5 percentage points over the original YOLOv11n-seg. The number of parameters and FLOPs were reduced by 46.7% and 20.6%, respectively, and the improved model was only 3.8 MB in size, achieving model lightweighting while enhancing segmentation accuracy. Automatic annotation was successfully performed on 1172 tissue-cultured seedling images, showing a high level of consistency with manual annotation from Labelme and ISAT and an annotation success rate of 84.5%, fully meeting practical requirements. Additionally, the average annotation time per image was 6.6 s, which was 176.6 and 50.0 s faster than manual annotation with Labelme and ISAT, respectively, significantly improving the efficiency of building the training dataset for phalaenopsis tissue-cultured seedlings. The gripping point of tissue-cultured seedlings was effectively extracted, and seedling images can be annotated automatically. These findings can provide a valuable technical reference for the automation of tissue culture.
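The reported 1.95 mm figure is the mean Euclidean distance between the centroid of the predicted gripping-region mask and the centroid of the manually annotated mask. Below is a minimal sketch of how such centroids and their distance could be computed with OpenCV image moments; the mm_per_pixel calibration factor is an assumed value needed to convert pixel distances into millimetres and is not given in the abstract.

```python
# A minimal sketch of mask-centroid gripping-point extraction and its evaluation
# against a manually annotated mask. mm_per_pixel is an assumed calibration value.
import cv2
import numpy as np

def mask_centroid(mask: np.ndarray) -> tuple[float, float]:
    """Return the (x, y) centroid of a binary mask using image moments."""
    m = cv2.moments(mask.astype(np.uint8), binaryImage=True)
    if m["m00"] == 0:
        raise ValueError("empty mask: no gripping region detected")
    return m["m10"] / m["m00"], m["m01"] / m["m00"]

def centroid_error_mm(pred_mask: np.ndarray,
                      gt_mask: np.ndarray,
                      mm_per_pixel: float) -> float:
    """Euclidean distance between predicted and annotated centroids, in mm."""
    px, py = mask_centroid(pred_mask)
    gx, gy = mask_centroid(gt_mask)
    return float(np.hypot(px - gx, py - gy)) * mm_per_pixel
```

Averaging centroid_error_mm over the validation images would yield a figure comparable to the 1.95 mm mean distance reported above, given the correct pixel-to-millimetre scale.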

       
