Inferring the grasping sequence of prickly ash branches in complex stacked scenarios using an improved YOLOv8-Seg

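    • Abstract (translated from the Chinese): Green Sichuan pepper (prickly ash) in southern China is commonly harvested with the "xiazhuang" scheme, and the pruned branches (with stems, fruits, and leaves) form complex stacked scenes. In such scenes, harvesters built for "xiazhuang" picking struggle to recognize and locate the branches and grasp them inefficiently, which keeps the degree of automation low. This study therefore proposes a method for inferring the grasping sequence of prickly ash branches. First, the YOLOv8-Seg model is improved by introducing a CBAM (convolutional block attention module) before the feature concatenation in the C2f modules that correspond to the P4 and P5 feature layers of the backbone. By adaptively adjusting the attention weights of the feature maps, the module strengthens the model's perception of multi-scale information and improves its ability to integrate features of targets at different spatial positions. Second, the SPPF (spatial pyramid pooling-fast) module in the backbone is replaced with an ASPP (atrous spatial pyramid pooling) module to improve feature representation at both local and global scales. Finally, a grasp scoring function is proposed for the complex stacked scenes. Bayesian optimization yields weight coefficients of 0.797, 0.183, and 0.020 for distance, mask completeness, and entanglement risk, respectively; the grasp score is then computed together with depth information to infer the optimal grasping sequence. Experimental results show that the improved model reaches a mean intersection over union of 86.68% and a mean pixel accuracy of 91.04%. Precision, recall, and F1 score reach 95.70%, 91.04%, and 92.82%, which are 9.74, 9.44, and 4.99 percentage points higher than the original model, respectively. The proposed method can be applied to grasping sequence inference in intelligent prickly ash harvesters and provides an important reference for the design and optimization of automated harvesting equipment.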
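The abstract reports the three weight coefficients but not the form of the scoring function itself. As a minimal sketch, assuming a linear weighted sum over terms normalized to [0, 1], the ranking step might look like the following. The names `Branch`, `grasp_score`, and `grasp_order`, the min-max normalization of depth, and the use of `1 - risk` are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Weights reported in the abstract: distance, mask completeness, entanglement risk.
W_DIST, W_MASK, W_RISK = 0.797, 0.183, 0.020


@dataclass
class Branch:
    name: str
    depth_m: float       # camera-to-branch distance taken from the depth image
    completeness: float  # assumed: visible fraction of the mask, in [0, 1]
    risk: float          # assumed: entanglement risk with neighbors, in [0, 1]


def grasp_score(b, d_min, d_max):
    """Assumed linear score: closer, more complete, less entangled is better."""
    span = d_max - d_min
    # Min-max normalize depth so the closest branch gets 1.0 (assumption).
    closeness = 1.0 if span == 0 else (d_max - b.depth_m) / span
    return W_DIST * closeness + W_MASK * b.completeness + W_RISK * (1.0 - b.risk)


def grasp_order(branches):
    """Return branches sorted from highest to lowest grasp score."""
    if not branches:
        return []
    d_min = min(b.depth_m for b in branches)
    d_max = max(b.depth_m for b in branches)
    return sorted(branches, key=lambda b: grasp_score(b, d_min, d_max), reverse=True)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    pile = [
        Branch("C", depth_m=0.70, completeness=0.5, risk=0.5),
        Branch("A", depth_m=0.50, completeness=0.9, risk=0.1),
        Branch("B", depth_m=0.60, completeness=1.0, risk=0.0),
    ]
    print([b.name for b in grasp_order(pile)])
```

With a distance weight near 0.8, this sketch is dominated by depth: the topmost branch is almost always picked first, and mask completeness and entanglement risk mainly break ties between branches at similar heights.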

       

      Abstract: Green Sichuan pepper in southern China is commonly harvested by pile picking, in which the pruned prickly ash branches (with branches, fruits, and leaves) accumulate into complex stacked scenes. In these scenes, existing harvesters have difficulty recognizing and locating the branches and grasp them inefficiently, which limits automation and large-scale production. In this study, a grasping sequence reasoning method was proposed using an improved YOLOv8-Seg network. The network structure was optimized to enhance the perception and integration of multi-scale features. Specifically, a convolutional block attention module (CBAM) was embedded before the feature concatenation in the C2f modules corresponding to the P4 and P5 layers of the backbone network. By adaptively adjusting the attention weights of the feature maps, the CBAM strengthened the integration of target features at different spatial positions. Meanwhile, the original spatial pyramid pooling-fast (SPPF) module was replaced by an atrous spatial pyramid pooling (ASPP) module, which improved the representation of both local and global contextual features and gave higher precision and robustness when segmenting occluded targets. A grasping score function was further developed for the complex stacking of the prickly ash branches. Three factors were considered: the branch-to-camera distance, the mask completeness, and the entanglement risk between neighboring branches. Bayesian optimization was applied to determine the weight coefficients of these factors, which were 0.797, 0.183, and 0.020, respectively. The grasping score was then computed from these coefficients and the depth information to infer the optimal grasping sequence among the stacked branches.
Experimental results showed that the improved model enhanced branch recognition and grasping sequence reasoning under various stacking conditions. The mean intersection over union (mIoU) and mean pixel accuracy (mPA) reached 86.68% and 91.04%, respectively. The precision, recall, and F1-score were 95.70%, 91.04%, and 92.82%, respectively, which were 9.74, 9.44, and 4.99 percentage points higher than those of the original model. Furthermore, the improved model outperformed mainstream instance segmentation models, such as Mask R-CNN, YOLACT, and YOLOv5, in segmentation accuracy, boundary recognition, and robustness against occlusion. Grasping experiments were conducted to verify the effectiveness of the improved model in practical harvesting operations. An AUBO-i10 robotic arm was equipped with a two-finger gripper and an Intel RealSense D435i depth camera in an eye-in-hand configuration. The robotic system performed the detection, recognition, reasoning, and grasping of the prickly ash branches. The grasping success rate reached 75.86%, and the sequence reasoning accuracy was 86.21%, indicating that the approach is feasible and stable in complex stacking scenarios. The reasoning strategy can be applied to grasping sequence inference in intelligent prickly ash harvesters. The findings can also provide an important reference for optimizing automatic harvesting equipment for green Sichuan pepper.
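For readers unfamiliar with the two segmentation metrics quoted above, the sketch below computes mIoU and mPA from a pixel-level confusion matrix using their standard definitions. The abstract does not describe the evaluation protocol, so the function name `miou_mpa`, the confusion-matrix layout, and the example counts are assumptions made for illustration.

```python
def miou_mpa(conf):
    """Compute (mIoU, mPA) from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j (assumed layout).
    """
    n = len(conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        accs.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return sum(ious) / n, sum(accs) / n


if __name__ == "__main__":
    # Made-up two-class example (background, branch); not data from the study.
    example = [[90, 10],
               [20, 80]]
    miou, mpa = miou_mpa(example)
    print(f"mIoU = {miou:.4f}, mPA = {mpa:.4f}")
```

Because IoU also penalizes false positives while per-class pixel accuracy does not, mIoU is never higher than mPA on the same predictions, which is consistent with the 86.68% and 91.04% figures reported here.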

       
