Locating the pile picking points for green pepper branches using improved U-Net and RGB-D images

• Abstract: "Pile picking" harvests green Sichuan pepper by cutting off the fruit-bearing branches while leaving short stumps of a fixed length. To enable a harvesting robot to recognize branches accurately and determine the optimal cutting point for efficient pile picking, this study proposes a method for locating the pile-picking cutting points on the main branches of green Sichuan pepper that combines a U-Net deep learning network with an RGB-D camera. First, the traditional U-Net model was improved by replacing its backbone with a ResNet50 network embedding the Coordinate Attention (CA) mechanism, and a Squeeze-and-Excitation (SE) attention mechanism was added at the feature-concatenation stage, yielding a segmentation model for the main branches and trunk of the pepper tree. Next, the segmented images were binarized and skeletonized to extract the centerlines of the main branches; combined with the depth information from the RGB-D camera and OpenCV image processing, a length mapping between the world and pixel coordinate systems was established. The preset 40 mm stump length was then mapped from the world coordinate system to a pixel length in the RGB image, and the pile-picking cutting point of each main branch was finally located. Experimental results show that the improved U-Net outperforms DeepLabV3+ and PSPNet in segmentation, achieving a mean intersection over union (MIoU) of 87.58%, a mean pixel accuracy (mPA) of 93.76%, and a recall of 96.24%. The success rates of cutting-point identification and localization were 90.81% under sunny front lighting, 84.88% under backlighting, and 80.52% under cloudy conditions. In the picking-point localization test, the localization success rate was 90%, and recognition of a single pepper branch took 1.93 s on average. These results can provide technical support for pile-picking harvesting by green Sichuan pepper picking robots.

       

Abstract: Pile picking is widely used to harvest green Sichuan pepper. In this targeted pruning, the fruit-bearing branches are selectively cut so that short stumps of a specific length are preserved. Retaining part of the branch structure supports subsequent regrowth and the overall health of the plant while keeping harvesting efficient. This study aims to accurately recognize branches and determine the optimal cutting points for efficient short-stump cutting in complex field environments with dense foliage and varying illumination. A Sichuan pepper harvesting robot was selected as the research platform, and a systematic investigation was made into locating the short-stump cutting points on the main branches of green Sichuan pepper using an improved U-Net deep learning network and an RGB-D depth camera. Semantic segmentation for branch identification was integrated with depth information for spatial localization, and a complete processing pipeline was established from image acquisition to cutting-point coordinates. Firstly, the traditional U-Net model was improved by replacing its backbone network with a ResNet50 embedded with a Coordinate Attention (CA) mechanism. Spatially fine-grained features were thereby captured to enhance both the boundary completeness and the segmentation precision of the branch structures. A Squeeze-and-Excitation (SE) attention mechanism was added at the feature concatenation (skip-connection) stage of the U-Net model, so that channel-wise feature responses were adaptively recalibrated. A robust segmentation model was thus constructed for the main branches and trunk of the Sichuan pepper, effectively distinguishing the target structures from complex backgrounds that include leaves, fruits, and interfering branches. Secondly, the segmented images of the main branches and trunk were binarized and thinned with the Zhang-Suen algorithm.
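The channel recalibration performed by the SE mechanism at the concatenation stage can be sketched as follows. This is a minimal NumPy illustration of the squeeze-excite-scale pattern, not the authors' implementation; the channel count, reduction ratio, and random weights are assumptions for demonstration only:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map.
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights."""
    z = x.mean(axis=(1, 2))                # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z, 0.0)            # excitation: FC + ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # FC + sigmoid -> per-channel weights in (0, 1)
    return x * s[:, None, None]            # scale each channel of the feature map

# toy example: 32 channels, 8x8 map, reduction ratio 16, random weights (illustrative only)
rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 8, 8))
w1 = rng.standard_normal((32 // 16, 32)) * 0.1
w2 = rng.standard_normal((32, 32 // 16)) * 0.1
out = se_block(feat, w1, w2)
print(out.shape)  # (32, 8, 8)
```

Each channel is multiplied by a learned weight in (0, 1), which lets the decoder suppress background-dominated channels before the skip features are fused.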
The centerline of the main branches was then extracted, and the depth information from the RGB-D camera was combined with OpenCV image processing. Pixel lengths in the pixel coordinate system were converted into physical lengths using the camera intrinsic parameters, including the focal length and pixel size, and the actual length in the world coordinate system was obtained by incorporating the depth measurements from the RGB-D camera. Spatial geometric transformations were applied to obtain accurate coordinate mappings, so that the length mapping between the world and pixel coordinate systems enabled metric-scale measurement of the branch dimensions in three-dimensional space. The predefined short-stump length of 40 mm was then accurately mapped from the world coordinate system to a pixel length in the RGB images, yielding a quantitative correspondence between physical length and image pixel dimensions. Finally, the optimal pruning point on each main branch was precisely located within the image plane. Experimental results demonstrate that the improved U-Net model exhibited superior segmentation performance compared with other advanced semantic segmentation models, such as DeepLabV3+ and PSPNet, achieving a mean intersection over union (MIoU) of 87.58%, a mean pixel accuracy (mPA) of 93.76%, and a recall of 96.24%, indicating its robustness and effectiveness in accurately identifying and segmenting the target features. Furthermore, the success rates of identifying and locating the pruning points were 90.81% in direct sunlight, 84.88% under backlighting, and 80.52% under cloudy conditions. In the cutting-point localization test, the localization success rate was 90%, and recognition of a single branch took 1.93 s on average.
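The world-to-pixel length mapping described above follows the pinhole camera model: a physical length L viewed at depth Z projects to approximately fx·L/Z pixels, where fx is the focal length expressed in pixels. A minimal sketch of this mapping (the focal length and depth values below are illustrative assumptions, not values from the paper):

```python
def stump_length_px(length_mm: float, depth_mm: float, fx_px: float) -> float:
    """Pinhole-model projection of a physical length at a given depth
    into an image pixel length: pixels = fx * length / depth."""
    return fx_px * length_mm / depth_mm

# e.g. the 40 mm preset stump seen at 500 mm depth with fx = 600 px
print(stump_length_px(40.0, 500.0, 600.0))  # 48.0 px
```

Halving the depth doubles the projected pixel length, which is why a per-branch depth reading from the RGB-D camera is required before the 40 mm preset can be measured off along the branch centerline in the image.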
These findings can provide technical support for pile-picking harvesting of green Sichuan pepper with picking robots.
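The Zhang-Suen thinning used above for centerline extraction can be sketched as follows. This is a minimal pure-NumPy version for illustration; the toy image and the loop-based implementation are not from the paper:

```python
import numpy as np

def zhang_suen_thin(img):
    """Iteratively thin a binary image (1 = foreground) to a one-pixel skeleton."""
    img = img.astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two Zhang-Suen sub-iterations
            to_del = []
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    if img[y, x] == 0:
                        continue
                    # 8-neighbours P2..P9, clockwise starting from the pixel above
                    p = [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                         img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]
                    b = sum(p)  # B(P1): number of foreground neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0 and p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0:
                        to_del.append((y, x))
                    elif step == 1 and p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0:
                        to_del.append((y, x))
            for y, x in to_del:  # batched deletion after each sub-iteration
                img[y, x] = 0
                changed = True
    return img

# toy example: a 3-pixel-thick horizontal bar thins to a one-pixel centreline
bar = np.zeros((7, 9), np.uint8)
bar[2:5, 1:8] = 1
skel = zhang_suen_thin(bar)
```

The resulting one-pixel-wide centerline is what the 40 mm pixel offset is measured along to place the cutting point.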

       
