Locating the pile picking points for green pepper branches using improved U-Net and RGB-D images
Graphical Abstract
Abstract
Pile picking has been widely used for harvesting green Sichuan pepper branches. In this targeted pruning, fruit-bearing branches are selectively cut so that short stumps of a specific length are preserved. Retaining part of the branch structure supports subsequent regrowth and the overall health of the plant, while also improving harvesting efficiency. This study aims to accurately recognize the branches and then determine the optimal cutting points for efficient short-stump cutting in complex field environments with dense foliage and varying illumination. A Sichuan pepper harvesting robot was selected as the research subject. A systematic investigation was then made to locate the short-stump cutting points on the main branches of the green Sichuan pepper using the U-Net deep learning network and an RGB-D depth camera. Semantic segmentation of the branches was integrated with the depth information for spatial localization, establishing a complete processing pipeline from image acquisition to the cutting point coordinates. Firstly, the traditional U-Net model was improved by replacing its backbone network with ResNet50 embedded with a Coordinate Attention (CA) mechanism. Spatially fine-grained features were captured to enhance both the boundary completeness and segmentation precision of the branch structures. The Squeeze-and-Excitation (SE) attention mechanism was added to the feature splicing (skip connection) stage of the U-Net model, so that channel-wise feature responses were recalibrated adaptively. Thereby, a robust segmentation model was constructed for the main branches and trunk of the Sichuan pepper. Target structures were effectively distinguished from the complex backgrounds, including leaves, fruits, and interfering branches. Secondly, the segmented images of the main branches and trunk were binarized and then skeletonized using the Zhang-Suen thinning algorithm.
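As a rough illustration (not the authors' implementation), the Zhang-Suen thinning step named above can be sketched in pure Python/NumPy; it iteratively peels boundary pixels from a binary mask in two sub-iterations until a one-pixel-wide centerline remains. The function name is illustrative:

```python
import numpy as np

def zhang_suen_thinning(img):
    """Thin a binary image (1 = foreground) to a one-pixel-wide skeleton."""
    img = img.copy().astype(np.uint8)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):                      # the two Zhang-Suen sub-iterations
            to_delete = []
            rows, cols = img.shape
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r, c] != 1:
                        continue
                    # 8-neighbours P2..P9, clockwise from the pixel above
                    p = [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
                         img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]
                    b = sum(p)                   # B(P1): number of foreground neighbours
                    # A(P1): number of 0 -> 1 transitions around the neighbourhood
                    a = sum((p[i] == 0) and (p[(i + 1) % 8] == 1) for i in range(8))
                    if step == 0:
                        cond = (p[0] * p[2] * p[4] == 0) and (p[2] * p[4] * p[6] == 0)
                    else:
                        cond = (p[0] * p[2] * p[6] == 0) and (p[0] * p[4] * p[6] == 0)
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            for r, c in to_delete:               # delete after the full pass
                img[r, c] = 0
            if to_delete:
                changed = True
    return img
```

In practice the same result can be obtained from OpenCV's `cv2.ximgproc.thinning` with the `THINNING_ZHANGSUEN` option; the pure-NumPy version is shown only to make the two deletion conditions explicit.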
The centerline of the main branches was extracted by integrating the depth information from the RGB-D camera with OpenCV image processing. The pixel length in the pixel coordinate system was converted into the physical length on the image plane using the camera intrinsic parameters, including the focal length and pixel size. The actual length in the world coordinate system was then obtained by incorporating the depth measurements from the RGB-D camera, with spatial geometric transformations applied for accurate coordinate mapping. The length mapping between the world and pixel coordinate systems was thus achieved, enabling accurate metric-scale measurement of the branch dimensions in three-dimensional space. A stump length of 40 mm was determined and accurately mapped from the world coordinate system to the pixel scale in the RGB images. Finally, a quantitative correspondence was obtained between the physical spatial length and the image pixel dimension, and the optimal pruning points on each main branch were precisely located within the image plane. Experimental results demonstrate that the improved U-Net model exhibited superior segmentation performance compared with other advanced semantic segmentation models, such as DeepLabV3+ and PSPNet. Specifically, it achieved a Mean Intersection over Union (MIoU) of 87.58%, a mean Pixel Accuracy (mPA) of 93.76%, and a Recall of 96.24%, indicating its robustness and effectiveness in accurately identifying and segmenting the target features within the image data. Furthermore, the success rates of identifying and locating the pruning points were 90.81% under direct light, 84.88% under backlight, and 80.52% under cloudy conditions. The overall localization success rate was 90%, and identifying the cutting point of a single branch took 1.93 s on average.
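The pixel-to-metric length mapping described above can be sketched with a standard pinhole camera model: a segment of physical length L at depth Z projects to approximately L * fx / Z pixels, where fx is the focal length expressed in pixels (from the camera intrinsics). This is a minimal sketch under that assumption, not the authors' exact transformation chain; the function names and sample values are illustrative:

```python
def pixel_to_metric(length_px, depth_mm, fx):
    """Convert a pixel length at depth Z (mm) to a physical length in mm
    under the pinhole model: L = l_px * Z / fx (fx in pixels)."""
    return length_px * depth_mm / fx

def metric_to_pixel(length_mm, depth_mm, fx):
    """Inverse mapping: project a physical length (e.g. the 40 mm stump)
    onto the image plane in pixels: l_px = L * fx / Z."""
    return length_mm * fx / depth_mm

# Illustrative values: fx = 600 px, branch depth 600 mm read from the
# aligned depth map -> the 40 mm stump spans 40 px along the centerline.
stump_px = metric_to_pixel(40.0, 600.0, 600.0)  # → 40.0
```

With such a mapping, the pruning point can be placed by walking the required number of pixels along the extracted centerline from the branch base; at closer depths the same 40 mm stump occupies proportionally more pixels.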
These findings can provide technical support for the pile picking of green Sichuan pepper using harvesting robots.