Detecting nursery canopies from 3D point clouds using an improved VoteNet model

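    Summary: Orchard robots detect small-volume targets with low accuracy. To address this, this study proposes a nursery tree canopy detection method for 3D point clouds based on an improved VoteNet. The asymmetric feature extraction module of VoteNet is replaced with a symmetric feature extraction module (SFEM) to strengthen the capture of low-level information. A reverse attention feature fusion module (RAFFM) is introduced at the skip connections to improve local feature extraction. A center point discrepancy loss (CPD-Loss) is adopted to reduce the offset between the centers of predicted boxes and ground-truth boxes. Experiments on a self-built nursery point cloud dataset show that, at an IoU threshold of 0.25, the improved model reaches an average recall (AR) of 88.06% and a mean average precision (mAP) of 55.05%. The AP for the small target Ilex cornuta var. fortunei (spineless holly ball) rises from 4.40% to 28.77%. The method improves how well orchard robots perceive tree canopy targets in nursery environments and provides technical support for intelligent nursery management.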
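The internals of the RAFFM are not given here. The sketch below is only an illustration and rests on two assumptions. First, the "trilinear interpolation" at a skip connection is taken to be PointNet++-style feature propagation, which uses inverse-distance weighting over the three nearest coarse points. Second, the reverse-attention gate is taken to have the common form 1 − sigmoid(score), which down-weights points the decoder already responds to strongly. The function names and the per-point `scores` input are hypothetical. The self-attention part of the module is omitted.

```python
import math


def three_nn_interpolate(fine_xyz, coarse_xyz, coarse_feats, k=3, eps=1e-8):
    """Upsample coarse features onto fine points (PointNet++-style feature
    propagation): each fine point takes an inverse-distance-weighted average
    of the features of its k nearest coarse points."""
    dim = len(coarse_feats[0])
    out = []
    for p in fine_xyz:
        # (distance, index) pairs for the k closest coarse points
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(coarse_xyz))[:k]
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        out.append([
            sum(w * coarse_feats[j][c] for w, (_, j) in zip(weights, nearest)) / total
            for c in range(dim)
        ])
    return out


def reverse_attention_fuse(skip_feats, upsampled_feats, scores):
    """Hypothetical reverse-attention fusion: each skip-connection feature is
    scaled by (1 - sigmoid(score)), so points with a high score are
    down-weighted and the remaining detail is emphasized, then the result is
    added to the upsampled decoder feature."""
    fused = []
    for skip, up, score in zip(skip_feats, upsampled_feats, scores):
        gate = 1.0 - 1.0 / (1.0 + math.exp(-score))  # reverse attention gate
        fused.append([gate * s + u for s, u in zip(skip, up)])
    return fused


# Toy example: four coarse points, two fine points, 1-D features.
coarse = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
feats = [[1.0], [0.0], [0.0], [0.0]]
up = three_nn_interpolate([(0, 0, 0), (1, 0, 0)], coarse, feats)
print(up)  # first value ~1.0, second ~0.409
print(reverse_attention_fuse([[2.0]], [[1.0]], [0.0]))  # gate 0.5 -> [[2.0]]
```

A fine point that coincides with a coarse point inherits that point's feature almost exactly. A point between coarse points receives a distance-weighted blend.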

       

      Abstract: Accurate detection of small-volume targets, such as tree crowns and pedestrians, remains a significant challenge for orchard robots in complex nursery environments, where agricultural operations demand high detection performance. In this study, an enhanced VoteNet model was proposed for object detection in 3D point clouds, with three major modifications to the original VoteNet architecture. Firstly, the original asymmetric feature extraction module was replaced by a symmetric feature extraction module (SFEM). The SFEM adopts a U-Net-like encoder-decoder structure to integrate multi-scale features. This symmetric design fuses low-level spatial information with high-level semantic information more effectively and preserves the finer geometric features needed to detect small objects. Secondly, a reverse attention feature fusion module (RAFFM) was introduced at the skip connections. It enhances local feature representation through self-attention and trilinear interpolation. It also highlights fine-grained structures, improves feature consistency across scales, and focuses the network on the more discriminative regions of small targets. Thirdly, a center point discrepancy loss (CPD-Loss) was incorporated to minimize the spatial offset between predicted proposal centers and ground-truth bounding box centers, which improves localization accuracy. This additional loss term regularizes the voting process and leads to more stable clusters around object centroids. To validate the approach, a nursery point cloud dataset of 1147 scenes containing three tree species and pedestrians was constructed in the KITTI format. It was partitioned into 60% for training, 20% for validation, and 20% for testing to ensure a fair evaluation.
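The exact formula of the CPD-Loss is not stated in the abstract. One plausible reading of "minimize the spatial offset between predicted proposal centers and ground-truth bounding box centers" is sketched below. The term is the mean Euclidean distance from each positive proposal center to its nearest ground-truth center. The function name, the nearest-center assignment, and the `positive_mask` argument are assumptions made for illustration and are not the authors' definition.

```python
import math


def center_point_discrepancy_loss(pred_centers, gt_centers, positive_mask=None):
    """Hypothetical CPD-style term: mean Euclidean distance from each positive
    predicted proposal center to its nearest ground-truth box center."""
    if positive_mask is None:
        positive_mask = [True] * len(pred_centers)
    dists = [
        min(math.dist(p, g) for g in gt_centers)  # offset to closest GT center
        for p, keep in zip(pred_centers, positive_mask)
        if keep
    ]
    # No positive proposals -> no center penalty
    return sum(dists) / len(dists) if dists else 0.0


pred = [(0.1, 0.0, 0.0), (2.0, 0.2, 0.0), (5.0, 5.0, 5.0)]
gt = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
loss = center_point_discrepancy_loss(pred, gt, [True, True, False])
print(loss)  # mean of 0.1 and 0.2
```

In training, a term like this would be added with some weight to VoteNet's existing vote, objectness, and box losses. The weight is not given in the abstract.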
The model achieved an average recall (AR) of 88.06% and a mean average precision (mAP) of 55.05% at an IoU threshold of 0.25, significantly outperforming the baseline VoteNet by 9.05% and 22.47% in AR and mAP, respectively. The gain was most pronounced for small objects: the average precision (AP) for Ilex cornuta var. fortunei increased from 4.40% to 28.77%, roughly 6.5 times the baseline value. Ablation tests showed that each component contributed individually. The SFEM provided more discriminative feature learning for objects with sparse point distributions. The RAFFM aggregated contextual features while preserving geometric details, which yielded higher recall for pedestrians and small tree crowns. The CPD-Loss further improved the accuracy of bounding box regression and stabilized training convergence. The model was also compared with traditional object detection networks and performed well in nursery environments. The 3D detection framework offers a robust perception solution for autonomous orchard robots and gives them more accurate environmental detection in complex nursery settings. Better detection of small-volume targets can support precision agriculture applications such as targeted spraying, growth monitoring, and safe navigation around pedestrians. This work contributes to the advancement of intelligent agricultural systems and provides a reliable 3D perception technology for efficient, autonomous nursery operations. Future research can optimize the network architecture for real-time performance and extend it to other agricultural scenarios.
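The results above are reported at an IoU threshold of 0.25, meaning that a predicted box counts as correct when its overlap with a ground-truth box is at least 25% of their union volume. The sketch below makes this criterion concrete. It assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax), which is a simplification because KITTI-format labels also carry a yaw angle. It also uses a simplified recall: a ground-truth box counts as recalled if any prediction overlaps it at or above the threshold. The benchmark's own matching procedure may differ.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d_axis_aligned(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)


def recall_at_iou(preds, gts, thresh=0.25):
    """Simplified recall: fraction of ground-truth boxes hit by at least one
    prediction with IoU >= thresh."""
    if not gts:
        return 0.0
    hits = sum(any(iou_3d_axis_aligned(p, g) >= thresh for p in preds) for g in gts)
    return hits / len(gts)


gt_a = (0, 0, 0, 2, 2, 2)
gt_far = (10, 10, 10, 11, 11, 11)
pred_shifted = (1, 0, 0, 3, 2, 2)    # IoU with gt_a = 4 / 12
pred_worse = (1.5, 0, 0, 3.5, 2, 2)  # IoU with gt_a = 2 / 14
print(iou_3d_axis_aligned(gt_a, pred_shifted))  # ~0.333, passes 0.25
print(iou_3d_axis_aligned(gt_a, pred_worse))    # ~0.143, fails 0.25
print(recall_at_iou([pred_shifted], [gt_a, gt_far]))  # 0.5
```

Because half a box width of misalignment still clears 0.25, this threshold is lenient. Precise center localization, which the CPD-Loss targets, matters more under stricter thresholds such as 0.5.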
