SLAM algorithm for agricultural robots based on deep reinforcement learning and dynamic fusion perception

    • Abstract: Traditional agricultural robot localization and mapping typically rely on fixed multi-sensor fusion strategies; however, sensor accuracy degrades as scene conditions change, causing localization and mapping to become unstable and inaccurate. To address this problem, this study proposes a simultaneous localization and mapping (SLAM) algorithm for agricultural robots based on deep reinforcement learning and dynamic fusion perception. The frontend consists of three parts: a visual-inertial module that fuses visual odometry with inertial measurement unit (IMU) data and optimizes the visual reprojection error and the IMU measurement error; a LiDAR-inertial module that extracts point cloud features, jointly optimizes them with IMU data, and provides stable pose estimates; and an adaptive weight adjustment module that takes visual feature statistics and point cloud geometric features as state inputs, learns online with the twin delayed deep deterministic policy gradient (TD3) algorithm, and outputs sensor fusion weights that act directly on the residual weighting terms of the factor graph, achieving adaptive allocation of each sensor's contribution. The backend builds a factor graph optimization model to jointly optimize the observations, yielding maximum a posteriori pose estimation and map construction. Finally, experiments were conducted on public datasets and in a complex orchard scene. The results show that the maximum and mean localization errors of the proposed method in the complex orchard are 1.583 m and 0.803 m, respectively, and that its pose estimation accuracy outperforms mainstream comparison algorithms, demonstrating the effectiveness of the proposed algorithm in complex agricultural application scenarios.

       

      Abstract: Traditional agricultural robot localization and mapping typically relied on fixed multi-sensor fusion strategies; however, sensors exhibited significant scene dependency: during multi-scenario operations, sensors such as visual cameras, light detection and ranging (LiDAR), and inertial measurement units (IMU) were prone to degradation, which subsequently led to frontend estimation instability and backend optimization error accumulation, resulting in localization and mapping instability and inaccuracy. This study proposed an agricultural robot SLAM algorithm based on deep reinforcement learning and dynamic fusion perception. The algorithm frontend comprised three modules: the visual-inertial module, the LiDAR-inertial module, and the adaptive weight adjustment module. The visual-inertial module utilized image feature extraction and optical flow tracking to obtain feature correspondences and, combined with IMU pre-integration constraints, minimized visual reprojection errors and inertial measurement errors within a sliding window to achieve continuous visual-inertial state estimation. To further improve system robustness in weak-texture and low-illumination environments, this module introduced LiDAR depth information for depth compensation and filtering of feature points when necessary, thereby enhancing scale observability and initialization robustness. Additionally, it could provide initial pose estimates for the LiDAR-inertial module during system initialization and assist in feature matching and global optimization during loop closure detection. The LiDAR-inertial module was responsible for processing IMU-undistorted point cloud data, extracting edge features and planar features, and performing feature matching against the local map maintained in the sliding window to construct LiDAR constraint factors. IMU data was used not only for point cloud undistortion but also to provide initial values for scan matching, improving matching convergence speed and accuracy. When visual information degraded or failed, the LiDAR-inertial module could operate independently to maintain the system's pose tracking capability. The adaptive weight adjustment module constructed an Actor-Critic framework based on the twin delayed deep deterministic policy gradient (TD3) algorithm to achieve dynamic weighting of the visual and LiDAR factors in the factor graph. This module took the visual feature observation matrix and LiDAR geometric feature statistics as state inputs, employed the reciprocal of the relative pose error as the reward function, and implemented the reinforcement learning process through alternating updates of the Actor and Critic. The Actor output continuous weights for the camera factors and LiDAR factors based on the input states and applied them as actions directly to the weighting terms of the factor graph residuals, thereby completing the adaptive fusion of visual and LiDAR constraints. The algorithm backend constructed a factor graph optimization model, in which the laser constraints from the LiDAR-inertial module, the visual and inertial constraints from the visual-inertial module, and the loop closure detection factors were jointly input into the factor graph for global optimization. This achieved tightly coupled fusion of multi-source information, enabling maximum a posteriori pose estimation and dense 3D environment map construction.
Localization accuracy experiments in complex orchard environments demonstrated that the proposed method achieved a maximum error of 1.583 m, a mean error of 0.803 m, a root mean square error (RMSE) of 0.830 m, and a standard deviation of 0.212 m, whereas tightly coupled lidar-visual-inertial odometry via smoothing and mapping (LVI-SAM) exhibited a maximum error, mean error, RMSE, and standard deviation of 3.764, 1.540, 1.732, and 0.792 m, respectively, in the same environments. The proposed algorithm outperformed mainstream comparison algorithms in pose estimation accuracy, demonstrating the effectiveness of the method in complex agricultural application scenarios.
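
To make the adaptive fusion step concrete, the following is a minimal sketch, assuming PyTorch, of how a TD3 actor could map perception statistics to continuous fusion weights and how those weights could scale the visual and LiDAR residual terms. The state layout, network sizes, and helper names (FusionActor, weighted_objective, reward_from_rpe) are illustrative assumptions rather than the authors' implementation, and the TD3 twin-critic and target-network updates are omitted.

```python
import torch
import torch.nn as nn


class FusionActor(nn.Module):
    """Illustrative TD3 actor: the state stacks visual feature statistics and
    point-cloud geometric statistics; the action is a pair of continuous
    weights for the camera and LiDAR factors."""

    def __init__(self, state_dim: int = 8, action_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Sigmoid(),  # keep weights in (0, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def weighted_objective(r_visual, r_lidar, r_imu, weights):
    """Weighted least-squares cost: the actor's weights scale the visual and
    LiDAR residual blocks; the IMU block is left unweighted in this sketch."""
    w_cam, w_lidar = weights[0], weights[1]
    return (w_cam * r_visual.pow(2).sum()
            + w_lidar * r_lidar.pow(2).sum()
            + r_imu.pow(2).sum())


def reward_from_rpe(relative_pose_error: float, eps: float = 1e-6) -> float:
    """Training reward: reciprocal of the relative pose error."""
    return 1.0 / (relative_pose_error + eps)


# Usage: assemble a state vector (e.g., tracked-feature count, mean parallax,
# edge/planar point counts), query the actor, and weight the current residuals.
state = torch.tensor([[120.0, 0.4, 0.8, 0.3, 5000.0, 1200.0, 0.6, 0.2]])
actor = FusionActor()
with torch.no_grad():
    w = actor(state).squeeze(0)  # tensor([w_cam, w_lidar])
cost = weighted_objective(torch.randn(40), torch.randn(60), torch.randn(9), w)
```

In the full system described above, the weights would act on the corresponding factor-graph residual weighting terms (for instance by scaling each factor's noise model) during maximum a posteriori optimization, and the actor would be trained against the reciprocal-of-relative-pose-error reward through alternating Actor and Critic updates.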

       
