Cone-based camera-radar fusion for 3D object detection in agricultural scenes
Graphical Abstract
Abstract
Safety is a primary consideration for the autonomous driving of agricultural machinery, and the performance of the perception system, which is still at an early stage of research, is crucial. One obstacle is that few perception datasets are designed specifically for agricultural scenarios, which differ greatly from the urban scenarios that dominate current studies. Another is that, in contrast to urban applications, agricultural applications face tighter budget constraints and harsher working conditions, which places greater demands on perception sensors and algorithms. To address 3D object detection for the autonomous driving of agricultural machinery in agricultural scenarios, we propose a low-cost, two-stage, detection-based perception system that combines millimeter-wave radar with a monocular camera. First, a multimodal perception dataset of agricultural scenes was created, incorporating LiDAR (Light Detection and Ranging), INS (Inertial Navigation System), camera, and millimeter-wave radar data, with hardware-level data synchronization and target-level annotation. A middle-fusion strategy was then used to build a neural network model named CFPNet. After preliminary detection of the target with an improved center-point detection network, the result is associated with the radar detection points, and a radar feature extraction module extracts radar point-cloud features within a frustum region of interest to supplement the image features. Finally, the preliminary image detections and the radar features are combined for a secondary detection that simultaneously regresses depth, orientation, velocity, and other 3D object attributes. The results show that CFPNet achieves an mAP (mean Average Precision) of 86.5% on the self-built agricultural multimodal perception dataset, 5.5% higher than the baseline, and an mATE (mean Average Translation Error) 0.197 m lower than the baseline. An additional small-object detection experiment was conducted to verify the effectiveness of CFPNet's improvements: CFPNet achieved a recall of 1.0 on the selected small objects, up from 0.3 before the improvement, confirming the enhanced detection capability. Deployment experiments tested the applicability of CFPNet: it achieved a frame rate of 7.4 frames per second on the low-computing-power platforms typical of agricultural scenarios, 211% of the baseline, confirming its suitability for such deployments. Experiments on a public dataset tested the applicability of CFPNet to other scenarios: on the nuScenes dataset, CFPNet achieved an mATE, mASE, and mAVE of 0.792 m, 0.236, and 0.52 m/s, respectively, although its mAP lagged behind, since CFPNet is an algorithm designed for a monocular camera. Furthermore, CFPNet provides the velocity of the target directly, without needing to associate detections across consecutive frames. This study provides a feasible solution and technical support for 3D object detection in agricultural scenarios, especially where computing power is limited.
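The abstract does not include code, so the following minimal Python/NumPy sketch only illustrates the kind of frustum-of-interest association it describes: radar points are projected into the image plane, gated by a preliminary 2D detection box and depth estimate, and the surviving points are returned for feature extraction. The function name, argument layout, and the depth_margin gate are hypothetical conveniences, not details taken from the paper.

```python
import numpy as np

def frustum_radar_association(box_2d, depth_est, radar_points, K,
                              depth_margin=2.0):
    """Select radar points inside the frustum spanned by a 2D detection.

    box_2d       : (x1, y1, x2, y2) preliminary image-space detection box.
    depth_est    : preliminary target depth in meters from the image branch.
    radar_points : (N, 4) array of [x, y, z, v_radial] in camera coordinates.
    K            : 3x3 camera intrinsic matrix.
    depth_margin : half-width in meters of the depth gate around depth_est
                   (a hypothetical tuning parameter, not from the paper).
    """
    x1, y1, x2, y2 = box_2d
    pts_cam = radar_points[:, :3]

    # Project radar points into the image plane with the pinhole model.
    uvw = (K @ pts_cam.T).T
    in_front = uvw[:, 2] > 0          # discard points behind the camera
    u = uvw[:, 0] / np.where(in_front, uvw[:, 2], 1.0)
    v = uvw[:, 1] / np.where(in_front, uvw[:, 2], 1.0)
    z = pts_cam[:, 2]

    # A point belongs to the frustum of interest if it projects inside the
    # 2D box and its depth lies within the gate around the image estimate.
    in_box = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    in_depth = np.abs(z - depth_est) <= depth_margin
    return radar_points[in_front & in_box & in_depth]

# Usage sketch: pool the gated radar points into a small feature vector
# (mean depth and mean radial velocity) to supplement the image features.
if __name__ == "__main__":
    K = np.array([[800.0, 0.0, 640.0],
                  [0.0, 800.0, 360.0],
                  [0.0, 0.0, 1.0]])
    radar = np.array([[1.0, 0.2, 20.0, -3.1],    # inside box and depth gate
                      [5.0, 0.0, 21.0, -2.9],    # projects outside the box
                      [1.1, 0.1, 45.0, -0.5]])   # fails the depth gate
    pts = frustum_radar_association((600, 320, 720, 420), 20.5, radar, K)
    depth_feat = pts[:, 2].mean()
    velocity_feat = pts[:, 3].mean()
    print(depth_feat, velocity_feat)
```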