Abstract:
Forest resource management has demanded increasing precision and digitization in recent years. Unmanned aerial vehicle (UAV) technology has emerged as a promising tool for intelligent and automated forest inventory. However, several challenges have hindered its adoption, including imprecise crown segmentation, limited accuracy in single-tree volume estimation, and the high cost of high-precision light detection and ranging (LiDAR) point cloud data. In this study, a novel single-tree volume estimation approach was developed using UAV-derived visible-light imagery and low-density point cloud data. The precision of crown segmentation was improved, and multi-source features were integrated during volume estimation. A crown segmentation network (called CrownSeg) was introduced to process the UAV visible-light imagery. The network was built on the YOLOv11 framework with several specialized modules. Among them, the ScaleEdgeExtractor (SEE) module employed a three-stage mechanism of shallow filtering, edge enhancement, and cross-layer fusion, combining directional Sobel convolution, multi-scale downsampling, and adaptive edge-feature fusion, in order to effectively preserve and enhance crown boundary information. The gated feature pyramid network (GatedFPN) adopted a bi-directional hierarchical structure with spatial-channel dual-attention gating, realizing closed-loop multi-scale optimization and more refined crown segmentation across different canopy densities. The C2BRA module introduced bi-level routing attention and a channel-spatial dual-attention mechanism, in order to enhance boundary perception while suppressing background interference from complex forest environments.
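To illustrate the kind of directional Sobel filtering that the SEE module combines with downsampling and cross-layer fusion, the following is a minimal NumPy sketch. The kernel values and the toy image are standard illustrations, not taken from the paper, and the full SEE module involves learned fusion steps not shown here.

```python
import numpy as np

# Standard horizontal and vertical Sobel kernels for directional edge responses.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d(image, kernel):
    """Valid-mode 2-D sliding-window filtering of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def sobel_edge_magnitude(image):
    """Combine horizontal and vertical Sobel responses into one edge map."""
    gx = conv2d(image, SOBEL_X)
    gy = conv2d(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Toy example: a bright square (a simplified "crown") on a dark background.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
edges = sobel_edge_magnitude(img)
# The edge magnitude peaks along the square's boundary and vanishes
# in the flat interior, which is the boundary signal SEE aims to preserve.
```

In a segmentation backbone, such fixed directional responses would be concatenated or fused with learned convolutional features rather than used alone.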
Meanwhile, the DilatedFusion (DF) module integrated parallel dilated convolutions with shared kernels, in order to extract multi-granularity contextual information suited to trees of various shapes and sizes. These modules worked collaboratively to enhance spatial detail retention and semantic feature extraction, resulting in high-quality segmentation outputs. For volume estimation, crown morphological, spectral, and textural features were extracted from the UAV imagery and combined with tree height data from the low-density LiDAR point clouds. A progressive feature combination strategy and weighted ensemble learning were employed to integrate these multi-source inputs for robust prediction. The CrownSeg network achieved an average precision at an intersection-over-union threshold of 0.5 (AP50) of 94.9% and an AP50-95 of 66.2%, exceeding the baseline model by 1.6 and 3.8 percentage points, respectively, indicating improved boundary delineation and multi-scale feature representation. The weighted ensemble model for volume estimation yielded a coefficient of determination (R²) of 0.921 5, a mean absolute error (MAE) of 0.022 8 cubic meters, and a mean absolute percentage error (MAPE) of 17.00%, outperforming the standalone models. Comparative analysis showed that combining the morphological, spectral, and textural features significantly reduced the estimation errors, demonstrating superior stability and generalization across diverse forest conditions. A series of experiments was carried out to validate the improved model, with data collected from 749 single trees in a plantation forest. The error metrics were consistently lower than those of individual algorithms, such as random forest or neural networks. Visual inspection confirmed that CrownSeg performed excellently on complex canopy structures and on segmentation in dense or heterogeneous stands. Ultimately, a high-precision crown segmentation network and an accurate single-tree volume estimation model were established using UAV-based data, offering a cost-effective alternative to traditional ground-based surveys. The findings can also provide a practical technical framework for UAV remote sensing applications in precision forestry. Future efforts are suggested to explore multi-modal integration of LiDAR and optical imagery, which can further refine segmentation and estimation accuracy in varied forest environments.
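As a sketch of how a weighted ensemble can combine standalone volume regressors, the snippet below averages per-model predictions with weights inversely proportional to each model's validation error. This inverse-error weighting is an illustrative assumption; the paper's exact weighting scheme, base learners, and error values are not specified here, and all numbers in the example are hypothetical.

```python
import numpy as np

def ensemble_weights(val_errors):
    """Weights inversely proportional to each base model's validation MAE
    (an illustrative scheme, not necessarily the paper's exact method)."""
    inv = 1.0 / np.asarray(val_errors, dtype=float)
    return inv / inv.sum()

def ensemble_predict(predictions, weights):
    """Weighted average over base models; `predictions` is (models, samples)."""
    return np.average(np.asarray(predictions, dtype=float),
                      axis=0, weights=weights)

# Hypothetical example: three base regressors predicting single-tree volume (m^3).
val_mae = [0.030, 0.025, 0.040]          # made-up validation MAEs per model
w = ensemble_weights(val_mae)            # sums to 1; favours the best model
preds = [[0.10, 0.20],                   # model 1 predictions for two trees
         [0.12, 0.18],                   # model 2
         [0.09, 0.22]]                   # model 3
volumes = ensemble_predict(preds, w)     # blended volume estimates per tree
```

The blended estimate always lies within the span of the base models' predictions, which is one reason such ensembles tend to be more stable than any standalone regressor.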