Abstract:
This paper addresses accurate, real-time three-dimensional size estimation of strawberries in agricultural field environments. In ridge-planting systems, operational constraints typically restrict automated vision systems to data capture from a single, oblique viewpoint. The resulting three-dimensional point cloud is therefore partial: it represents only the visible upper surface of the fruit and lacks complete geometric information. A second, modeling challenge is that the natural morphology of a strawberry, a conical shape with a truncated calyx base and a tapered tip, deviates substantially from the simple symmetric shapes, such as ellipsoids, commonly used in conventional fitting algorithms. This geometric mismatch is a primary source of estimation error, particularly for critical dimensions such as the bottom diameter, which cannot be observed directly from the available viewpoint. To overcome these linked challenges of data incompleteness and model inadequacy, this study proposes a novel, integrated computational framework that combines a lightweight, enhanced segmentation network with a geometric model designed to match the fruit's true morphology. The methodology follows a four-stage pipeline. First, an improved YOLOv8s-seg network performs instance segmentation; its backbone is augmented with a custom C2f_Faster_EMA module that enhances multi-scale feature extraction for fine details with minimal computational overhead. The resulting high-precision mask is fused with synchronized depth data to generate an initial 3D point cloud. This cloud is then preprocessed by statistical outlier removal and voxel grid downsampling. 
This step condenses several thousand raw points into a consistent set of approximately 100 to 150 representative points, preserving key geometric features while reducing computational load. Next, a hybrid RANSAC-PCA algorithm operates on the refined point cloud to robustly estimate the strawberry's principal axis and 3D centroid. Finally, a semi-truncated cone model, defined by height, top diameter, and bottom diameter, is fitted to the processed point cloud by optimization to complete the size estimation. Experimental validation confirms the framework's performance. The enhanced segmentation model achieved a mAP@0.5 of 96.9% and a ripe-fruit segmentation accuracy of 98.3% at an inference speed of 159 frames per second. Size estimation remained accurate across varied conditions. Tests under different viewing angles yielded average relative errors of 1.2%, 0.7%, and 1.7% for height, top diameter, and bottom diameter, respectively. Across a practical range of capture heights from 30 cm to 70 cm, the corresponding errors were 1.8%, 1.2%, and 4.8%. Multi-sample testing on 22 standard strawberry specimens gave mean absolute errors of 0.13 mm, 0.03 mm, and 0.03 mm for the three dimensions, all within the sub-millimeter range. In a direct comparison with baseline approaches, the proposed method reduced the absolute estimation error for the top and bottom diameters by 50% and 70%, respectively, relative to a single-view TSDF method. It also provided an accurate estimate of the bottom diameter, which semi-elliptical models cannot supply. The complete end-to-end pipeline requires an average processing time of 3.3 s per frame. 
These results demonstrate the method's feasibility and strong potential for real-time field deployment in applications such as automated strawberry phenotyping, in-field grading, and precision yield prediction, offering a practical solution based on affordable RGB-D sensing hardware.
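
For illustration only, the voxel grid downsampling and the truncated-cone fitting described above might be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the implementation evaluated in this paper: the function names, the dictionary keys, and the voxel size are hypothetical; the axis point and direction are assumed to be supplied by the preceding axis-estimation stage; and an ordinary linear least-squares fit of radius against axial position stands in for the optimization process, whose details the abstract does not give.

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Replace all points that share a cubic voxel with their centroid."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against a 2-D inverse in some NumPy versions
    counts = np.bincount(inverse)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    return sums / counts[:, None]


def fit_truncated_cone(points, axis_point, axis_dir):
    """Fit radius = slope * t + intercept along a known axis (assumed given).

    t is the signed position of each point along the axis and the radius is
    its perpendicular distance to the axis, so a partial (single-view)
    surface is sufficient as long as the axis itself is known.
    """
    d = np.asarray(axis_dir, dtype=float)
    d = d / np.linalg.norm(d)
    rel = np.asarray(points, dtype=float) - np.asarray(axis_point, dtype=float)
    t = rel @ d
    radial = np.linalg.norm(rel - np.outer(t, d), axis=1)
    design = np.column_stack([t, np.ones_like(t)])
    (slope, intercept), *_ = np.linalg.lstsq(design, radial, rcond=None)
    t_lo, t_hi = t.min(), t.max()
    return {
        "height": t_hi - t_lo,
        # Diameters at the two axial extremes; which end is the calyx and
        # which is the tip depends on the axis orientation chosen upstream.
        "diameter_at_start": 2.0 * (slope * t_lo + intercept),
        "diameter_at_end": 2.0 * (slope * t_hi + intercept),
    }
```

A typical call would downsample the preprocessed cloud first and then pass the result, together with the estimated axis, to `fit_truncated_cone`. Because voxel centroids lie slightly inside a curved surface and slightly inside the axial extremes, the downsampled fit is expected to underestimate the dimensions by a small amount that grows with the voxel size.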