Strawberry size estimation based on improved YOLOv8s-seg and semi-truncated cone models

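    • Summary: For online three-dimensional size estimation of ridge-grown, roughly conical strawberry fruit, this study proposes a lightweight single-view RGB-D framework that combines an improved YOLOv8s-seg network with a semi-truncated cone model. A C2f_Faster_EMA module is embedded in the YOLOv8s-seg backbone; at the cost of only 0.3M additional parameters and a model-size increase of less than 3%, mAP@0.5 rises from 94.1% to 96.9%, segmentation accuracy for ripe fruit reaches 98.3%, and inference speed remains at 159 frames/s. The acquired point cloud is processed by a cascade of point cloud matching, filtering, downsampling, and RANSAC-PCA to estimate the principal axis and centroid. A semi-truncated cone model parameterized by longitudinal diameter (height), top diameter, and bottom diameter is then constructed and fitted by optimization, achieving sub-millimeter agreement between point cloud and model. Experiments show that statistical filtering and voxel downsampling reduce the point cloud from several thousand points to the order of one hundred while fully preserving its main geometric features. Across viewing angles, the estimated longitudinal, top, and bottom diameters differ from the measured values by 1.2%, 0.7%, and 1.7%, respectively; across capture heights, the differences are 1.8%, 1.2%, and 4.8%; in multi-sample tests, the overall mean estimation error is at the sub-millimeter level. In a systematic comparison, the method outperforms single-frame TSDF and a semi-elliptical model, and it remedies the semi-elliptical model's inadequate estimation of the bottom diameter, giving a more complete and more accurate characterization of strawberry 3D size. The full pipeline takes 3.3 s per frame on average, meeting the requirements of real-time size estimation in the field.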
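The point cloud generation and preprocessing steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera model, and the intrinsics (`fx`, `fy`, `cx`, `cy`), the voxel size, and the filter parameters (`k`, `std_ratio`) are hypothetical values that the abstract does not specify. It uses only the Python standard library, whereas a practical pipeline would more likely rely on a point cloud library.

```python
import math
from collections import defaultdict

def backproject(depth, mask, fx, fy, cx, cy):
    """Convert masked depth pixels to 3D camera-frame points (pinhole model).

    depth and mask are row-major 2D lists of equal shape; fx, fy, cx, cy are
    assumed camera intrinsics, not values taken from the paper.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            if keep and z > 0:  # skip background pixels and invalid depth
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is
    more than std_ratio standard deviations above the cloud-wide mean.
    Brute-force O(n^2); needs at least two points."""
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_dists.append(sum(nearest) / len(nearest))
    mu = sum(mean_dists) / len(mean_dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_dists) / len(mean_dists))
    return [p for p, d in zip(points, mean_dists) if d <= mu + std_ratio * sigma]

def voxel_downsample(points, voxel):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // voxel) for c in p)  # integer voxel index per axis
        cells[key].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for pts in cells.values()]
```

In this sketch the voxel size controls how many points survive, so reaching a target on the order of one hundred points would mean tuning `voxel` to the fruit's scale and the sensor's resolution.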

       

      Abstract: This paper addresses accurate, real-time three-dimensional size estimation of strawberries in agricultural field environments. A significant practical limitation stems from operational constraints in ridge-planting systems, where automated vision systems are typically restricted to capturing data from a single, oblique viewpoint. The result is a partial three-dimensional point cloud that represents only the visible upper surface of the fruit and lacks complete geometric information. A second, modeling challenge is that the natural morphology of a strawberry, a conical shape with a truncated calyx base and a tapered tip, deviates substantially from the simple symmetric shapes, such as ellipsoids, commonly used in conventional fitting algorithms. This geometric mismatch is a primary source of estimation error, particularly for the bottom diameter, which cannot be observed directly from the available viewpoint. To overcome these linked problems of incomplete data and inadequate models, this study proposes an integrated computational framework that combines a lightweight, enhanced segmentation network with a geometric model designed to match the fruit's true morphology. The method follows a four-stage pipeline. The process begins with instance segmentation by an improved YOLOv8s-seg network, whose backbone is augmented with a custom C2f_Faster_EMA module to enhance multi-scale feature extraction for fine details with minimal computational overhead. The resulting mask is fused with synchronized depth data to generate an initial 3D point cloud, which then undergoes preprocessing by statistical outlier removal and voxel grid downsampling. This step condenses several thousand raw points into a consistent set of approximately 100 to 150 representative points, preserving key geometric features while reducing computational load. Subsequently, a hybrid RANSAC-PCA algorithm operates on the refined point cloud to robustly estimate the strawberry's principal axis and 3D centroid. The final stage fits a semi-truncated cone model, defined by height, top diameter, and bottom diameter, to the processed point cloud by optimization to complete the size estimation. Experimental validation confirms the framework's performance. The enhanced segmentation model achieved an mAP@0.5 of 96.9% and a ripe-fruit segmentation accuracy of 98.3% while maintaining an inference speed of 159 frames per second. Size estimation was accurate across varied conditions. Tests under different viewing angles yielded average relative errors of 1.2%, 0.7%, and 1.7% for height, top diameter, and bottom diameter, respectively. Evaluations across capture heights from 30 cm to 70 cm showed consistent robustness, with corresponding errors of 1.8%, 1.2%, and 4.8%. Multi-sample testing on 22 standard strawberry specimens further validated the method's precision, with mean absolute errors of 0.13 mm, 0.03 mm, and 0.03 mm for the three dimensions, all within the sub-millimeter range. In a direct comparison, the proposed method outperformed the baseline approaches: it reduced the absolute estimation error for the top and bottom diameters by 50% and 70%, respectively, compared with a single-view TSDF method, and it provided an accurate estimate of the bottom diameter, a capability that semi-elliptical models lack. The complete end-to-end pipeline requires an average processing time of 3.3 s per frame. These results demonstrate the method's feasibility and potential for real-time field deployment in applications such as automated strawberry phenotyping, in-field grading, and precision yield prediction, offering a practical solution based on affordable RGB-D sensing hardware.
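The abstract does not give the objective function used to fit the semi-truncated cone, so the following is only an illustrative sketch of the idea and not the authors' optimization. It assumes the fruit axis (a point on it and its direction) is already known, for example from the RANSAC-PCA stage, and fits a linear radius profile r(t) = a·t + b along that axis by ordinary least squares. The function name `fit_frustum` and its return convention are hypothetical.

```python
import math

def fit_frustum(points, axis_point, axis_dir):
    """Fit a truncated cone with a known axis to surface points.

    Each point is split into an axial position t and a radial distance r
    from the axis; a line r(t) = a*t + b is fitted by least squares.
    Returns (height, diameter at the lowest t, diameter at the highest t).
    Assumes the axis is the true cone axis and the points span more than
    one axial position (otherwise the slope is undefined).
    """
    norm = math.sqrt(sum(c * c for c in axis_dir))
    d = [c / norm for c in axis_dir]  # unit axis direction
    ts, rs = [], []
    for p in points:
        rel = [pc - ac for pc, ac in zip(p, axis_point)]
        t = sum(rc * dc for rc, dc in zip(rel, d))        # position along the axis
        radial = [rc - t * dc for rc, dc in zip(rel, d)]  # part perpendicular to the axis
        ts.append(t)
        rs.append(math.sqrt(sum(c * c for c in radial)))
    n = len(ts)
    t_mean, r_mean = sum(ts) / n, sum(rs) / n
    var_t = sum((t - t_mean) ** 2 for t in ts)
    a = sum((t - t_mean) * (r - r_mean) for t, r in zip(ts, rs)) / var_t
    b = r_mean - a * t_mean
    t_lo, t_hi = min(ts), max(ts)
    return t_hi - t_lo, 2 * (a * t_lo + b), 2 * (a * t_hi + b)
```

Because the radial distance of a surface point does not depend on its angle around the axis, this fit works on a half-visible surface as long as the axis is correct. With a single oblique view, the centroid of the visible points lies off the true axis, which is presumably why the paper estimates the axis separately and refines the model by optimization.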

       
