Multi-target apple recognition from UAV images under orchard environments using improved YOLOv8

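    • Abstract: In practice, unmanned aerial vehicle (UAV) images of apple orchards suffer from uneven illumination on the fruit, overlapping and occluded fruits, and many small targets. To address these problems, this study proposes a multi-target apple recognition method for UAV images in complex environments based on an improved YOLOv8. First, the C2f modules in the YOLOv8 backbone are replaced with generalized efficient layer aggregation network (GELAN) modules, which reduces the number of model parameters by 27%. A large separable kernel attention (LSKA) mechanism, a spatial attention built on large convolution kernels, is then introduced after spatial pyramid pooling to enlarge the receptive field, so that the model better captures global information about apple targets in complex backgrounds. Next, the lightweight upsampling operator DySample is used in the Neck to strengthen multi-scale feature fusion, improving the discrimination and recognition accuracy for densely distributed and overlapping fruits. Finally, a 160×160 small-object detection head is added to the Head to improve the recognition of occluded multiple targets. Experiments on a self-built dataset show that the improved model achieves a precision of 92.0%, a recall of 91.2%, a mean average precision (mAP) of 96.6%, and an average detection speed of 129.1 frames/s. Compared with the original YOLOv8, precision, recall, and mAP increase by 1.8, 2.5, and 1.6 percentage points, respectively. The method provides important technical support for the accurate recognition of multiple fruit targets in UAV images captured in complex environments.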
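The reported gains are absolute differences in percentage points, so the baseline YOLOv8 scores can be recovered by subtraction, and the average frame rate can be converted into an average time per image. A minimal arithmetic sketch using only the figures stated in the abstract; the variable and function names are illustrative, and whether the reported speed includes pre- and post-processing is not stated, so the latency is only what the frame rate implies:

```python
# Figures reported for the improved model (in percent).
improved = {"precision": 92.0, "recall": 91.2, "mAP": 96.6}
# Reported absolute gains over the original YOLOv8 (in percentage points).
gain_pp = {"precision": 1.8, "recall": 2.5, "mAP": 1.6}


def baseline_scores(improved, gain_pp):
    """Baseline scores implied by an absolute (percentage-point) gain."""
    return {k: round(improved[k] - gain_pp[k], 1) for k in improved}


def relative_gain(improved, gain_pp):
    """Each percentage-point gain expressed as a relative change over the baseline, in %."""
    base = baseline_scores(improved, gain_pp)
    return {k: round(100.0 * gain_pp[k] / base[k], 2) for k in improved}


def latency_ms(fps):
    """Average time per image implied by a throughput in frames per second."""
    return 1000.0 / fps


print(baseline_scores(improved, gain_pp))  # {'precision': 90.2, 'recall': 88.7, 'mAP': 95.0}
print(relative_gain(improved, gain_pp))
print(f"{latency_ms(129.1):.2f} ms per image")  # 7.75 ms per image
```

The distinction matters when comparing papers: a 2.5-point recall gain over an 88.7% baseline is a relative improvement of about 2.8%.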

       

      Abstract: Unmanned aerial vehicle (UAV) imagery is widely used to detect apples in orchards. However, reliable apple recognition in natural settings faces substantial technical obstacles, and recognition frameworks must balance detection performance against practical applicability. Outdoor orchard scenes are highly variable: illumination intensity varies sharply across fruit surfaces, and fruits are occluded by dense foliage, interwoven branches, and other overlapping fruits. In addition, immature or distant apples appear as very small objects in images captured from high altitude. These factors considerably degrade detection consistency, so more robust methods are needed for reliable multi-target apple identification under actual operating conditions. In this study, a multi-object apple recognition method based on an improved YOLOv8 architecture was developed specifically for UAV-acquired orchard imagery. In the backbone network, the standard C2f module was replaced with a generalized efficient layer aggregation network (GELAN) module. This modification improved the capture of apple feature details while reducing the total number of parameters by 27%, thereby increasing computational efficiency and deployment feasibility. A large separable kernel attention (LSKA) mechanism was added after spatial pyramid pooling to broaden the receptive field of the model, helping it assimilate global contextual information in cluttered backgrounds.
As a result, more discriminative features were extracted from partially occluded and mutually overlapping apples. In the neck network, the conventional upsampling operator was replaced with the lightweight DySample operator, which strengthens the fusion of multi-scale feature maps so that densely clustered and severely occluded fruits are better distinguished. In addition, an auxiliary 160×160 small-object detection head was added to the head layer to improve the recognition of occluded multi-object configurations and small targets. The framework was evaluated on a custom dataset consisting exclusively of drone-captured orchard images. The improved model achieved a precision of 92.0%, a recall of 91.2%, and a mean average precision (mAP) of 96.6%, with an average inference speed of 129.1 frames per second (FPS). Compared with the baseline YOLOv8 model, precision, recall, and mAP improved by 1.8, 2.5, and 1.6 percentage points, respectively. These results confirm the effectiveness and combined benefit of the architectural modifications, which mitigate the adverse effects of uneven lighting, target occlusion, and small target size. The method thus provides essential technical support for the precise identification of multiple fruit targets in UAV imagery under complex orchard conditions.
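GELAN, introduced with YOLOv9, generalizes the split-and-aggregate pattern that C2f already uses: after a 1×1 projection the features are split into two parts, one part passes through a chain of computational blocks (which may be of any type), and both parts plus every intermediate block output are concatenated and fused by a 1×1 transition convolution. The sketch below shows only that dataflow on toy per-channel values; the initial projection is omitted, and the scaling blocks, channel counts, and transition weights are hypothetical stand-ins for the convolutional blocks of a real model. The abstract does not state which GELAN block configuration the authors used.

```python
from typing import Callable, List

Feature = List[float]  # toy feature: one scalar per channel (spatial dimensions omitted)
Block = Callable[[Feature], Feature]


def gelan_aggregate(x: Feature, blocks: List[Block],
                    transition: Callable[[Feature], Feature]) -> Feature:
    """GELAN-style aggregation on a toy feature vector.

    1. Split the input channels into two halves.
    2. Feed the second half through the blocks one after another.
    3. Concatenate both halves and every intermediate block output.
    4. Fuse the concatenation with a transition (a 1x1 conv in real models).
    """
    half = len(x) // 2
    parts = [x[:half], x[half:]]
    y = parts[-1]
    for block in blocks:
        y = block(y)
        parts.append(y)  # keep every intermediate output for aggregation
    concatenated = [v for part in parts for v in part]
    return transition(concatenated)


def scale(factor: float) -> Block:
    """Hypothetical computational block: scales every channel."""
    return lambda f: [factor * v for v in f]


def pointwise(weights: List[List[float]]) -> Callable[[Feature], Feature]:
    """A 1x1 convolution on a 1x1 map is a matrix-vector product over channels."""
    return lambda f: [sum(w * v for w, v in zip(row, f)) for row in weights]


x = [1.0, 2.0, 3.0, 4.0]
blocks = [scale(2.0), scale(0.5)]
# The concatenation has 2 + 2 + 2 + 2 = 8 channels; fuse them into 2 output channels.
weights = [[1.0 / 8] * 8, [1.0 if i < 2 else 0.0 for i in range(8)]]
out = gelan_aggregate(x, blocks, pointwise(weights))
print(out)  # [3.875, 3.0]
```

Because only half of the channels pass through the block chain, the heavy blocks operate on fewer channels, which is one reason this family of designs can cut parameters.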

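LSKA builds on large kernel attention (LKA), which gates the input with an attention map produced by a depth-wise convolution, a dilated depth-wise convolution, and a 1×1 convolution. LSKA decomposes each 2D depth-wise kernel into a cascade of a 1×k horizontal kernel and a k×1 vertical kernel, so the weights per channel grow as 2k instead of k². The single-channel sketch below illustrates only two of these ideas: that the horizontal-then-vertical cascade equals convolution with the outer-product k×k kernel, and the element-wise gating of the input. Dilation, the 1×1 convolution, and multiple channels are omitted, and all function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def conv2d_same(img: Grid, kernel: Grid) -> Grid:
    """Zero-padded 'same' 2D cross-correlation (what deep-learning conv layers compute)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - ph, j + v - pw
                    if 0 <= r < h and 0 <= c < w:  # outside the map counts as zero
                        acc += kernel[u][v] * img[r][c]
            out[i][j] = acc
    return out


def separable_conv(img: Grid, row_k: List[float], col_k: List[float]) -> Grid:
    """Cascade a 1 x k (horizontal) kernel and then a k x 1 (vertical) kernel."""
    tmp = conv2d_same(img, [row_k])
    return conv2d_same(tmp, [[c] for c in col_k])


def outer(col_k: List[float], row_k: List[float]) -> Grid:
    """The equivalent full k x k kernel: K[u][v] = col_k[u] * row_k[v]."""
    return [[a * b for b in row_k] for a in col_k]


def gate(x: Grid, attn: Grid) -> Grid:
    """Attention gating: element-wise product of the input and the attention map."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(x, attn)]


# Usage on a small integer-valued single-channel map (hypothetical values).
img = [[float((3 * i + j) % 5) for j in range(5)] for i in range(5)]
row_k, col_k = [1.0, 2.0, 1.0], [1.0, 0.0, -1.0]
sep = separable_conv(img, row_k, col_k)
full = conv2d_same(img, outer(col_k, row_k))
print(max(abs(a - b) for ra, rb in zip(sep, full) for a, b in zip(ra, rb)))  # 0.0
print(len(row_k) + len(col_k), "weights instead of", len(row_k) * len(col_k))  # 6 weights instead of 9
out = gate(img, sep)  # the attention map multiplies the input element-wise
```

The equivalence holds only for kernels that factor as an outer product; LSKA accepts that restriction in exchange for a cost that grows linearly rather than quadratically with kernel size.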
       
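DySample treats upsampling as point sampling: each output pixel reads the input at its base grid position plus an offset, using bilinear interpolation (typically implemented with PyTorch's `torch.nn.functional.grid_sample`). The pure-Python sketch below shows that sampling step for one channel at a 2× scale. The offsets, which DySample predicts from the input features with a lightweight learned layer, are passed in by hand here; clamping coordinates to the border is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def bilinear_sample(img: Grid, y: float, x: float) -> float:
    """Bilinearly sample img at continuous (y, x); coordinates are clamped to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom


def point_sample_upsample(img: Grid, offsets: List[List[Tuple[float, float]]],
                          scale: int = 2) -> Grid:
    """Upsample by sampling the input at (base grid position + per-pixel offset).

    Output pixel (i, j) has base position ((i + 0.5) / scale - 0.5, (j + 0.5) / scale - 0.5)
    in input coordinates, so all-zero offsets reproduce plain bilinear upsampling.
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(h * scale):
        row = []
        for j in range(w * scale):
            dy, dx = offsets[i][j]
            base_y = (i + 0.5) / scale - 0.5
            base_x = (j + 0.5) / scale - 0.5
            row.append(bilinear_sample(img, base_y + dy, base_x + dx))
        out.append(row)
    return out


img = [[0.0, 1.0], [2.0, 3.0]]
zero = [[(0.0, 0.0)] * 4 for _ in range(4)]
print(point_sample_upsample(img, zero)[1][1])  # 0.75 (plain bilinear value)
shifted = [[(0.0, 0.25)] * 4 for _ in range(4)]  # hypothetical offsets
print(point_sample_upsample(img, shifted)[1][1])  # 1.0
```

Because only an offset field has to be predicted, this kind of upsampler stays lightweight while still letting each output pixel choose where to read from, which is the property the abstract relies on for separating densely packed fruits.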

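In YOLOv8 each detection head predicts one box per grid cell of its feature map, and the standard heads operate at strides 8, 16, and 32. Assuming the common 640×640 input size (the abstract does not state the input resolution), a 160×160 head corresponds to stride 4, the P2 level. The arithmetic below shows the grid of each level and how many extra candidate predictions the added head contributes; the function names are illustrative.

```python
from typing import Dict, Tuple


def head_grids(img_size: int, strides: Tuple[int, ...]) -> Dict[int, int]:
    """Map each stride to the side length of its square output grid."""
    return {s: img_size // s for s in strides}


def num_predictions(img_size: int, strides: Tuple[int, ...]) -> int:
    """Anchor-free heads make one prediction per grid cell on every level."""
    return sum(g * g for g in head_grids(img_size, strides).values())


IMG = 640                 # assumed input size
BASE = (8, 16, 32)        # standard YOLOv8 P3, P4, P5 heads
WITH_P2 = (4, 8, 16, 32)  # plus the 160x160 small-object head

print(head_grids(IMG, WITH_P2))       # {4: 160, 8: 80, 16: 40, 32: 20}
print(num_predictions(IMG, BASE))     # 8400
print(num_predictions(IMG, WITH_P2))  # 34000
```

Each stride-4 cell covers a 4×4 pixel patch, so small, distant apples occupy several cells instead of falling inside a single coarse cell; the cost is roughly four times as many candidate predictions to score and filter.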