Multi-target apple recognition from UAV images under orchard environments using improved YOLOv8
Abstract
Unmanned aerial vehicle (UAV) imagery is widely used to detect apples in orchards. However, substantial technical obstacles still limit apple recognition frameworks, which must balance detection performance against practical applicability in natural settings. Outdoor horticultural environments are highly dynamic: illumination intensity varies sharply across fruit surfaces, and fruits are occluded by dense foliage, interwoven branches, and other overlapping fruits. In images captured at high altitude, the many immature or distant apples appear only as minuscule visual elements. These factors considerably degrade detection consistency, so more resilient computational approaches are needed to achieve reliable multi-target apple identification under actual operating conditions. In this study, a multi-target apple recognition method based on an improved YOLOv8 architecture was developed specifically for UAV-acquired imagery of orchard environments. In the backbone network, the standard C2f module was replaced with a Generalized Efficient Layer Aggregation Network (GELAN). This structural modification strengthened the capture of apple feature details while reducing the total parameter count by 27%, thereby improving computational efficiency and deployment feasibility. After the spatial pyramid pooling layer, a Large Separable Kernel Attention (LSKA) mechanism was incorporated to broaden the receptive field of the model and improve the assimilation of global contextual information against cluttered backgrounds.
As a result, more discriminative feature representations were extracted from partially occluded and mutually overlapping apples. In the neck network, the conventional upsampling operator was replaced with the lightweight DySample operator, so that the fused multi-scale feature maps could better distinguish and classify densely clustered and severely occluded fruits. In addition, an auxiliary 160×160 small-object detection head was added to the head layer to improve recognition of occluded multi-target configurations and very small targets. The framework was systematically evaluated on a custom-built dataset consisting exclusively of UAV-captured orchard imagery. The improved model achieved a precision of 92.0%, a recall of 91.2%, and a mean average precision (mAP) of 96.6%, with an average inference speed of 129.1 frames per second (FPS). Compared with the baseline YOLOv8 model, precision, recall, and mAP improved by 1.8, 2.5, and 1.6 percentage points, respectively. These results validate the effectiveness and synergistic benefits of the architectural enhancements. The proposed method alleviated the detrimental effects of uneven illumination, target occlusion, and small target size, and thereby provides a technical foundation for the precise identification of multiple fruit targets in UAV imagery of orchard environments.
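The reported figures can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the original evaluation. It subtracts the stated percentage-point gains from the improved model's scores to obtain the implied baseline YOLOv8 values, and converts the reported FPS into an approximate per-image latency. The F1 score is not reported in the abstract; it is derived here from precision and recall purely as an assumption-labeled summary, and all variable and function names are hypothetical.

```python
# Figures quoted in the abstract (percent); gains are in percentage points.
improved = {"precision": 92.0, "recall": 91.2, "mAP": 96.6}
gain_pp = {"precision": 1.8, "recall": 2.5, "mAP": 1.6}
fps = 129.1  # reported average inference speed


def implied_baseline(scores, gains):
    """Subtract percentage-point gains to recover the implied baseline scores."""
    return {name: round(scores[name] - gains[name], 1) for name in scores}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (not reported in the abstract)."""
    return 2 * precision * recall / (precision + recall)


baseline = implied_baseline(improved, gain_pp)
f1_improved = f1_score(improved["precision"], improved["recall"])
f1_baseline = f1_score(baseline["precision"], baseline["recall"])
latency_ms = 1000.0 / fps  # average time per image implied by the FPS figure

print("Implied baseline (%):", baseline)
print(f"Derived F1, improved vs. baseline: {f1_improved:.1f} vs. {f1_baseline:.1f}")
print(f"Implied latency: {latency_ms:.2f} ms per image")
```

Under these assumptions, the implied baseline is 90.2% precision, 88.7% recall, and 95.0% mAP, and the reported speed corresponds to roughly 7.7 ms per image.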