Sun Jun, Qian Lei, Zhu Weidong, Zhou Xin, Dai Chunxia, Wu Xiaohong. Apple detection in complex orchard environment based on improved RetinaNet[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(15): 314-322. DOI: 10.11975/j.issn.1002-6819.2022.15.034

Apple detection in complex orchard environment based on improved RetinaNet

    • Abstract: To detect apple fruits quickly and accurately in complex orchard environments with overlapping and occluded fruit, this study proposes an apple detection network based on an improved RetinaNet. First, the Res2Net module is embedded into ResNet50, the backbone of the traditional RetinaNet, to strengthen the network's extraction of basic apple features. Second, a weighted Bi-directional Feature Pyramid Network (BiFPN) performs weighted fusion of features at different scales, improving the recall of small and occluded targets. Finally, the network is optimized with a joint loss function based on Focal Loss and Efficient Intersection over Union (EIoU) Loss to improve detection accuracy. Experimental results show that, on the test set, the improved network achieves detection precisions of 94.02%, 86.74%, 89.42%, and 94.84% for apples occluded by leaves, occluded by branches/wires, occluded by other fruit, and unoccluded, respectively, with a mean Average Precision (mAP) of 91.26%, 5.02 percentage points higher than the traditional RetinaNet, while detecting one apple image takes 42.72 ms. Compared with mainstream object detection networks such as Faster R-CNN and YOLOv4, the improved network offers excellent detection accuracy while meeting real-time requirements, providing a reference for the picking strategy of harvesting robots.
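The weighted cross-scale fusion can be made concrete with a short sketch. Below is a minimal PyTorch illustration of the "fast normalized fusion" defined in the original BiFPN (EfficientDet) paper, which the improved network is stated to adopt; the module name `WeightedFusion`, the two-input example, and the channel/shape values are illustrative assumptions, and the paper's exact fusion topology is not reproduced here.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion used in BiFPN: each input feature map
    receives a learnable non-negative weight, and the weights are
    normalized to sum to (approximately) 1 before the maps are summed."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable scalar weight per input feature map
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        # feats: list of feature maps already resized to a common shape
        w = torch.relu(self.weights)      # keep the weights non-negative
        w = w / (w.sum() + self.eps)      # normalize so they sum to ~1
        return sum(wi * fi for wi, fi in zip(w, feats))

# Illustrative use: fuse one pyramid level with an upsampled coarser level
fuse = WeightedFusion(num_inputs=2)
p4 = torch.randn(1, 256, 40, 40)
p5 = torch.randn(1, 256, 20, 20)
p4_td = fuse([p4, nn.functional.interpolate(p5, scale_factor=2)])
```

Because the fusion weights are learned, levels that contribute more to detecting apples of a given size are emphasized automatically, which is the stated motivation for replacing the plain FPN.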


      Abstract: Fast and accurate detection is one of the most important prerequisites for apple harvest robots, yet many factors make apple detection difficult in a real orchard scene, such as complex backgrounds, fruit overlap, and leaf/branch occlusion. In this study, a fast and stable network based on an improved RetinaNet was proposed for apple detection, and a picking strategy was developed for the harvest robot. In particular, if apples occluded by branches/wires were treated as picking targets, the robot arm could be damaged during picking; the apples were therefore labeled into multiple classes according to the type of occlusion. The Res2Net module was embedded in ResNet50 to improve the ability of the backbone network to extract multi-scale features. Furthermore, a BiFPN replaced the FPN as the feature fusion network in the neck, performing weighted fusion of feature maps at different scales so that apples of different sizes were represented better, which improved the detection accuracy of the network. The loss function combined Focal loss with Efficient Intersection over Union (EIoU) loss: Focal loss served as the classification loss, reducing the errors caused by the imbalance between positive and negative samples, while EIoU loss served as the bounding-box regression loss, maintaining fast and accurate regression across the different relative positions of the predicted and ground-truth boxes, such as overlap, disjointness, and inclusion. Finally, classification and regression were carried out on feature maps at five scales to detect the apples. The original dataset consisted of 800 apple images with the complex backgrounds of dense orchards; to promote the generalization ability of the model, it was expanded to 4 800 images through data enhancement operations such as rotation, brightness adjustment, and noise addition. To balance detection accuracy and speed, a series of experiments was conducted on the number of BiFPN stacks, and the BiFPN was stacked five times in the improved RetinaNet. Ablation experiments showed that each improvement raised the apple detection accuracy of the network over the original. The average precision of the improved RetinaNet reached 94.02%, 86.74%, 89.42%, and 94.84% for leaf-occluded, branch/wire-occluded, fruit-occluded, and unoccluded apples, respectively, and the mean Average Precision (mAP) reached 91.26%, 5.02 percentage points higher than that of the traditional RetinaNet. The improved RetinaNet took only 42.72 ms on average to process an apple image; with a fruit-picking cycle of 2 780 ms, the detection speed fully meets the requirements of the picking robot. When apples were large or only slightly occluded, both the improved and the traditional RetinaNet detected them accurately; in complex orchard environments with leaf, fruit, or branch/wire occlusion, however, the traditional RetinaNet often missed detections, whereas the improved RetinaNet still detected all the apple fruits. Consequently, the improved network achieved the best comprehensive performance when compared with state-of-the-art detection networks such as Faster R-CNN and YOLOv4, verifying the effectiveness of the improvements. Overall, apples of all classes can be detected effectively for harvest, and the findings can inform the picking strategy of the robot, helping it avoid potential damage from branches and wires during harvesting.
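To make the joint loss concrete, the following is a minimal PyTorch sketch assuming the standard published formulations of Focal loss (from the original RetinaNet) and EIoU loss; the function names and the 1:1 weighting of the two terms are assumptions for illustration, since the abstract does not give the exact combination.

```python
import torch
import torch.nn.functional as F

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss for (x1, y1, x2, y2) boxes of shape (N, 4): 1 - IoU plus
    center-distance, width, and height penalties, each normalized by the
    smallest enclosing box."""
    # Intersection area
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = iw * ih
    # IoU
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    # Squared distance between box centers
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2
            + (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    return (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)   # center-distance term
            + (w1 - w2) ** 2 / (cw ** 2 + eps)   # width term
            + (h1 - h2) ** 2 / (ch ** 2 + eps)   # height term
            ).mean()

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss on raw logits, as in the original RetinaNet:
    down-weights easy examples to counter class imbalance."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def joint_loss(cls_logits, cls_targets, box_pred, box_target):
    # The equal weighting of the two terms is an assumption, not from the paper.
    return focal_loss(cls_logits, cls_targets) + eiou_loss(box_pred, box_target)
```

Unlike a plain IoU loss, the width and height penalty terms in EIoU still give a useful gradient when the predicted and ground-truth boxes are disjoint or one contains the other, which matches the abstract's point about handling different relative box positions.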

