Potato seed bud eye detection algorithm based on a lightweight convolutional neural network

黄杰; 王相友; 吴海涛; 刘书玮; 杨笑难; 刘为龙

doi:10.11975/j.issn.1002-6819.202303035

Detecting potato seed bud eye using lightweight convolutional neural network (CNN)

Abstract

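Abstract: Bud eyes on potato seed tubers are small targets, which makes them difficult to recognize while demanding high accuracy. To complete the recognition task quickly and accurately on a test bench (a bud eye recognition device), this study proposes a bud eye detection model based on a lightweight convolutional neural network. First, to reduce the computational cost of the model and focus on small targets, the YOLOv4 backbone network CSPDarkNet-53 is replaced with the lightweight feature extraction network GhostNetV2. Second, in the neck network of YOLOv4, depthwise separable convolution (DW) modules replace the standard convolution blocks to further reduce computation. Finally, the bounding box loss function is changed to SIoU, a bounding box loss with an angle cost, so that uncertainty in the position of the predicted box does not impair the convergence speed and overall detection performance of the model. The results show that the improved bud eye detection model has 12.04 M parameters, takes 0.148 s to detect a single image on a laptop CPU, and reaches an average precision of 89.13% on test data collected from the test bench. Compared with the other backbone feature extraction networks CSPDarkNet-53, MobileNetV1, MobileNetV2, MobileNetV3, and GhostNetV1, its detection accuracy is higher by 1.85, 0.75, 2.67, 4.17, and 1.89 percentage points, respectively. Compared with the object detection models SSD, Faster-RCNN, EfficientDet, CenterNet, and YOLOv7, its detection accuracy is higher by 23.26, 27.45, 10.51, 18.09, and 2.13 percentage points, and its detection time is shorter by 0.007, 6.754, 1.891, 1.745, and 0.422 s, respectively, with a clear advantage in parameter count. This study provides technical support for small-object detection and model deployment.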
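
The saving from replacing a standard convolution with a depthwise separable one can be illustrated with a simple parameter count. The sketch below is illustrative only: the channel counts and the 3×3 kernel are assumed values, not layer sizes from the paper, the function names are hypothetical, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer size chosen for illustration, not taken from the paper.
c_in, c_out, k = 256, 512, 3
std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)
ratio = dws / std  # algebraically equal to 1/c_out + 1/k**2
print(std, dws, round(ratio, 4))
```

For a 3×3 kernel the ratio approaches 1/9 as the number of output channels grows, which is why the substitution reduces computation in a convolution-heavy neck network.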

Abstract: Manual preparation of seed pieces before sowing can no longer meet the needs of large-scale potato production in China, because of the low level of mechanization, high labor costs, and high labor intensity. Automated equipment is expected to take over potato seed cutting. However, mechanized equipment cannot accurately locate the potato seed eyes during processing, which results in serious waste. Because the eyes are small targets, identifying them requires accurate and rapid target detection, with a model that recognizes small targets well while using few forward inference parameters. In this study, a target detection model based on a lightweight convolutional neural network (CNN) was proposed to recognize potato seed eyes rapidly, accurately, and in real time in block-cutting equipment. Firstly, the lightweight feature extraction network GhostNetV2 replaced CSPDarkNet-53 as the backbone network of YOLOv4, in order to reduce the forward inference parameters of the model and to focus more on small target objects. Secondly, depthwise separable convolution (DW) modules were used in the neck network of YOLOv4 to further reduce the computational complexity. Finally, the bounding box loss function was changed to the SCYLLA-IoU (SIoU) loss function, which includes an angle cost, so that the uncertain position of the prediction box did not degrade the convergence speed and the overall detection performance of the model. The experimental results indicated that the model had 12.04 M parameters when GhostNetV2 was used as the backbone feature extraction network for YOLOv4. On the test dataset collected from the experimental platform, the average precision was 89.13%, and detecting a single image on a laptop CPU took 0.148 s. The F1 scores were 0.80 and 0.99 for the buds and potatoes, respectively.
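
The abstract names the SIoU loss but does not state its formula. As a rough sketch, the function below follows the commonly cited SIoU formulation (an IoU term plus an angle-aware distance cost and a shape cost). The box format `(x1, y1, x2, y2)`, the function name `siou_loss`, the exponent `theta = 4`, and the `eps` guard are assumptions made for illustration and may differ from the authors' implementation.

```python
import math


def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """SIoU loss for two boxes given as (x1, y1, x2, y2); a sketch, not the paper's code."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Offsets between box centers and size of the smallest enclosing box.
    dx = (gx1 + gx2) / 2 - (px1 + px2) / 2
    dy = (gy1 + gy2) / 2 - (py1 + py2) / 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Angle cost: 0 when the centers are aligned on an axis, 1 at 45 degrees.
    sigma = math.hypot(dx, dy)
    sin_alpha = min(abs(dy) / (sigma + eps), 1.0)
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, modulated by the angle cost.
    gamma = 2 - angle
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes width and height mismatch.
    omega_w = abs(pw - gw) / (max(pw, gw) + eps)
    omega_h = abs(ph - gh) / (max(ph, gh) + eps)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance + shape) / 2
```

For example, `siou_loss((0, 0, 2, 2), (1, 0, 3, 2))` scores a prediction shifted horizontally from its target; the loss is close to zero for identical boxes and exceeds 1 for boxes that do not overlap.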
Compared with the original CSPDarkNet-53 backbone network, the improved backbone network had approximately one-third of the parameters, with detection accuracy higher by 1.85 percentage points and detection time shorter by 0.279 s. Furthermore, the GhostNetV2 backbone network improved the detection accuracy by 0.75, 2.67, 4.17, and 1.89 percentage points compared with the lightweight backbone networks MobileNetV1, MobileNetV2, MobileNetV3, and GhostNetV1, respectively. The F1 values for the buds were also higher by 0.06, 0.07, 0.12, and 0.08, respectively. The SIoU bounding box loss function improved the detection accuracy by 2.97, 4.33, 2.38, and 3.18 percentage points compared with the GIoU, CIoU, DIoU, and EIoU loss functions, respectively. Moreover, the improved YOLOv4 object detection model achieved higher recognition accuracy than similar object detection models, with increases of 23.26, 27.45, 10.51, 18.09, and 2.13 percentage points over SSD, Faster-RCNN, EfficientDet, CenterNet, and YOLOv7, respectively. The improved model also reduced the detection time by 0.007, 6.754, 1.891, 1.745, 0.422, and 0.326 s compared with SSD, Faster-RCNN, EfficientDet, CenterNet, YOLOv7, and YOLOv4, respectively. The improved detection model had only 12.04 M parameters. Overall, the findings can provide technical support for the recognition of small target objects, such as potato buds, and for model deployment.
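
Because the comparisons above are reported as differences, the absolute figures for the competing models are only implied. The short sketch below recovers them by arithmetic, assuming each difference is measured against the improved model's 89.13% average precision and 0.148 s detection time. The derived values are not stated in the source, and the variable names are illustrative.

```python
# Values reported in the abstract for the improved model.
AP_IMPROVED = 89.13    # average precision, %
TIME_IMPROVED = 0.148  # seconds per image on a laptop CPU

# Reported gaps: (accuracy gain in percentage points, time saved in seconds).
gaps = {
    "SSD": (23.26, 0.007),
    "Faster-RCNN": (27.45, 6.754),
    "EfficientDet": (10.51, 1.891),
    "CenterNet": (18.09, 1.745),
    "YOLOv7": (2.13, 0.422),
}

# Implied absolute figures for each compared model (derived, not reported).
implied = {
    name: (round(AP_IMPROVED - d_ap, 2), round(TIME_IMPROVED + d_t, 3))
    for name, (d_ap, d_t) in gaps.items()
}

for name, (ap, seconds) in implied.items():
    print(f"{name}: AP = {ap:.2f}%, time = {seconds:.3f} s")
```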
