Lightweight tea bud detection based on improved YOLOv11n

• Abstract: To address the low accuracy and high computational cost of existing tea-picking robots when identifying tea buds in complex environments, this study proposes a tea bud detection algorithm based on an improved YOLOv11n. The algorithm combines depthwise convolution and global average pooling to design the C3K2-PFCGLU (C3K2-PoolingFormer and Convolutional Gated Linear Unit) structure, which improves the detection speed of the model; a lightweight detection head, DRLSDH (detail-reinforced and lightweight shared convolution detection head), is designed to improve detection accuracy while effectively compressing the number of model parameters; the lightweight feature-extraction module ADown then replaces the traditional convolution modules in the backbone network, reducing the computational cost introduced by the DRLSDH module and the model size; finally, to compensate for the weak generalization and slow convergence of the CIoU loss function in detection tasks, the DIoU loss function is adopted to improve model accuracy and refine the predicted bounding boxes. The results show that the improved model achieves a mean average precision of 92.92% and a precision of 95.43%, which are 0.14 and 0.93 percentage points higher than the baseline YOLOv11n, respectively; the model size is 3.4 MB, the number of parameters is 1.39 M, and the computational cost is 4.2 GFLOPs, reductions of 35.85%, 45.74%, and 33.33%, respectively. In terms of runtime performance, the detection frame rate on an embedded device is 23 frames/s. The model offers high recognition accuracy and strong robustness, and the results provide a reference for deployment on embedded devices and mobile platforms.
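For context, the DIoU loss that the abstract adopts in place of CIoU is the standard distance-IoU formulation; the notation below is the conventional one from the DIoU literature, not reproduced from this paper:

```latex
\mathcal{L}_{\mathrm{DIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}\!\left(\mathbf{b},\,\mathbf{b}^{gt}\right)}{c^{2}}
```

Here b and b^gt are the centers of the predicted and ground-truth boxes, ρ(·) is the Euclidean distance between them, and c is the diagonal length of the smallest box enclosing both. CIoU augments this penalty with an additional aspect-ratio consistency term αv; dropping that term is the usual explanation for DIoU's faster convergence, which matches the motivation given in the abstract.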

       

Abstract: Tea-picking robots must identify tea buds in complex environments, yet existing systems suffer from low accuracy and a heavy computational burden. In this study, a tea bud detection algorithm was proposed based on an improved YOLOv11n, systematically optimizing the backbone network, detection head, down-sampling module, and loss function of YOLOv11n in order to enhance tea bud recognition accuracy in complex environments while reducing the complexity of the network model. The algorithm was improved along four dimensions. Firstly, depthwise convolution and global average pooling were combined to design the C3K2-PFCGLU (C3K2-PoolingFormer and Convolutional Gated Linear Unit) structure, which improved the detection speed of the model. At the same time, a detail-reinforced and lightweight shared convolution detection head (DRLSDH) was designed to improve detection accuracy while effectively compressing the number of model parameters. Secondly, the lightweight feature-extraction module ADown was used to replace the traditional convolutional down-sampling layers in the backbone network; by adjusting the grouped convolutions and strides, it reduced the computational cost and model size contributed by the DRLSDH module. Finally, the DIoU loss function was adopted to improve model accuracy and refine the predicted bounding boxes, compensating for the weak generalization and slow convergence of the CIoU loss function in detection tasks. The improved algorithm was validated on a self-built tea bud dataset. The results show that it markedly reduced network complexity and parameter count while maintaining high detection accuracy: the mean average precision mAP@0.5 reached 92.92%, and the precision increased to 95.43%, which were 0.14 and 0.93 percentage points higher than the baseline YOLOv11n, respectively. Although the recall decreased slightly to 87.37%, the lightweight metrics were significantly optimized: the number of parameters fell to 1.39 M, the computational cost to 4.2 GFLOPs, and the model weight file to only 3.4 MB, reductions of 45.74%, 33.33%, and 35.85%, respectively. On an embedded device, the detection frame rate was 23 frames per second; the high recognition accuracy and strong robustness fully meet the deployment requirements of tea-picking robots. The improved model still has some limitations: false detections were observed on images with overlapping buds and high inter-object similarity in dark, complex environments. Image enhancement can improve image quality and thereby reduce such false detections; in addition, more samples, especially images captured in dark, complex environments, can be collected for dataset construction and model training. The findings provide technical support for the accurate and rapid detection of tea buds, and further model compression is expected to improve robustness and the degree of lightweighting for more efficient deployment.
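As an illustration of the down-sampling swap described above, the sketch below implements an ADown-style block in PyTorch, following the split avg-pool/max-pool structure popularized by YOLOv9. The abstract does not give the paper's exact configuration, so the channel widths, activation, and feature-map sizes here are assumptions for demonstration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBNAct(nn.Module):
    """Conv2d + BatchNorm2d + SiLU, the standard YOLO-style building block."""

    def __init__(self, c_in, c_out, k=1, s=1, p=0):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ADown(nn.Module):
    """ADown-style lightweight down-sampling (structure as in YOLOv9).

    The channels are split in half: one half goes through average pooling
    followed by a 3x3 stride-2 convolution, the other half through max
    pooling followed by a 1x1 convolution. Halving the width of each
    branch makes this cheaper than one full-width stride-2 convolution.
    """

    def __init__(self, c_in, c_out):
        super().__init__()
        c_half = c_out // 2
        self.cv1 = ConvBNAct(c_in // 2, c_half, k=3, s=2, p=1)
        self.cv2 = ConvBNAct(c_in // 2, c_half, k=1, s=1, p=0)

    def forward(self, x):
        # Mild 2x2 average pooling (stride 1) smooths features before the split.
        x = F.avg_pool2d(x, kernel_size=2, stride=1, padding=0)
        x1, x2 = x.chunk(2, dim=1)
        x1 = self.cv1(x1)  # branch 1: averaged context -> 3x3 stride-2 conv
        x2 = F.max_pool2d(x2, kernel_size=3, stride=2, padding=1)
        x2 = self.cv2(x2)  # branch 2: max pooling -> cheap 1x1 conv
        return torch.cat((x1, x2), dim=1)


if __name__ == "__main__":
    # Shape check with an assumed feature-map size: 1x256x80x80 -> 1x256x40x40
    out = ADown(256, 256)(torch.randn(1, 256, 80, 80))
    print(out.shape)  # torch.Size([1, 256, 40, 40])
```

For this 256-to-256-channel configuration, the two half-width branches use just over a quarter of the parameters of a single full 3x3 stride-2 convolution (2.5·c_in·c_out versus 9·c_in·c_out weights), which is the kind of saving the abstract attributes to replacing the backbone's conventional down-sampling convolutions.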

       
