Abstract:
Tea-picking robots are often required to identify tea buds in complex environments. However, existing tea-picking robots have been limited by low detection accuracy and a heavy computational burden. In this study, a tea bud detection algorithm was proposed using an improved YOLOv11n. The backbone network, detection head, down-sampling module, and loss function of YOLOv11 were systematically optimized, enhancing the accuracy of tea bud recognition in complex environments while reducing the complexity of the network model. Specifically, the algorithm was improved in four respects. First, depthwise convolution and global average pooling were combined to design a C3K2-PFCGLU (C3K2 with PoolingFormer and Convolutional Gated Linear Unit) structure, in order to increase the detection speed of the model. Second, a Detail-Reinforced and Lightweight Shared convolution Detection Head (DRLSDH) was designed to effectively compress the number of model parameters while improving detection accuracy; its group convolution and stride settings were adjusted to further reduce the computation and size of the head. Third, a lightweight down-sampling module (ADown) was used to replace the traditional convolutional down-sampling layers in the backbone network. Finally, the DIoU loss function was adopted to address the slow convergence of the CIoU loss function in the detection task, improving the accuracy of the model, the quality of the bounding box regression, and its generalization. Tests were then conducted on a self-built tea bud dataset to verify the effectiveness of the improved algorithm. The results show that the improved algorithm greatly reduced the network complexity and parameter count while retaining high detection accuracy. The mean average precision (mAP@0.5) reached 92.92%, and the precision increased to 95.43%, which were 0.14 and 0.93 percentage points higher than the baseline model (YOLOv11n), respectively. Although the recall decreased slightly to 87.37%, the lightweight metrics of the model were significantly improved: the number of parameters was reduced to 1.39 M, the computation to 4.2 GFLOPs, and the model weight file to only 3.4 MB, reductions of 45.74%, 33.33%, and 35.85%, respectively. On an embedded device, the detection frame rate was 23 frames per second. The high recognition accuracy and strong robustness fully meet the deployment requirements of tea-picking robots. Nevertheless, the improved model still has some limitations: false detections were observed in images with overlapping and highly similar buds in complex dark environments. Image enhancement improved image quality and thereby reduced false detections. In addition, more images, especially those captured in complex dark environments, can be collected to enlarge the dataset for model training. These findings can provide technical support for the accurate and rapid detection of tea buds, and more efficient deployment with further gains in robustness and lightweight level can be expected after model compression.
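The abstract does not give the internal structure of C3K2-PFCGLU. As a rough illustration only, the following PyTorch sketch assumes a PoolingFormer-style average-pooling token mixer paired with a convolutional gated linear unit whose gate passes through a depthwise convolution; all class names (PoolingTokenMixer, ConvGLU, PFCGLUBlock) and hyperparameters are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PoolingTokenMixer(nn.Module):
    """PoolFormer-style token mixer: local average pooling minus identity."""
    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x):
        return self.pool(x) - x  # subtract input, as in PoolFormer

class ConvGLU(nn.Module):
    """Convolutional gated linear unit: the gate branch passes through
    a 3x3 depthwise convolution before activating."""
    def __init__(self, c: int, expand: int = 2):
        super().__init__()
        h = c * expand
        self.fc1 = nn.Conv2d(c, 2 * h, 1)                    # value + gate
        self.dw = nn.Conv2d(h, h, 3, padding=1, groups=h)    # depthwise conv
        self.act = nn.SiLU()
        self.fc2 = nn.Conv2d(h, c, 1)

    def forward(self, x):
        v, g = self.fc1(x).chunk(2, dim=1)
        return self.fc2(v * self.act(self.dw(g)))

class PFCGLUBlock(nn.Module):
    """Hypothetical PFCGLU block: pooling mixer + ConvGLU with residuals."""
    def __init__(self, c: int):
        super().__init__()
        self.n1, self.n2 = nn.BatchNorm2d(c), nn.BatchNorm2d(c)
        self.mixer, self.glu = PoolingTokenMixer(), ConvGLU(c)

    def forward(self, x):
        x = x + self.mixer(self.n1(x))
        return x + self.glu(self.n2(x))
```

Pooling and depthwise convolution are both far cheaper than self-attention or dense convolution, which is consistent with the speed motivation stated above.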
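The DRLSDH design is likewise only summarized above. The sketch below illustrates the general idea of a lightweight shared convolution detection head: one depthwise-separable convolution stack is reused across all feature pyramid levels, with a learnable per-level scale compensating for the sharing. The detail-reinforcement branch is omitted, and all names and parameters here are assumptions.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Learnable per-level scalar so shared box convs can adapt per scale."""
    def __init__(self, init: float = 1.0):
        super().__init__()
        self.s = nn.Parameter(torch.tensor(init))

    def forward(self, x):
        return x * self.s

class SharedConvHead(nn.Module):
    """Hypothetical shared detection head: one conv stack over all levels."""
    def __init__(self, c: int, num_classes: int, num_levels: int = 3,
                 reg_max: int = 16):
        super().__init__()
        # Depthwise + pointwise stack, shared by every pyramid level.
        self.shared = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, groups=c),  # depthwise 3x3
            nn.Conv2d(c, c, 1),                       # pointwise 1x1
            nn.SiLU(),
        )
        self.cls = nn.Conv2d(c, num_classes, 1)   # shared class branch
        self.reg = nn.Conv2d(c, 4 * reg_max, 1)   # shared DFL-style box branch
        self.scales = nn.ModuleList(Scale() for _ in range(num_levels))

    def forward(self, feats):  # feats: list of (B, c, Hi, Wi) tensors
        outs = []
        for f, scale in zip(feats, self.scales):
            f = self.shared(f)
            outs.append((self.cls(f), scale(self.reg(f))))
        return outs
```

Because the conv weights are stored once rather than once per level, the parameter count drops roughly in proportion to the number of pyramid levels, matching the compression goal stated in the abstract.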
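The ADown module referenced above follows the down-sampling design published with YOLOv9: the input is lightly smoothed, split in half along channels, and down-sampled by a stride-2 convolution on one half and a max-pool plus 1x1 convolution on the other. A minimal PyTorch reconstruction of that design (not the authors' exact code) is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ADown(nn.Module):
    """ADown down-sampling after YOLOv9: two cheap half-channel paths
    replace a single full stride-2 convolution."""
    def __init__(self, c1: int, c2: int):
        super().__init__()
        c = c2 // 2
        self.cv1 = nn.Sequential(  # stride-2 conv path
            nn.Conv2d(c1 // 2, c, 3, 2, 1, bias=False),
            nn.BatchNorm2d(c), nn.SiLU())
        self.cv2 = nn.Sequential(  # 1x1 conv applied after max-pooling
            nn.Conv2d(c1 // 2, c, 1, 1, 0, bias=False),
            nn.BatchNorm2d(c), nn.SiLU())

    def forward(self, x):
        # Light smoothing before the split (kernel 2, stride 1).
        x = F.avg_pool2d(x, 2, 1, 0, False, True)
        x1, x2 = x.chunk(2, dim=1)
        x1 = self.cv1(x1)                   # conv down-sampling
        x2 = F.max_pool2d(x2, 3, 2, 1)      # pooled down-sampling
        return torch.cat((x1, self.cv2(x2)), 1)
```

Each path convolves only half the channels, so the FLOPs and weights are well below those of a conventional full-width stride-2 convolution.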
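The DIoU loss that replaces CIoU is a standard formulation: it subtracts from the IoU the squared distance between box centers, normalized by the squared diagonal of the smallest enclosing box, which pulls non-overlapping boxes together and converges faster than plain IoU losses. A minimal sketch for corner-format boxes:

```python
import torch

def diou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """DIoU loss for boxes in (x1, y1, x2, y2) format.
    DIoU = IoU - rho^2 / c^2; the loss is 1 - DIoU."""
    # Intersection and IoU.
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared distance between box centers (rho^2).
    cxp = (pred[..., 0] + pred[..., 2]) / 2
    cyp = (pred[..., 1] + pred[..., 3]) / 2
    cxt = (target[..., 0] + target[..., 2]) / 2
    cyt = (target[..., 1] + target[..., 3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
    # Squared diagonal of the smallest enclosing box (c^2).
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    return 1.0 - iou + rho2 / c2
```

Unlike CIoU, this drops the aspect-ratio consistency term, trading a slightly simpler penalty for faster, more stable convergence on small targets such as tea buds.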