Ripeness detection of lotus seedpod in natural environment based on improved YOLOv10n

    • Abstract: To address the lack of automatic ripeness detection technology for intelligent lotus seedpod harvesting in natural environments with varying illumination and target occlusion, this study proposes LotusM-YOLO, a ripeness detection method for lotus seedpods based on an improved YOLOv10n. First, a DynamicConv (dynamic convolution) module is introduced into the YOLOv10n model; by dynamically combining multiple convolution kernels, it effectively strengthens feature extraction under different lighting conditions, improving the stability and accuracy of seedpod detection in complex environments such as strong and weak light. Then, two attention mechanisms, MultiSEAM (multi-scale efficient attention module) and CBAM (convolutional block attention module), are further introduced to enhance the model's ability to detect occluded seedpods and to attend to small-target features, raising the accuracy of ripeness detection. The results show that LotusM-YOLO achieves a mean average precision of 86.7%, 3.9 percentage points higher than the YOLOv10n baseline, and 5.2, 4.3, 4.0, and 4.4 percentage points higher than the mainstream Faster R-CNN, YOLOv5n, YOLOv8n, and YOLOv9 detection models, respectively. The proposed LotusM-YOLO model enables relatively efficient and accurate ripeness detection of lotus seedpods in complex natural environments, and can provide technical support for monitoring seedpod growth status and for developing intelligent harvesting equipment.

       

      Abstract: The lotus seedpod is one of the most important components of the lotus plant, and accurate, efficient detection of seedpod maturity in natural environments is essential for intelligent harvesting and precision agriculture. However, variable lighting, frequent occlusion by stems and leaves, and the small size of seedpods against complex backgrounds limit the accuracy of conventional object detection models, especially under strong light, weak light, or severe occlusion. In this study, the LotusM-YOLO model was proposed, extending the YOLOv10n architecture with three targeted improvements. First, a dynamic convolution (DynamicConv) module was integrated into the backbone of YOLOv10n to improve its adaptability to varying illumination. By dynamically combining multiple convolution kernels, the module extracts robust features under strong or low light while suppressing irrelevant background noise, preserving the essential features of the seedpods and markedly improving detection accuracy and stability in paddy-field scenes with complex illumination. Second, the multi-scale efficient attention module (MultiSEAM) was introduced to detect small and partially occluded seedpods by capturing contextual information across multiple feature scales, while further suppressing interference from cluttered backgrounds. Finally, the convolutional block attention module (CBAM) applied channel and spatial attention sequentially to refine the feature representation, improving detection precision for lotus seedpods.
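As a rough illustration only (not the paper's implementation, whose details are not given here), dynamic convolution aggregates a bank of kernels with input-dependent softmax weights before convolving. The sketch below uses 1x1 kernels and a hypothetical attention projection `w_attn` derived from global average pooling; all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_conv1x1(x, kernels, w_attn):
    """Aggregate K 1x1 kernels with input-dependent weights, then convolve.

    x:       (C_in, H, W) feature map
    kernels: (K, C_out, C_in) bank of 1x1 convolution kernels
    w_attn:  (K, C_in) hypothetical attention projection (assumption)
    """
    # Squeeze step: global average pool over spatial dimensions.
    pooled = x.mean(axis=(1, 2))                   # (C_in,)
    # Input-dependent mixing weights over the K kernels; they sum to 1.
    alpha = softmax(w_attn @ pooled)               # (K,)
    # Aggregate the kernel bank into one kernel, then apply it.
    kernel = np.tensordot(alpha, kernels, axes=1)  # (C_out, C_in)
    # A 1x1 convolution is a per-pixel linear map over channels.
    out = np.einsum('oc,chw->ohw', kernel, x)      # (C_out, H, W)
    return out, alpha

x = rng.normal(size=(4, 8, 8))
kernels = rng.normal(size=(3, 6, 4))
w_attn = rng.normal(size=(3, 4))
out, alpha = dynamic_conv1x1(x, kernels, w_attn)
```

Because the kernels are mixed before the convolution, the per-input cost stays close to a single convolution while the effective kernel adapts to each image.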
The attention mechanisms also significantly reduced the rate of missed detections; together, these modules strengthen the model's sensitivity to occluded and small targets in natural environments, maintaining high detection accuracy. The performance of LotusM-YOLO was evaluated on a high-quality dataset of 2 411 manually annotated images of lotus seedpods captured under natural conditions, randomly divided into training, validation, and test sets at a 7:2:1 ratio. The experimental results show that LotusM-YOLO achieved a precision of 84.3%, a recall of 81.7%, and a mean average precision at IoU 0.5 (mAP0.5) of 86.7%, improvements of 2.7, 2.5, and 3.9 percentage points, respectively, over the YOLOv10n baseline. Comparative experiments with Faster R-CNN, YOLOv5n, YOLOv8n, YOLOv9, and YOLOv10n demonstrated that LotusM-YOLO achieved higher precision and recall under strong light, low light, and partial occlusion, significantly reducing missed detections and exhibiting stronger robustness in seedpod detection under natural environmental conditions. In addition, heatmaps generated with Gradient-weighted Class Activation Mapping (Grad-CAM) showed that, compared with YOLOv10n, the improved model focused more on the actual target areas and attended less to background clutter. Beyond its technical performance, LotusM-YOLO offers strong potential for real-world application: its detections can be fused with depth data from RGB-D or LiDAR sensors, and the resulting 3D localization of seedpods can guide robotic arms in picking tasks.
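The mAP0.5 metric reported above counts a prediction as a true positive when its intersection-over-union (IoU) with a ground-truth box reaches 0.5. A minimal sketch of that matching criterion, with boxes as (x1, y1, x2, y2) tuples (function names are illustrative, not from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, gt, thresh=0.5):
    """mAP0.5 matching rule: a detection counts if IoU >= 0.5."""
    return iou(pred, gt) >= thresh
```

For example, a box covering the top half of its ground truth has IoU exactly 0.5 and still counts, while a box shifted by half its width (IoU 1/3) is a missed match.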
Consequently, the LotusM-YOLO model can provide a theoretical basis for monitoring the growth status of lotus seedpods and for intelligent harvesting equipment operating in natural environments.
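The 7:2:1 train/validation/test split described above can be sketched as a seeded random partition; the exact shuffling procedure and seed used by the authors are not stated, so this is only an assumed reproduction of the ratio.

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Randomly split items into train/val/test sets at the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test

# With the 2 411 images reported in the abstract:
train, val, test = split_dataset(range(2411))
```

Assigning the remainder to the test set guarantees every image lands in exactly one split even when the ratios do not divide the dataset size evenly.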

       
