Lightweight multi-scale Chinese herbal medicine detection method based on GCMD-YOLO
-
Abstract
Existing intelligent detection algorithms for traditional Chinese medicinal materials (TCMMs) struggle to balance high detection accuracy with real-time performance when deployed on edge devices, and they suffer from missed detections of small targets and category confusion caused by the large morphological differences among medicinal materials and the high similarity of their cross-sectional textures. To address these problems, this paper proposes GCMD-YOLO, a lightweight multi-scale detection method. Taking YOLOv11s as the baseline architecture, the method improves both accuracy and efficiency through newly designed feature extraction and attention mechanisms. First, a high-quality dataset covering 100 common TCMM varieties with a total of 30,480 images is constructed through multi-source data fusion and data augmentation, addressing the unbalanced class distribution and homogeneous backgrounds of traditional datasets. Second, for model lightweighting, the GhostConv module is used to rebuild the backbone and neck networks, and the DySample dynamic upsampling module is introduced; this sharply reduces parameter redundancy while improving the efficiency of feature reconstruction. Because the standard SPPF (Spatial Pyramid Pooling Fast) module tends to lose shallow texture information, a Conv-SPPF module is designed that replaces pooling operations with convolutions, preserving the key edge and texture details of TCMM images. Furthermore, to improve fine-grained feature extraction, the C3K2-GL module is proposed, which incorporates a global-local dual-path dynamic convolution mechanism.
By adaptively adjusting convolution-kernel weights, this module significantly strengthens the capture of fine textures in rhizome-type materials and of small targets in seed-type materials. Meanwhile, the MG-CA multi-scale global attention mechanism is designed and integrated into the feature fusion layer. Using a multi-granularity pooling strategy, it focuses on key discriminative regions and suppresses interference from complex backgrounds, thereby improving the separability of similar TCMM categories. Experimental results show that GCMD-YOLO performs strongly on the self-built dataset: precision, recall, and F1-score reach 93%, 91%, and 92%, respectively; mAP@0.5 (mean average precision at an intersection-over-union (IoU) threshold of 0.5) reaches 92.8%, and mAP@0.5:0.95 (averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05) reaches 78.6%. These metrics improve on the baseline model by 7.0, 8.0, 7.5, 7.3, and 7.4 percentage points, respectively. In terms of efficiency, the parameter count is reduced by 16% and inference speed is increased by 27%. Cross-dataset generalization experiments show that, without any fine-tuning, the model's mAP@0.5 on external datasets remains 7.6 percentage points above the baseline, demonstrating strong robustness. Confusion-matrix and Class Activation Mapping (CAM) heatmap analyses further confirm that the method resolves inter-category confusion and localizes targets accurately.
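As a quick sanity check, the reported F1-score of 92% follows directly from the stated precision and recall as their harmonic mean:

```python
# Reported precision and recall on the self-built dataset.
precision, recall = 0.93, 0.91

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # rounds to 0.92, matching the reported 92%
```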
Finally, deployment tests on a Jetson Nano edge device show that the model reaches an inference speed of 39 frames per second (FPS), verifying its feasibility and engineering value for high-precision, real-time detection in resource-constrained scenarios.
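The parameter savings attributed to GhostConv above can be illustrated with a back-of-the-envelope weight count. The sketch below assumes the common GhostConv formulation (a primary convolution producing half the output channels, followed by a cheap 5x5 depthwise convolution that generates the remaining "ghost" feature maps, as in the Ultralytics implementation); the exact module configuration in GCMD-YOLO may differ.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias/BN ignored)."""
    return c_in * c_out * k * k

def ghost_conv_params(c_in, c_out, k, cheap_k=5):
    """GhostConv sketch: a primary conv producing c_out // 2 channels,
    plus a cheap depthwise conv generating the remaining 'ghost' maps."""
    c_half = c_out // 2
    primary = c_in * c_half * k * k     # standard conv, half the output channels
    cheap = c_half * cheap_k * cheap_k  # depthwise: one cheap_k x cheap_k filter per channel
    return primary + cheap

# Example layer: 256 -> 256 channels with a 3x3 kernel.
std = conv_params(256, 256, 3)
ghost = ghost_conv_params(256, 256, 3)
print(f"standard: {std}, ghost: {ghost}, saving: {1 - ghost / std:.1%}")
```

For this layer the ghost variant needs roughly half the weights, which is consistent with the abstract's claim that rebuilding the backbone and neck with GhostConv substantially cuts parameter redundancy.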