Fine-grained recognition model for bark beetles with multimodal detection heads

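    • Summary: Bark beetles (Dendroctonus spp.) are difficult to identify to species level because species diversity is high and closely related species are morphologically similar and often sympatric. To address this, the study proposes FGRS-Net (fine-grained recognition for Scolytidae network), a model for fine-grained recognition of bark beetle species. First, to mitigate the recognition bias caused by insufficient samples, a detection head module based on multimodal embedding was designed. Second, to extract cross-scale discriminative features, the attention-convolution mixing module ACmix (attention convolution mixer) was used to capture fused features. To extract further features while reducing the parameter count, the omni-dimensional dynamic convolution module ODConv (omni-dimensional dynamic convolution) was introduced to focus on fine-grained insect features. The model was then made lightweight through pruning and knowledge distillation. To assess reliability in practical use, systematic robustness tests were run under interference conditions including low illumination, blur, and complex background occlusion, and deployment was verified on edge devices with different computing architectures. The experiments show that FGRS-Net reached a mean average precision of 89.3% and a recall of 98%, with floating-point operations reduced by 23%; the frame rate was 289 frames/s on an NVIDIA RTX 5090 GPU, and 11 frames/s and 27 frames/s on the two development boards. FGRS-Net thus combines high accuracy with a lightweight design and is competitive with existing mainstream models; the results can serve as a reference for subsequent fine-grained bark beetle recognition.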

       

      Abstract: Bark beetles (Dendroctonus spp.) are typical wood-boring pests. Their small size, cryptic behavior, and long damage cycles make them a serious threat to forest resources. According to monitoring data from the National Forestry and Grassland Administration, infestations occurred in 2024 in southwestern, northern, and northwestern China, affecting 184 000 hectares, with moderate to severe damage accounting for 15.08% of the total. Bark beetle species are often sympatric, yet their prevention and control strategies differ substantially. For instance, controlling Dendroctonus micans requires removal of the infested trees together with γ-hexachlorocyclohexane (Lindane) treatment; controlling Dendroctonus valens often combines adult eradication with aluminum phosphide; whereas controlling Heterobostrychus hamatipennis relies on methyl bromide or aluminum phosphide fumigation. Moreover, because the species resemble one another in macroscopic features (such as body length and coloration), identification depends heavily on local microscopic features, including the shape of the disc and the elytral punctures. This is a typical fine-grained recognition task: species within a highly similar base category (Scolytidae) must be distinguished by subtle discriminative features. Furthermore, manual morphological identification is highly subjective. This study therefore developed a rapid and accurate fine-grained recognition method to support the prevention and control of bark beetles. The FGRS-Net (Fine-Grained Recognition for Scolytidae Network) architecture was constructed to identify bark beetles, and techniques at multiple levels were proposed to address the key issues in bark beetle recognition, including scale variation, feature confusion, and computational efficiency.
First, a detection head module with multimodal embedding was proposed to mitigate the inter-class recognition bias caused by insufficient training samples. Morphological feature vectors, local texture descriptors, and spatial contextual information were integrated to reduce the false detections induced by uneven sample distribution, and a joint embedding space was constructed to enhance discrimination between morphologically similar species. Second, an Attention Convolution Mixer (ACmix) module was introduced to handle the large size range and variable postures of bark beetles in their habitat. Parallel convolutional paths and self-attention adaptively adjusted the multi-scale receptive fields, so the module captured the local details of millimeter-scale pests (such as elytral punctures and antenna morphology) while still identifying the overall distribution patterns of aggregated populations, which improved the robustness of feature discrimination against complex backgrounds. Third, an Omni-Dimensional Dynamic Convolution (ODConv) module was integrated to further improve the efficiency of feature representation. A four-dimensional attention mechanism was constructed across the spatial, channel, kernel, and network depth dimensions; the dynamic generation and adaptive calibration of the convolutional parameters reduced the parameter count while enhancing key discriminative features such as wing venation structure and body segment proportions. Finally, for model lightweighting, structured pruning was combined with knowledge distillation: channel importance was constrained via L1 regularization to prune redundant feature connections, and a multi-teacher distillation framework transferred hierarchical feature representations from large networks to a lightweight student model.
As a result, the model size was compressed by 40.7% and the inference latency was reduced by 35% while high accuracy was retained. A multi-interference testing system was constructed to validate applicability in practical scenarios, simulating complex field conditions including lens fog, low illumination, blur, and foliage occlusion, and deployment was verified on edge devices with different computational architectures. On the self-built fine-grained bark beetle dataset, FGRS-Net achieved a mean Average Precision (mAP) of 89.3% and a recall of 98%, with a 23% reduction in floating-point operations (FLOPs) and a detection speed of 289 FPS on an NVIDIA RTX 5090 GPU. On edge devices, the Raspberry Pi platform achieved real-time inference at 11 FPS, while the RK3576 platform reached 27 FPS. The approach can provide reliable technical support for accurate monitoring of bark beetles in field environments, and the findings offer a reference for pest recognition models in smart forestry.
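
The abstract does not give the pruning criterion or the distillation loss in detail, so the following is only a minimal sketch of the two lightweighting ideas it names, not the FGRS-Net implementation. It assumes that (a) channel importance can be approximated by the L1 norm of each channel's weights, a common magnitude-based stand-in for the L1-regularized criterion mentioned above, and (b) the multi-teacher loss is a plain average of temperature-scaled KL divergences between each teacher and the student. All function names, the `keep_ratio` parameter, and the temperature value are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    """Average KL(teacher || student) over all teachers, scaled by T^2.

    Averaging the teachers equally is an assumption of this sketch.
    """
    p_student = softmax(student_logits, temperature)
    total_kl = 0.0
    for teacher_logits in teacher_logits_list:
        p_teacher = softmax(teacher_logits, temperature)
        total_kl += sum(
            pt * math.log(pt / ps)
            for pt, ps in zip(p_teacher, p_student)
            if pt > 0.0
        )
    return (temperature ** 2) * total_kl / len(teacher_logits_list)


def l1_channel_scores(channel_weights):
    """L1 norm of each channel's weights, used here as an importance proxy."""
    return [sum(abs(w) for w in channel) for channel in channel_weights]


def select_channels(channel_weights, keep_ratio=0.6):
    """Return the indices of the channels kept after magnitude-based pruning."""
    scores = l1_channel_scores(channel_weights)
    n_keep = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy usage: five channels with two weights each, keep the strongest 60%.
toy_weights = [[0.1, -0.1], [2.0, 1.0], [0.0, 0.0], [-1.5, 0.5], [0.3, 0.3]]
kept = select_channels(toy_weights, keep_ratio=0.6)  # -> [1, 3, 4]

# Toy usage: one student distilled from two hypothetical teachers.
loss = distillation_loss([1.0, 2.0, 3.0], [[1.0, 2.5, 3.5], [0.5, 2.0, 4.0]])
```

In a real detector, the scores would be computed per convolutional layer and the distillation term would be added to the detection loss; how FGRS-Net weights these terms is not stated in the abstract.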

       
