Fine-grained recognition model for bark beetles with multimodal detection heads
Abstract
Bark beetles (Dendroctonus spp.) are typical wood-boring pests. They pose a serious threat to forest resources because of their small size, cryptic behavior, and long damage cycles. According to monitoring data from the National Forestry and Grassland Administration, infestations occurred in 2024 in southwestern, northern, and northwestern China, covering an infested area of 184 000 hectares, with moderate to severe damage accounting for 15.08% of the total. Yet control and prevention strategies differ substantially among the sympatrically distributed bark beetle species. For instance, control of Dendroctonus micans requires removal of infested trees combined with γ-hexachlorocyclohexane (lindane) treatment; control of Dendroctonus valens often combines adult eradication with aluminum phosphide; whereas control of Heterobostrychus hamatipennis relies on methyl bromide or aluminum phosphide fumigation. Moreover, because the species are highly similar in macroscopic features (such as body length and coloration), identification depends heavily on local microscopic features, including the shape of the disc and the elytral punctures. Bark beetle identification is therefore a typical fine-grained recognition task, in which different species within a highly similar base category (Scolytidae) must be distinguished by subtle discriminative features. Furthermore, manual morphological identification is highly subjective. In this study, a rapid and accurate fine-grained recognition method was developed to support the prevention and control of bark beetles. An FGRS-Net (Fine-Grained Recognition for Scolytidae Network) architecture was constructed to identify bark beetles. Techniques at several levels were proposed to systematically address the key issues in bark beetle recognition, including scale variation, feature confusion, and computational efficiency.
Firstly, a detection head module with multi-modal embedding was proposed to mitigate the inter-class recognition bias caused by insufficient training samples. Morphological feature vectors, local texture descriptors, and spatial contextual information were integrated to significantly reduce the false detection rates induced by uneven sample distribution. A joint embedding space was then constructed to enhance discrimination between morphologically similar species. Secondly, an Attention Convolution Mixer (ACmix) module was introduced to handle the wide size range and variable postures of bark beetles in their habitat. The multi-scale receptive fields were adaptively adjusted using parallel convolutional paths and self-attention mechanisms. This module captured the local details of millimeter-scale pests (such as elytral punctures and antenna morphology) while also identifying the overall distribution patterns of aggregated populations, thereby improving the robustness of feature discrimination in complex backgrounds. Thirdly, an Omni-Dimensional Dynamic Convolution (ODConv) module was integrated to further improve the efficiency of feature representation. A four-dimensional attention mechanism was constructed across the spatial, channel, kernel, and network depth dimensions. The dynamic generation and adaptive calibration of the convolutional parameters significantly reduced the number of parameters while enhancing key discriminative features, such as the wing venation structure and body segment proportions. Finally, for model lightweighting, structured pruning and knowledge distillation were jointly optimized. Channel importance was constrained via L1 regularization to prune redundant feature connections, while a multi-teacher distillation framework was designed to transfer hierarchical feature representations from large networks to a lightweight student model.
As a result, the model size was compressed by 40.7% and the inference latency was reduced by 35%, with high accuracy retained. A multi-interference testing system was constructed to validate the applicability of the model in practical scenarios. Complex field conditions were simulated, including lens fog, low illumination, blur, and foliage occlusion. Deployment was verified on edge devices with different computational architectures. Experimental results show that FGRS-Net achieved a mean Average Precision (mAP) of 89.3% and a recall of 98% on the self-built fine-grained bark beetle dataset, with a 23% reduction in floating point operations (FLOPs) and a detection speed of 289 FPS. In edge deployment, the Raspberry Pi platform achieved real-time inference at 11 FPS, while the RK3576 platform reached 27 FPS. This solution can provide reliable technical support for the accurate monitoring of bark beetles in field environments. The findings can also serve as an important reference for pest recognition models in smart forestry.
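The L1-constrained channel pruning mentioned in the lightweighting step can be illustrated with a minimal sketch. It assumes that channel importance is read from the absolute value of a per-channel scaling factor (for example, a batch-normalization scale), which the abstract does not specify. The names `l1_penalty`, `select_channels`, `prune_rows`, `strength`, and `prune_ratio` are hypothetical and do not come from FGRS-Net.

```python
from typing import List, Sequence


def l1_penalty(scales: Sequence[float], strength: float) -> float:
    """L1 regularization term added to the training loss.

    Assumption: `scales` holds one learnable scaling factor per channel.
    The penalty pushes unimportant factors toward zero during training.
    """
    return strength * sum(abs(s) for s in scales)


def select_channels(scales: Sequence[float], prune_ratio: float) -> List[int]:
    """Return the indices of channels to keep, in their original order.

    Channels are ranked by |scale|; the lowest `prune_ratio` fraction
    is dropped, and at least one channel is always kept when any exist.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n_total = len(scales)
    n_keep = max(1, n_total - int(n_total * prune_ratio))
    ranked = sorted(range(n_total), key=lambda i: abs(scales[i]), reverse=True)
    return sorted(ranked[:n_keep])


def prune_rows(weights: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Structured pruning: keep only the weight rows (output channels) in `keep`."""
    return [list(weights[i]) for i in keep]


# Illustrative values only; these are not measurements from the paper.
scales = [0.90, 0.02, -0.75, 0.001, 0.40, -0.03]
weights = [[float(i)] * 3 for i in range(len(scales))]

keep = select_channels(scales, prune_ratio=0.5)
pruned = prune_rows(weights, keep)
print(keep)         # [0, 2, 4]
print(len(pruned))  # 3
```

In this sketch, removing whole rows of the weight matrix is what makes the pruning structured: the layer becomes physically smaller, instead of merely sparse, which is the kind of change that can reduce model size and latency on edge devices.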