Abstract
Accurate identification of rice pests is crucial for food security. However, building a robust pest recognition model for complex field scenarios faces two core challenges: extreme imbalance in the sample distribution, and the significant limitations of existing models in cross-channel feature extraction and in adapting to imbalanced data. To address these challenges, this study proposes ResNet-EAF, a dual-feature dynamic optimization model for imbalanced rice pest recognition. The framework integrates two feature calibration mechanisms, Efficient Channel Attention (ECA) and Channel-wise Affine Adaptation (CAA), to optimize the feature representation synergistically and precisely.

To realize accurate feature representation, the model places a dual-feature optimization mechanism, ECA feature screening followed by CAA feature calibration, after the Global Average Pooling (GAP) layer of the ResNet50 network. The ECA module establishes inter-channel correlations within local windows via adaptively sized 1D convolution kernels, capturing cross-channel interaction information without dimensionality reduction. It adaptively amplifies the weights of key pest feature channels while suppressing interference from redundant channels, thereby enhancing the saliency of core features. Following the ECA module, the CAA module introduces two sets of learnable parameters, a per-channel scale and a per-channel shift, so that each feature channel learns its own configuration and its contribution weight can be regulated finely. During optimization, the model prioritizes channels that are decisive for classification while weakening noise channels. The core advantage of CAA is that it decouples the coupled weight and bias of a traditional affine transformation into independent channel-wise scaling and translation operations.
This design improves feature distinguishability between pest categories and adapts to differences in feature distribution across datasets, thereby alleviating domain shift and enhancing generalization to unseen field data. Notably, the CAA module introduces very few learnable parameters and negligible computational overhead, preserving recognition efficiency.

To tackle extreme sample imbalance, this study designs a dynamic balanced loss strategy that integrates Focal Loss (FL), inverse frequency weighting, and modulation coefficients, and that acts synergistically with the dual-feature module. Specifically, inverse frequency weighting assigns weights dynamically according to each class's share of the samples, providing an initial rebalancing of the category distribution; FL uses a modulation factor to down-weight easily classified majority-class samples, focusing learning on hard-to-classify minority-class samples; and additional modulation coefficients fine-tune the loss gradient to mitigate the training bias caused by extreme imbalance. This strategy is highly compatible with the stable deep feature extraction of ResNet50: it keeps the identification of dominant pest categories accurate while mining the subtle core features of minority-class samples, thus significantly improving recognition coverage across all categories.

Comparative experiments on the self-built pest image dataset show that ResNet-EAF achieves an accuracy of 98.06%, a macro-average recall of 93.93%, and an F1-score of 93.80%, which are 3.63, 6.96, and 5.67 percentage points higher than the baseline model, respectively, ranking first among 11 competing models. For the minority-class pests Pyralidae and Arctiidae, the recall rates increase by 13.16% and 4.55%, respectively.

To validate the model's anti-interference and generalization ability in real field environments, this study conducts a generalization evaluation on the public JUTE PEST dataset.
The results show that ResNet-EAF achieves an accuracy of 98.35% (second only to DINOv2's 98.42%), while its macro-average recall and its recall on the minority-class Beet Armyworm both rank first among 11 competing models.

Ablation experiments verify the effectiveness of the dual-feature module. Introducing the CAA module alone yields limited improvement: without a channel-importance pre-screening mechanism to locate key pest features, it performs only generalized feature calibration and cannot directly resolve the difficulty of minority-class recognition. In contrast, the synergy of ECA and CAA forms a complete "feature screening and calibration" closed loop: ECA first screens the key channels to provide a high-quality "effective feature base" for CAA, and CAA then performs channel-wise adaptive calibration on these features to enhance their distinguishability. In addition, comparative experiments with 4 mainstream attention mechanisms and 5 common loss functions confirm that the proposed combination of ECA and dynamic balanced loss achieves the highest accuracy, macro-average recall, and F1-score, supporting the rationality of the technical design.

In summary, ResNet-EAF provides an efficient technical solution for agricultural pest monitoring in imbalanced data scenarios through ECA-CAA dual-feature dynamic optimization and a synergistic dynamic loss strategy. Extensive experiments verify the practicality and robustness of the model in complex field environments, highlight the significant performance gains brought by the synergy of FL, ECA, and CAA, and offer an extensible solution for reliable field pest recognition in precision agriculture.
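
As a reading aid, the ECA screening, CAA calibration, and weighted focal term described in this abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the kernel-size rule follows the original ECA-Net formulation with gamma = 2 and b = 1, the inverse-frequency normalization N / (K * n_c) is one common choice, all function names are hypothetical, and the additional modulation coefficients are omitted because the abstract does not give their form.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Odd kernel size from the ECA-Net rule k = |(log2(C) + b) / gamma|_odd.

    gamma and b are the ECA-Net defaults, assumed here; the abstract does
    not state which values ResNet-EAF uses.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(features, kernel):
    """ECA on a pooled feature vector: 1D convolution across neighbouring
    channels (zero padded), sigmoid gate, then channel-wise rescaling."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(features) + [0.0] * pad
    gated = []
    for c, x in enumerate(features):
        s = sum(w * padded[c + j] for j, w in enumerate(kernel))
        gated.append(x / (1.0 + math.exp(-s)))  # x * sigmoid(s)
    return gated


def caa(features, scale, shift):
    """Channel-wise affine calibration: an independent learnable scale and
    shift per channel, with no mixing between channels."""
    return [g * x + b for x, g, b in zip(features, scale, shift)]


def inverse_frequency_weights(class_counts):
    """Class weights proportional to 1 / frequency (assumed normalization:
    total / (num_classes * count))."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * n) for n in class_counts]


def weighted_focal_loss(probs, target, class_weights, focal_gamma=2.0):
    """Focal Loss for one sample, scaled by the weight of its true class.

    probs is a softmax output; (1 - p_t) ** focal_gamma down-weights
    samples that are already classified confidently.
    """
    p_t = probs[target]
    return -class_weights[target] * (1.0 - p_t) ** focal_gamma * math.log(p_t)
```

For the 2048-channel vector produced by ResNet50's GAP layer, the assumed rule gives a kernel of size 7, so each channel gate depends only on its seven nearest channels. With two channels of state per feature dimension (scale and shift), the CAA step adds 2 x 2048 parameters in this sketch, which is consistent with the abstract's claim of negligible overhead.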