Abstract:
Rice is one of the most important staple crops worldwide, and its yield and quality directly influence global food security and the agricultural economy. Nevertheless, rice pests rank among the most common biological disasters threatening stable, high-yield rice production. The International Rice Research Institute has reported that rice pests and diseases cut yields by up to 37%, with losses ranging from 24% to 41%. Real-time monitoring and counting of pests are often required for prevention and control in green agriculture, which in turn demands accurate identification of pests at different scales and reliable tracking of their population dynamics. Yet the pest community in paddy fields is extraordinarily diverse: pest sizes range from millimeter-scale aphids and thrips to stem borers and leaf-folder larvae exceeding ten millimeters. These pests can occur concurrently in the same plot and simultaneously inhabit leaves, leaf sheaths, stems, and panicles. Owing to this complex spatial distribution, extreme morphological variation, and heavy background clutter, conventional manual scouting and simple image processing cannot fully meet the requirements of accurate detection and counting. In this study, YOLO-MSLP (multi-scale lightweight pest), an intelligent lightweight model, was proposed for rice-pest detection and counting to overcome these challenges. Built on the latest YOLOv11n backbone, YOLO-MSLP introduced three innovations tailored to the complex scenes of the paddy field. First, an adaptive-pooling bidirectional feature pyramid network (AP-BiFPN) was embedded in the neck. Adaptive pooling dynamically adjusted the receptive field, and bidirectional cross-scale fusion extracted and aggregated multiscale features in a stable manner, whether the targets were solitary pests or dense clusters.
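The abstract does not specify the internal weighting scheme of AP-BiFPN; as a point of reference, a minimal sketch of the fast normalized fusion used by standard BiFPN necks (learnable non-negative weights blending same-resolution feature maps) could look as follows. The function name and toy inputs are illustrative, not taken from the paper.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion (illustrative, not the paper's exact AP-BiFPN):
    out = sum(w_i * f_i) / (sum(w_i) + eps), with weights clipped to be non-negative."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights >= 0
    w = w / (w.sum() + eps)                                # normalize contributions
    return sum(wi * fi for wi, fi in zip(w, features))

# Two toy 4x4 feature maps, e.g. a top-down path output and a lateral input
f_top_down = np.ones((4, 4))
f_lateral = np.full((4, 4), 3.0)
fused = fast_normalized_fusion([f_top_down, f_lateral], [1.0, 1.0])
```

With equal weights the fused map is (approximately) the mean of the two inputs; during training the weights would be learned, letting the network favor the more informative scale at each fusion node.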
This markedly improved small-object detection while preserving the localization accuracy of large objects. Second, a multi-scale triplet attention module (MS-TAM) was inserted between the backbone and the detection heads. Operating in parallel along the channel, spatial, and scale dimensions, it adaptively highlighted discriminative pest features and suppressed redundant background information that closely resembles pests in shape, texture, and color. Experimental results showed that the module maintained high-confidence outputs even under backlighting, leaf occlusion, or overlapping rice plants. Finally, to lower deployment barriers, the backbone was reengineered with a reparametrized vision transformer (RepViT), and knowledge distillation was applied to compress the model by transferring the rich representations of a larger teacher network into the lightweight student. After pruning, quantization, and operator fusion, YOLO-MSLP achieved a mean average precision (mAP) of 94.5% and a recall of 91.7%; floating-point operations were reduced by 24.4%, model size shrank by 40.7%, and single-image inference latency on an edge GPU fell below 35 ms. Extensive testing confirmed that YOLO-MSLP can run in real time on embedded devices, providing a low-cost, highly reliable tool for early warning, precise spraying, and green control of rice pests. The model is expected to support large-scale smart-agriculture deployments and advance a sustainable rice industry. The findings can also provide a data reference for scientific interventions, thereby reducing pesticide use and residue risk.
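The abstract names knowledge distillation but not its exact loss; a minimal sketch of the standard soft-target formulation (temperature-softened teacher probabilities guiding the student via KL divergence, scaled by T squared) is shown below. Function names and logits are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T produces softer probability targets."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss (illustrative): KL(teacher_T || student_T),
    scaled by T**2 so gradient magnitudes stay comparable across temperatures."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    return (T ** 2) * kl

# Identical logits give zero loss; diverging logits give a positive loss
same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = distillation_loss([0.1, 2.0, -1.0], [2.0, 0.5, -1.0])
```

In a full pipeline this soft-target term would be combined with the student's ordinary detection loss, so the lightweight network learns both from ground-truth labels and from the teacher's richer output distribution.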