Abstract:
Deep learning has attracted much attention in recent years owing to its powerful feature extraction and excellent prediction performance, and it also holds promise for target detection in the animal husbandry industry. Existing research has focused mainly on cow behaviour in outdoor farm environments, whereas relatively little work has addressed complex indoor scenarios. Moreover, the indoor environment of farms presents challenging factors, such as high background similarity and severe occlusion, which place high demands on the robustness and generalisation of target detection models. In this study, a multi-target detection method based on an improved YOLO11n was proposed for the daily behaviour of dairy cows in complex environments. Firstly, images of cow behaviour were collected in real indoor farm environments, covering four basic behaviours: standing, walking, lying, and eating. A multi-target cow behaviour dataset was then constructed, with the images finely annotated using LabelImg software, covering various scales, viewing angles, and behavioural postures for model training. Furthermore, the model architecture was designed by optimising the bottleneck structure with wavelet convolution: the bottleneck of the C3k2 module was reconstructed to introduce the wavelet transform domain into feature extraction, which effectively expanded the receptive field, better represented the complex background, and significantly enhanced the contextual information. A cascaded group attention mechanism was integrated into the C2PSA module, employing a spatial-channel strategy to improve feature extraction in occluded areas. In the feature fusion stage, the Efficient RepGFPN was utilised as the neck network to effectively capture features from the cow behaviour images at different scales.
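The wavelet-domain convolution described above can be illustrated with a minimal sketch (an assumption-laden illustration, not the authors' implementation): a single-level Haar transform decomposes a feature map into four half-resolution subbands, a small kernel is applied to each subband in the wavelet domain, where a 3x3 kernel covers roughly twice the spatial extent it would in the original map, thereby expanding the receptive field, and the inverse transform restores the original resolution. The single-channel setting and the per-subband kernels are hypothetical simplifications.

```python
import numpy as np

def haar_dwt2(x):
    # Single-level orthonormal Haar transform of a (H, W) map with even H, W.
    # Returns four half-resolution subbands: LL (approximation), LH, HL, HH (details).
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # Inverse single-level Haar transform; exact reconstruction of the input map.
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def conv2d_same(x, k):
    # Zero-padded cross-correlation ("same" output size), as in CNN conv layers.
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def wavelet_conv(x, kernels):
    # Filter each subband in the wavelet domain, then transform back.
    # A k x k kernel here spans ~2k x 2k pixels of the original map.
    subbands = haar_dwt2(x)
    filtered = [conv2d_same(s, k) for s, k in zip(subbands, kernels)]
    return haar_idwt2(*filtered)
```

With identity (delta) kernels the round trip reproduces the input exactly, which is a convenient sanity check that the transform pair is lossless before learned kernels are substituted.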
Comparative experiments were conducted to verify the performance of the improved model in complex environments. The results show that the mean average precision (mAP@0.5) of the WCG-YOLO11n reached 95.3% for daily behaviour detection of dairy cows, an improvement of 2.2 percentage points over the baseline model. Compared with models with high floating-point computation and parameter counts, namely Faster R-CNN, DETR, YOLOv5s, YOLOv7, YOLOv8n, and YOLOv9, the mAP@0.5 increased by 2.7, 2.1, 1.3, 2.9, 2.4, and 0.6 percentage points, respectively. The floating-point operations, parameter count, and model size of the WCG-YOLO11n were 9.5 G, 3.9 M, and 8.3 MB, respectively, a slight increase of 3.2 G, 1.3 M, and 2.8 MB over the baseline YOLO11n, yet still significantly smaller than those of high-precision models such as DETR, YOLOv5s, and YOLOv9. These results show that the model achieves high average accuracy and processing speed while consuming fewer computing resources, making it suitable for mobile devices. Furthermore, in multi-behaviour, multi-scale, and dense scenes, 23 cow behaviours were detected with only 4 missed detections and 0 false detections, demonstrating superior performance over YOLOv5s, YOLOv8n, and YOLO11n, and detection accuracy and stability on a par with the high-computation, high-parameter models Faster R-CNN, DETR, YOLOv7, and YOLOv9. The improved model can effectively handle varying degrees of occlusion interference and can be deployed on mobile devices to detect the daily behaviours of dairy cows in highly occluded complex environments. These findings provide robust technical support for monitoring cow behaviour on large-scale farms.