Abstract:
Deep learning has attracted much attention in recent years owing to its powerful feature extraction and excellent prediction performance, and it also holds promise for target detection in the animal husbandry industry. Existing research has focused mainly on cow behaviour in outdoor farm environments, whereas relatively little work has addressed complex indoor scenarios. Moreover, the indoor environment of farms presents challenging factors, such as high background similarity and severe occlusion, which place high demands on the robustness and generalisation of target detection models. In this study, a multi-target detection method based on an improved YOLO11n was proposed for the daily behaviour of dairy cows in complex environments. Firstly, images of cow behaviour were collected in real indoor farm environments, covering four basic behaviours: standing, walking, lying, and eating. A multi-target cow behaviour dataset was then constructed, with the images finely annotated using LabelImg software, covering various scales, viewing angles, and behavioural postures for model training. Furthermore, the model architecture was designed by optimising the bottleneck structure with wavelet convolution: the bottleneck of the C3k2 module was reconstructed to introduce the wavelet transform domain into feature extraction, which effectively expanded the receptive field, better represented the complex background, and significantly enhanced the contextual information. A cascaded group attention mechanism was integrated into the C2PSA module, employing a spatial-channel strategy to improve feature extraction in occluded areas. In the feature fusion stage, the Efficient RepGFPN was utilised as the neck network to effectively capture features from the cow behaviour images at different scales.
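The wavelet-domain convolution described above can be illustrated with a minimal sketch (an assumption-laden illustration, not the authors' implementation): a single-level Haar transform decomposes a feature map into four half-resolution subbands, a small kernel is applied to each subband in the wavelet domain, where a 3x3 kernel covers roughly twice the spatial extent it would in the original map, thereby expanding the receptive field, and the inverse transform restores the original resolution. The single-channel setting and the per-subband kernels are hypothetical simplifications.

```python
import numpy as np

def haar_dwt2(x):
    # Single-level orthonormal Haar transform of a (H, W) map with even H, W.
    # Returns four half-resolution subbands: LL (approximation), LH, HL, HH (details).
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # Inverse single-level Haar transform; exact reconstruction of the input map.
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def conv2d_same(x, k):
    # Zero-padded cross-correlation ("same" output size), as in CNN conv layers.
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def wavelet_conv(x, kernels):
    # Filter each subband in the wavelet domain, then transform back.
    # A k x k kernel here spans ~2k x 2k pixels of the original map.
    subbands = haar_dwt2(x)
    filtered = [conv2d_same(s, k) for s, k in zip(subbands, kernels)]
    return haar_idwt2(*filtered)
```

With identity (delta) kernels the round trip reproduces the input exactly, which is a convenient sanity check that the transform pair is lossless before learned kernels are substituted.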
Comparative experiments were conducted to verify the performance of the improved model in complex environments. The results show that the mean average precision (mAP@0.5) of the WCG-YOLO11n reached 95.3% for daily behaviour detection of dairy cows, an improvement of 2.2 percentage points over the baseline model. Compared with models with high floating-point computation and parameter counts, namely Faster R-CNN, DETR, YOLOv5s, YOLOv7, YOLOv8n, and YOLOv9, the mAP@0.5 increased by 2.7, 2.1, 1.3, 2.9, 2.4, and 0.6 percentage points, respectively. The floating-point operations, parameter count, and model size of the WCG-YOLO11n were 9.5 G, 3.9 M, and 8.3 MB, respectively, a slight increase of 3.2 G, 1.3 M, and 2.8 MB over the baseline YOLO11n, yet still significantly smaller than those of high-precision models such as DETR, YOLOv5s, and YOLOv9. These results show that the model achieves high average accuracy and processing speed while consuming fewer computing resources, making it suitable for mobile devices. Furthermore, in multi-behaviour, multi-scale, and dense scenes, 23 cow behaviours were detected with only 4 missed detections and 0 false detections, demonstrating superior performance over YOLOv5s, YOLOv8n, and YOLO11n, and detection accuracy and stability on a par with the high-computation, high-parameter models Faster R-CNN, DETR, YOLOv7, and YOLOv9. The improved model can effectively handle varying degrees of occlusion interference and can be deployed on mobile devices to detect the daily behaviours of dairy cows in highly occluded complex environments. These findings provide robust technical support for monitoring cow behaviour on large-scale farms.