Recognizing cucumber farming behavior using an improved temporal shift network
-
Abstract
Facility cucumber production is expected to shift from experience-driven, extensive management toward data-driven, precision-oriented production, which often requires automatic recognition of cucumber farming behaviors. However, several challenges remain in video-based recognition of agricultural behaviors in practical greenhouse environments, including large intra-class variation of actions, complex and cluttered backgrounds, frequent occlusions, and high visual similarity among different farming behaviors. These factors limit the recognition accuracy and robustness of existing action recognition methods. This study therefore aims to improve the precision and stability of agricultural behavior recognition in complex facility scenarios. Cucumber farming behaviors from the spring and autumn cropping seasons at the Panhe Campus of Shandong Agricultural University, China, were selected as the research objects, and a video dataset of agricultural behaviors representative of facility production scenarios was constructed. An improved action recognition model, named LG-DTEA, was proposed on the basis of the Temporal Shift Module (TSM) framework. Firstly, a fast pathway was designed to introduce motion difference features, extracting inter-frame difference information through a lightweight shortcut structure. This pathway captured the subtle, continuous motion of the body and hands during agricultural operations and enhanced sensitivity to fine-grained motion. Secondly, a spatiotemporal motion compression and excitation module was embedded into the ResNet50 backbone. Motion-related information was compressed along the spatial and temporal dimensions, and feature responses were adaptively recalibrated to better model dynamic dependencies across consecutive video frames.
Thirdly, a local–global temporal attention mechanism was introduced to facilitate mapping and interaction learning between short-term temporal features and long-term global temporal representations. Local temporal continuity and global temporal context were jointly modeled through this attention mechanism, further enhancing the discrimination of highly similar agricultural behaviors. Extensive experiments were conducted on the cucumber farming behavior dataset to evaluate the effectiveness of the LG-DTEA model. The results demonstrated that LG-DTEA achieved a Top-1 accuracy of 99.2% and a Top-5 accuracy of 99.8%, improvements of 4.5 and 1.2 percentage points, respectively, over the original TSM model. The motion difference features, spatiotemporal excitation, and local–global temporal attention were effectively integrated to enhance both recognition accuracy and performance stability. Moreover, robust recognition was maintained under complex backgrounds, varying illumination, and subtle inter-class action differences, indicating strong adaptability to real greenhouse environments. In conclusion, the LG-DTEA model provides an effective and reliable solution for accurate recognition of cucumber farming behaviors in complex facility scenarios, with strong robustness and generalization for precise perception and intelligent analysis of agricultural operations. These findings can provide theoretical support and a technical reference for the practical deployment of intelligent behavior recognition in facility agriculture, contributing to the advancement of smart agriculture driven by video perception.
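The two core ingredients described above, the TSM-style temporal channel shift and the inter-frame motion difference used by the fast pathway, can be sketched in PyTorch as follows. This is a minimal illustrative sketch, not the authors' exact implementation: the function names, the shift fraction (`shift_div=8`), and the zero-padding of the last frame's difference are assumptions made for illustration.

```python
import torch

def temporal_shift(x, n_frames, shift_div=8):
    """TSM-style shift: move a fraction of channels one step along time.

    x: feature tensor of shape (N*T, C, H, W), T = n_frames.
    Returns a tensor of the same shape with 1/shift_div of the channels
    shifted backward in time, 1/shift_div shifted forward, rest unchanged.
    """
    nt, c, h, w = x.shape
    n = nt // n_frames
    x = x.view(n, n_frames, c, h, w)
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                # shift backward in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]           # remaining channels kept
    return out.view(nt, c, h, w)

def motion_difference(x, n_frames):
    """Inter-frame feature differences as lightweight motion cues.

    x: (N*T, C, H, W). Returns (N*T, C, H, W), where position t holds
    feature(t+1) - feature(t); the last frame is zero-padded (assumption).
    """
    nt, c, h, w = x.shape
    n = nt // n_frames
    x = x.view(n, n_frames, c, h, w)
    diff = x[:, 1:] - x[:, :-1]
    diff = torch.cat([diff, torch.zeros_like(x[:, :1])], dim=1)
    return diff.view(nt, c, h, w)
```

Both operations are nearly free in parameters and FLOPs, which is consistent with the abstract's emphasis on a lightweight shortcut structure for capturing fine-grained motion.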
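The local–global temporal attention idea from the abstract, combining short-term temporal continuity with clip-level global context, might be sketched as below. This is a hypothetical illustration under stated assumptions: the class name, the use of a depthwise temporal convolution for the local branch, and a squeeze-style gate for the global branch are choices made here for clarity, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LocalGlobalTemporalAttention(nn.Module):
    """Sketch of local-global temporal attention over per-frame features.

    Local branch: depthwise 1D convolution over neighboring frames
    (short-term continuity). Global branch: a gate computed from the
    clip-averaged descriptor (long-term context). Fused residually.
    """
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # local: temporal convolution applied per channel (depthwise)
        self.local_conv = nn.Conv1d(channels, channels, kernel_size,
                                    padding=kernel_size // 2, groups=channels)
        # global: bottleneck MLP producing a per-channel sigmoid gate
        self.global_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, x):
        # x: (N, T, C) per-frame feature vectors after spatial pooling
        local = self.local_conv(x.transpose(1, 2)).transpose(1, 2)  # (N, T, C)
        gate = self.global_fc(x.mean(dim=1)).unsqueeze(1)           # (N, 1, C)
        return x + local * gate  # residual fusion of local and global cues
```

Gating local temporal features by a global descriptor is one plausible way to realize the mapping and interaction between short-term and long-term temporal representations that the abstract describes.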