A lightweight DeepLabV3+ method for flax lodging recognition in harvester images

范翔宇; 李玥; 魏霖静; 高玉红; 郭林海; 周慧; 康亮河; 李永彪; 范辉

doi:10.11975/j.issn.1002-6819.202504004

Detecting flax lodging from harvester images using lightweight DeepLabV3+

Abstract

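Summary: To address the complex backgrounds and the high computational cost of models in flax lodging detection, this study proposes a lightweight flax lodging recognition model based on an improved DeepLabV3+. First, the lightweight backbone network MobileNetV2 is adopted to shorten the model's training time. Next, a coordinate attention (CA) mechanism is introduced to strengthen the model's ability to localize small lodged areas. Finally, the original cross-entropy loss (CE_Loss) is replaced with Focal Loss, which is better suited to lodging recognition, and Dice Loss is added to the total loss to improve recognition under class imbalance. Experimental results show that the improved DeepLabV3+ model raises both accuracy and efficiency in flax lodging recognition: the average precision reaches 95.96%, and the mean intersection over union (mIoU) and mean pixel accuracy (mPA) reach 92.55% and 96.11%, respectively. Compared with HRNet, PSPNet, U-Net, SegNeXt-S, and DeepLabV3+, the mIoU is higher by 1.08, 3.74, 3.06, 11.79, and 1.59 percentage points, and the mPA is higher by 0.92, 2.80, 1.58, 8.68, and 1.17 percentage points, respectively. Training time falls from 27.3 h for the original DeepLabV3+ to 14.2 h, and the model meets the real-time recognition requirement with an average detection frame rate of 83 frames/s. The study provides a feasible technical solution for real-time lodging detection in agricultural settings and for optimizing harvester operation.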
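The two segmentation metrics reported above, mIoU and mPA, can both be read off a per-class confusion matrix. The following is a minimal sketch using the standard definitions, not the authors' evaluation code; the two-class labelling (0 for upright flax or background, 1 for lodged flax) and the tiny flattened masks are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def mean_pixel_accuracy(cm):
    """Mean over classes of TP / (number of pixels whose true class is c)."""
    accs = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c])]
    return sum(accs) / len(accs)


# Hypothetical flattened masks: 0 = upright flax or background, 1 = lodged flax.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
miou = mean_iou(cm)
mpa = mean_pixel_accuracy(cm)
```

Because mIoU penalizes false positives as well as false negatives, it is usually the lower of the two figures, which is consistent with the 92.55% and 96.11% reported here.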

Abstract: Accurate lodging detection is essential for harvester operation. Missed or false detections directly affect stubble height control, which lowers harvesting efficiency and wastes crop. Although remote sensing data has been widely applied in many fields, it cannot be used directly during harvester operation. In recent years, artificial intelligence and machine vision have offered an alternative way to detect crop lodging. Most deep learning-based studies have identified lodging areas in staple crops such as wheat, corn, and rice, whereas studies on flax are still lacking. In this study, industrial cameras were installed on harvesters to collect real-time image data from flax fields, so that the visual data could guide harvesting operations and precise lodging detection could optimize harvesting performance. A lightweight model based on DeepLabV3+ was proposed to address the complex backgrounds and high computational cost of flax lodging detection. MobileNetV2 was adopted as the lightweight backbone network. Its inverted residual structures and linear bottleneck layers make feature extraction cheaper, which substantially reduced both training time and computational cost and made hardware deployment feasible for real-time lodging detection. Identifying flax lodging areas requires the model to capture spatial location and long-range dependencies effectively. Therefore, a Coordinate Attention (CA) mechanism was incorporated into the encoder-decoder structure of DeepLabV3+. CA decomposes the feature maps along the horizontal and vertical directions and captures spatial features in each direction separately, so that the attention weights emphasize the lodging areas.
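The directional decomposition behind coordinate attention can be illustrated on a single-channel feature map. The sketch below is a deliberate simplification and not the CA block used in the paper: it keeps only the two directional average poolings and a sigmoid gate, and omits the learned convolutions, normalization, and channel dimension of a real CA module. The function names and the 3x3 feature map are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def directional_pool(feature_map):
    """Average each row (pooling along width) and each column (pooling along height)."""
    h, w = len(feature_map), len(feature_map[0])
    row_means = [sum(row) / w for row in feature_map]
    col_means = [sum(feature_map[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def coordinate_gate(feature_map):
    """Reweight every position by a row gate times a column gate.

    Simplification (assumption): the gates are plain sigmoids of the pooled
    means; a real coordinate attention block learns them with convolutions.
    """
    row_means, col_means = directional_pool(feature_map)
    row_gate = [sigmoid(m) for m in row_means]
    col_gate = [sigmoid(m) for m in col_means]
    return [[feature_map[i][j] * row_gate[i] * col_gate[j]
             for j in range(len(col_gate))]
            for i in range(len(row_gate))]


# Hypothetical 3x3 map with strong activations in the middle row.
fmap = [[0.0, 0.0, 0.0],
        [0.0, 4.0, 2.0],
        [0.0, 0.0, 0.0]]
rows, cols = directional_pool(fmap)
gated = coordinate_gate(fmap)
```

Each output position is scaled by one weight that depends on its row and one that depends on its column, which is how the mechanism retains positional information that a single global pooling would discard.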
Experimental results demonstrated that CA significantly improved the recognition accuracy of lodging areas, particularly under complex backgrounds. The conventional Cross-Entropy Loss (CE_Loss) was replaced with Focal Loss to address class imbalance: sample weights are adjusted dynamically so that training focuses on hard-to-classify lodging pixels. Additionally, Dice Loss was integrated into the total loss function to further improve the precision of segmentation boundaries; because it optimizes region overlap directly, it targets a quantity closely related to the Intersection over Union (IoU) and thereby improves segmentation performance. After 200 training epochs, the improved DeepLabV3+ model achieved higher accuracy and efficiency in flax lodging recognition. The average precision reached 95.96%, with a mean Intersection over Union (mIoU) and mean Pixel Accuracy (mPA) of 92.55% and 96.11%, respectively. Compared with the HRNet, PSPNet, U-Net, SegNeXt-S, and original DeepLabV3+ models, the mIoU of the improved model was higher by 1.08, 3.74, 3.06, 11.79, and 1.59 percentage points, respectively, and its mPA was higher by 0.92, 2.80, 1.58, 8.68, and 1.17 percentage points, respectively. The training time was reduced from 27.3 h (original DeepLabV3+) to 14.2 h, and the model met the real-time recognition requirement with an average detection frame rate of 83 frames per second. When only the backbone network was replaced with MobileNetV2, training efficiency improved significantly: the training time was reduced by 12.4 h, while the mIoU and mPA decreased by only 0.49 and 0.20 percentage points, respectively, so the training cost fell substantially with minimal loss of precision. Adding the CA mechanism increased the training time slightly but raised the mIoU and mPA by 1.30 and 0.81 percentage points, respectively, indicating improved accuracy.
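The combination of Focal Loss and Dice Loss described above can be sketched for the binary case (lodged versus not lodged). This is a minimal illustration of the two standard loss definitions, not the authors' training code: the values of `gamma`, `alpha`, `smooth`, and `dice_weight` are assumptions, since the abstract does not report them.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels.

    probs are predicted probabilities of class 1 (lodged); targets are 0 or 1.
    The factor (1 - p_t) ** gamma shrinks the loss of well-classified pixels.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        p_t = p if t == 1 else 1.0 - p            # probability given to the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """One minus the soft Dice coefficient between prediction and mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def total_loss(probs, targets, dice_weight=1.0):
    """Focal term plus a weighted Dice term; dice_weight is an assumed value."""
    return focal_loss(probs, targets) + dice_weight * dice_loss(probs, targets)


# Hypothetical predictions for four pixels, the last of which is a hard example.
probs = [0.9, 0.1, 0.8, 0.3]
targets = [1, 0, 1, 1]
loss = total_loss(probs, targets)
```

With `gamma=0` and `alpha=0.5` the focal term reduces to half the ordinary cross-entropy, so the focusing exponent is what shifts the emphasis toward hard pixels, while the Dice term rewards overlap regardless of how many background pixels there are.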
The improved loss function also significantly improved model performance. With it, the training time reached its minimum of 14.2 h, while the mIoU and mPA reached 92.55% and 96.11%, respectively. Relative to the original DeepLabV3+, the MobileNetV2 variant, and the MobileNetV2 variant with CA, the mIoU increased by 1.59, 2.08, and 0.78 percentage points, and the mPA increased by 1.17, 1.37, and 0.56 percentage points, respectively. The model can therefore identify lodging areas for intelligent flax harvesting. These findings provide a feasible technical solution for real-time lodging detection and harvester optimization in modern agriculture.
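The three pairs of gains quoted above are consistent with the step-by-step changes reported earlier in the abstract. The sketch below checks that arithmetic under one assumption: that the three comparison points are the original DeepLabV3+, the MobileNetV2 variant, and the MobileNetV2 variant with CA. The intermediate values are derived here from the reported differences and are not stated in the abstract itself.

```python
from decimal import Decimal as D

final_miou, final_mpa = D("92.55"), D("96.11")

# Original DeepLabV3+: the final model is 1.59 / 1.17 points higher.
orig_miou = final_miou - D("1.59")
orig_mpa = final_mpa - D("1.17")

# MobileNetV2 backbone: mIoU and mPA drop by 0.49 and 0.20 points.
mnv2_miou = orig_miou - D("0.49")
mnv2_mpa = orig_mpa - D("0.20")

# Adding coordinate attention: mIoU and mPA rise by 1.30 and 0.81 points.
ca_miou = mnv2_miou + D("1.30")
ca_mpa = mnv2_mpa + D("0.81")

# Gains of the final model over each of the three earlier configurations.
gains_miou = [final_miou - v for v in (orig_miou, mnv2_miou, ca_miou)]
gains_mpa = [final_mpa - v for v in (orig_mpa, mnv2_mpa, ca_mpa)]

# Training time after the backbone swap alone (27.3 h minus 12.4 h).
mnv2_hours = D("27.3") - D("12.4")
```

Under this reading the derived gains reproduce the reported 1.59, 2.08, and 0.78 points for mIoU and 1.17, 1.37, and 0.56 points for mPA exactly.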
