Abstract:
Tip-burn refers to the dieback of leaf tips and margins in crops. Tip-burn stress poses a serious threat to the growth of green leafy vegetables, particularly in controlled environments such as plant factories, and fast-growing species such as lettuce are especially sensitive to it. Early detection of tip-burn stress is therefore essential for maintaining the quality and yield of leafy vegetables. Tip-burn typically appears as necrotic brown spots at the tips and margins of rapidly developing leaves. However, detecting tip-burn at early stages remains challenging because the symptoms are small, and existing detection methods often struggle under the complex artificial lighting of plant factories. In this study, an improved model (RT-DETR-TB) based on the RT-DETR framework was proposed to detect tip-burn stress in leafy vegetables. First, the lightweight StarNet was adopted as the backbone; its star operation enhances feature extraction by implicitly mapping inputs into high-dimensional feature spaces. Second, the original feature fusion was replaced with a star-attention feature fusion (SAFF) module, which combines the star operation with channel prior convolutional attention (CPCA) to fuse multi-scale features and improve detection accuracy for tip-burn targets at different scales. Finally, a cross-scale edge enhancement (CSEE) module was added to the encoder, exploiting shallow edge features to improve detection of small targets.
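As a reference for the star operation mentioned above, the following minimal PyTorch sketch shows a StarNet-style block in which two pointwise projections of the same input are multiplied element-wise; the layer widths, kernel size, and activation here are illustrative assumptions, not the exact RT-DETR-TB configuration.

```python
import torch
import torch.nn as nn

class StarBlock(nn.Module):
    """Illustrative StarNet-style block. The 'star' operation multiplies two
    linear projections element-wise, which implicitly represents pairwise
    feature products, i.e., a high-dimensional feature space."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)  # depthwise spatial mixing
        self.f1 = nn.Conv2d(dim, dim * expansion, 1)             # branch 1 (activated)
        self.f2 = nn.Conv2d(dim, dim * expansion, 1)             # branch 2 (linear)
        self.g = nn.Conv2d(dim * expansion, dim, 1)              # project back to dim
        self.act = nn.ReLU6()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dw(x)
        x = self.act(self.f1(x)) * self.f2(x)  # star operation: element-wise product
        x = self.g(x)
        return x + residual
```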
Experimental results showed that the model achieved a detection speed of 58 frames per second, an average precision (AP50) of 88.4%, and an average precision on small objects (APS) of 50.7% for tip-burn detection. RT-DETR-TB has 16.4 M parameters and requires 52.1 G floating-point operations. Compared with RT-DETR, it reduced the number of parameters and floating-point operations by 18.4% and 11.1%, respectively, while improving AP50 and APS by 2.4 and 3.9 percentage points, respectively. Moreover, the improved model converged faster, with a steeper and smoother decline in training loss. In addition, four test sets representing different lighting scenarios were constructed to evaluate generalization under real plant factory environments. Detection performance varied considerably across conditions, mainly because lighting changes the visual distinction among tip-burn symptoms, healthy leaves, and the background. Several object detection models were also evaluated on these datasets. RT-DETR-TB outperformed the other models under all lighting conditions, and was especially effective and robust in detecting small tip-burn instances that the other models often missed. It achieved an average AP50 of 77.1%, an improvement of 2.5, 1.7, and 0.9 percentage points over YOLOv5, YOLOv8, and RT-DETR, respectively, fully meeting the real-time accuracy requirements for tip-burn stress detection in the complex environments of plant factories. These findings provide a technological solution for visual health monitoring in leafy vegetable production. Furthermore, early and precise identification of stress symptoms can also improve crop yield and quality control in plant factories in modern agriculture.
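To illustrate how shallow edge features can support small-target detection, the sketch below is a purely hypothetical stand-in for the CSEE idea: fixed Sobel kernels extract edge maps from a shallow feature, which are then projected and added to a deeper encoder feature. The actual CSEE design is specific to this paper; the class name, the choice of Sobel operators, and fusion by addition are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeCue(nn.Module):
    """Hypothetical sketch: inject shallow edge cues into a deeper feature map.
    Not the paper's CSEE module; a simplified illustration of the concept."""
    def __init__(self, channels: int):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        kernel = torch.stack([sobel_x, sobel_y]).unsqueeze(1)  # (2, 1, 3, 3)
        self.register_buffer("kernel", kernel)                 # fixed, not learned
        self.proj = nn.Conv2d(2, channels, 1)                  # lift edges to feature width

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        gray = shallow.mean(dim=1, keepdim=True)            # collapse channels
        edges = F.conv2d(gray, self.kernel, padding=1)      # Sobel gradients
        edges = F.interpolate(edges, size=deep.shape[-2:])  # match deep feature scale
        return deep + self.proj(edges)                      # fuse edge cue by addition
```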