Segmenting hanging watermelon fruits under occlusion
-
Abstract
Accurate segmentation and projected area estimation of hanging watermelon fruits under occlusion are often required in greenhouse cultivation, and they are critical for precise phenotypic analysis and intelligent yield prediction in automated water and fertilizer management. In this study, a hybrid computational framework was developed that integrates semantic segmentation with geometric model fitting and deep learning, so that fruit contours can be reconstructed from heavily occluded images in which the fruits are frequently obscured by leaves, vines, and support nets. The framework comprises three stages. First, a systematic evaluation was conducted among five representative semantic segmentation models: U-Net with a VGG backbone, U-Net with a ResNet50 backbone, DeepLabv3 with a MobileNet backbone, DeepLabv3 with an Xception backbone, and the Pyramid Scene Parsing Network (PSPNet); the best-performing model was then selected to generate the initial fruit-region masks. Next, a border following algorithm was used to extract ordered contour coordinates from the binary mask. Finally, a two-stage ellipse fitting was applied: an adaptive RANSAC algorithm with automatic parameter tuning identified the true boundary inliers, and least squares optimization on these inliers determined the ellipse parameters (an illustrative sketch of this stage is given after the abstract). The projected area was then calculated from the fitted ellipse. A high-quality dataset of 2,000 images of two prominent hanging watermelon cultivars was constructed and designed to contain a uniform distribution of occlusion types (leaf, vine, net bag, and identification tag) and severity levels, as well as non-occluded reference samples. Experimental results demonstrated that the U-Net model with a VGG16 backbone achieved the best segmentation performance, with a precision of 99.41%, a recall of 99.36%, and an Intersection over Union (IoU) of 98.78%, outperforming the other four candidate models and providing a reliable foundation for the subsequent geometric processing. The adaptive RANSAC with least squares fitting proved substantially more robust than conventional fitting on all contour points. Quantitative evaluation of the projected area against manual measurements yielded an outstanding coefficient of determination (R²) of 0.99, a root mean square error (RMSE) of 6.35 cm², and a mean absolute percentage error (MAPE) of 3.07%, corresponding to a 64% reduction in RMSE and a 62% reduction in MAPE relative to the conventional baseline. Robustness was further confirmed under varying occlusion intensities: when the test samples were categorized into mild, moderate, and severe occlusion levels using the fitted inlier ratio, the MAPE remained consistently low at 2.41%, 3.01%, and 3.97%, respectively. The framework was also benchmarked against the prominent Segment Anything Model (SAM) under both interactive and automatic modes and achieved lower estimation errors, with an RMSE of 6.35 cm² compared with 21.45 cm² for interactive SAM and 25.63 cm² for auto-prompting SAM. Overall, high-precision segmentation of occluded watermelon fruits in greenhouses was realized by combining U-Net segmentation with adaptive RANSAC ellipse fitting.
The framework reliably reconstructs fruit contours and enables accurate, automatic estimation of phenotypic traits such as the projected area. Its robustness and efficiency provide a technical foundation for growth monitoring, yield prediction, and cultivation optimization in protected horticulture.
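Below is a minimal sketch, in Python with OpenCV and NumPy, of the contour extraction and two-stage ellipse fitting described above: border following on the binary mask, RANSAC-style selection of boundary inliers, and a least-squares ellipse refit on those inliers, with the projected area taken from the fitted semi-axes. The function names, iteration count, pixel tolerance, and the use of cv2.fitEllipse as the least-squares step are illustrative assumptions, not the paper's exact implementation; in particular, the adaptive parameter tuning of the paper's RANSAC is replaced here by fixed values.

```python
import cv2
import numpy as np


def _ellipse_residual(pts, ellipse):
    """Approximate point-to-ellipse distance in pixels: rotate the points
    into the ellipse frame and scale the deviation of the normalized
    radius from 1 by the mean semi-axis."""
    (cx, cy), (w, h), ang = ellipse
    a, b = w / 2.0, h / 2.0
    t = np.deg2rad(ang)
    d = pts - np.array([cx, cy], dtype=np.float32)
    x = d[:, 0] * np.cos(t) + d[:, 1] * np.sin(t)
    y = -d[:, 0] * np.sin(t) + d[:, 1] * np.cos(t)
    r = np.sqrt((x / a) ** 2 + (y / b) ** 2)
    return np.abs(r - 1.0) * 0.5 * (a + b)


def fit_fruit_ellipse(mask, n_iters=500, tol_px=3.0, seed=0):
    """Border following on a binary fruit mask, RANSAC-style inlier
    selection, and a least-squares ellipse refit on the inliers.
    n_iters and tol_px are fixed illustrative values; the paper's
    adaptive RANSAC tunes its parameters automatically."""
    # cv2.findContours implements border following (Suzuki's algorithm).
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        raise ValueError("no fruit region found in the mask")
    pts = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(np.float32)

    rng = np.random.default_rng(seed)
    best_inliers, best_count = np.ones(len(pts), dtype=bool), 0
    for _ in range(n_iters):
        # Hypothesize an ellipse from a small random subset of contour points.
        sample = pts[rng.choice(len(pts), size=8, replace=False)]
        try:
            cand = cv2.fitEllipse(sample)
        except cv2.error:
            continue
        if min(cand[1]) < 1e-3:  # skip degenerate candidates
            continue
        inliers = _ellipse_residual(pts, cand) < tol_px
        if inliers.sum() >= 5 and inliers.sum() > best_count:
            best_count, best_inliers = int(inliers.sum()), inliers

    # Final least-squares fit restricted to the selected boundary inliers.
    ellipse = cv2.fitEllipse(pts[best_inliers])
    (_, _), (w, h), _ = ellipse
    area_px = np.pi * (w / 2.0) * (h / 2.0)  # projected area in pixel^2;
                                             # converting to cm^2 needs camera calibration
    inlier_ratio = float(best_inliers.mean())
    return ellipse, area_px, inlier_ratio
```

The returned inlier ratio corresponds to the quantity the abstract uses to group test samples into mild, moderate, and severe occlusion levels; heavier occlusion leaves a smaller fraction of the extracted contour on the fitted ellipse.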
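For completeness, a short sketch of the reported error metrics follows. The function name and input arrays are hypothetical; RMSE and MAPE follow their standard definitions, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption and may differ from the paper's exact definition (e.g. a squared Pearson correlation).

```python
import numpy as np


def area_error_metrics(estimated_cm2, measured_cm2):
    """RMSE, MAPE, and R^2 for fitted vs. manually measured projected
    areas; inputs are 1-D arrays of areas in cm^2."""
    est = np.asarray(estimated_cm2, dtype=float)
    ref = np.asarray(measured_cm2, dtype=float)
    rmse = np.sqrt(np.mean((est - ref) ** 2))        # cm^2
    mape = 100.0 * np.mean(np.abs(est - ref) / ref)  # percent
    ss_res = np.sum((ref - est) ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return {"RMSE": rmse, "MAPE": mape, "R2": r2}
```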