DENG Yinan, LIAO Fafa, OUYANG Shangtao, et al. Non-destructive hardness detection of kiwifruit based on time-frequency representation and attention-enhanced convolutional neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2026, 42(2): 373-382. DOI: 10.11975/j.issn.1002-6819.202506147

    Non-destructive hardness detection of kiwifruit based on time-frequency representation and attention-enhanced convolutional neural network

    • An intelligent tactile-sensing system is a promising route to fruit quality detection in modern agriculture, as conventional optical non-destructive testing cannot fully meet large-scale production requirements. Near-infrared spectroscopy is constrained by fruit surface wax thickness and light-scattering artifacts, while machine vision captures only superficial morphological features; both therefore fail to characterize internal structural evolution or resolve sub-Newton firmness gradients. In this study, a tri-finger polyurethane Fin Ray flexible gripper was developed to overcome these limitations, and an Amor-SE-CNN framework was proposed to assess fruit quality by converging multiresolution time-frequency analysis with adaptive attention mechanisms. A vibration-dynamics approach was thus established for precise maturity classification, reducing the reliance on optical variables while maintaining non-destructive integrity. The hardware architecture integrated strain gauges (1.2 cm×1.0 cm sensing area) epoxy-encapsulated at 4.62 cm from the gripper fingertips, the position that finite-element simulations identified as exhibiting the maximum deformation amplitude. During step-motor-controlled grasping sequences (0–12 mm/s closure velocity regulated by a DM422 driver, 15 mm stroke), triaxial strain signals were recorded and subjected to four-stage preprocessing: (1) transient artifact removal via slope-threshold interpolation; (2) fourth-order bidirectional Butterworth bandpass filtering (0.5–5.0 Hz) suppressing >5.0 Hz mechanical vibrations and <0.5 Hz thermal drift; (3) Hilbert-transform envelope extraction isolating viscoelastic relaxation; and (4) amplitude normalization dynamically mapping signals to the [0, 1] range using piecewise linear scaling.
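The four-stage preprocessing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 50 Hz sampling rate, the median-based slope threshold in stage (1), and the min-max scaling standing in for the piecewise linear mapping in stage (4) are all assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def preprocess_strain(x, fs=50.0, band=(0.5, 5.0)):
    x = np.asarray(x, dtype=float)
    # (1) Transient artifact removal: samples with abnormally steep slope
    #     are replaced by linear interpolation (threshold is an assumption).
    dx = np.abs(np.diff(x, prepend=x[0]))
    good = dx <= 5.0 * np.median(dx) + 1e-9
    idx = np.arange(x.size)
    x = np.interp(idx, idx[good], x[good])
    # (2) 4th-order Butterworth bandpass, applied bidirectionally
    #     (filtfilt gives the zero-phase, forward-backward filtering).
    b, a = butter(4, band, btype="bandpass", fs=fs)
    x = filtfilt(b, a, x)
    # (3) Hilbert-transform envelope isolating the relaxation behavior.
    env = np.abs(hilbert(x))
    # (4) Amplitude normalization to [0, 1] (simple min-max scaling here,
    #     in place of the paper's piecewise linear mapping).
    return (env - env.min()) / (env.max() - env.min() + 1e-12)

fs = 50.0
t = np.arange(0, 10, 1 / fs)
raw = np.sin(2 * np.pi * 2.0 * t) + 0.3 * t   # 2 Hz oscillation + slow drift
raw[100] += 5.0                               # injected grasp transient
clean = preprocess_strain(raw, fs=fs)
```

The bidirectional `filtfilt` call matters here: a single forward pass would introduce a phase lag that distorts the timing of the relaxation envelope extracted in stage (3).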
Continuous wavelet transform (CWT) with complex Morlet wavelets then converted the 1D strain data into 224×224-pixel time-frequency matrices via logarithmic energy-spectrum computation and bilinear interpolation. The spectrograms were fused into three-channel RGB space, encoding channel-specific energy distributions within the biomechanically critical 0.5–5.0 Hz band into composite color-textural signatures. Stiffness-dependent frequency modulations were evident, exemplified by overripe fruits with dominant energy at 0.5–1.5 Hz versus hard-unripe specimens concentrating at 2.5–5.0 Hz. The convolutional neural network incorporated a squeeze-and-excitation attention module: global context aggregation (GAP→8D descriptor→sigmoid-activated 32D reconstruction) adaptively amplified firmness-correlated spectral components, while 3×3 dynamic convolution kernels with ReLU activation enhanced spatial sensitivity to localized energy discontinuities. Training incorporated multi-strategy robustness enhancement: stochastic data augmentation (±10% random cropping, ±20% brightness jitter, ±15% contrast modulation) simulated field-operation variances; 50% dropout regularization countered small-sample overfitting; and Adam optimization minimized the categorical cross-entropy across 100 epochs with early stopping. Validation involved 420 kiwifruits ('Yangtao Bao': n=240; 'Hayward': n=180) assigned to five physiological maturity tiers (F<9.4 N: overripe; 9.4 N≤F<11.3 N: ripe; 11.3 N≤F<13.7 N: mid-ripe; 13.7 N≤F<16.7 N: unripe; F≥16.7 N: hard-unripe) according to GY-4 texture-analyzer reference measurements. The results show that the Amor-SE-CNN achieved 93.3% classification accuracy, surpassing the conventional CNN (84.8%), SE-CNN (88.6%), and time-frequency CNN (90.5%) baselines by 8.5, 4.7, and 2.8 percentage points, respectively, while outperforming prior tactile studies.
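The complex-Morlet time-frequency step can be illustrated with a minimal NumPy sketch. The Morlet center-frequency parameter w=6, the 50 Hz sampling rate, and the toy 3 Hz signal are assumptions not taken from the paper; bilinear resizing to 224×224 and RGB channel fusion would follow downstream.

```python
import numpy as np

def morlet_cwt_energy(x, fs, freqs, w=6.0):
    """Complex Morlet CWT of a 1D signal -> log-energy time-frequency matrix.

    Assumes len(x) exceeds the longest wavelet support (lowest frequency).
    """
    n = len(x)
    spec = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        s = w / (2.0 * np.pi * f)                    # Gaussian width at freq f
        t = np.arange(-4.0 * s, 4.0 * s, 1.0 / fs)   # wavelet support (+/- 4 sigma)
        psi = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2.0 * s**2))
        psi /= np.sqrt(s)                            # scale-dependent normalization
        # cross-correlate signal with the wavelet (conjugate-reversed convolution)
        coeff = np.convolve(x, np.conj(psi)[::-1], mode="same")
        spec[i] = np.log10(np.abs(coeff) ** 2 + 1e-12)  # log energy spectrum
    return spec

fs = 50.0
t = np.arange(0, 30, 1 / fs)
x = np.sin(2 * np.pi * 3.0 * t)          # toy 3 Hz strain oscillation
freqs = np.linspace(0.5, 5.0, 64)        # the 0.5-5.0 Hz biomechanical band
spec = morlet_cwt_energy(x, fs, freqs)
```

For a pure tone, the row of `spec` with the highest average energy sits at the tone's frequency, which is exactly the property that lets the spectrogram separate the 0.5–1.5 Hz overripe signature from the 2.5–5.0 Hz hard-unripe one.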
The attention mechanism specifically enhanced discrimination between transitional maturity states: the "soft" vs. "mid-ripe" F1 scores rose from 81% to 92% through 3–4 Hz band amplification. Physiological integrity was confirmed via respiration kinetics: CO2 evolution rates showed no statistically significant intergroup variance (P>0.05) during 72 h of monitoring, verifying negligible mechanical stress. In summary, an experimental platform was constructed to detect fruit firmness using a flexible gripper, integrating time-frequency analysis with an attention-enhanced convolutional neural network (CNN) and achieving effective classification of kiwifruit maturity. The findings can provide technical support for intelligent post-harvest handling of fruits.
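The squeeze-and-excitation reweighting that drives this gain (global average pooling to a bottleneck descriptor, then a sigmoid-gated per-channel scale) can be sketched in NumPy as follows. The 32-channel width and 8-D bottleneck mirror the abstract's GAP→8D→32D description, while the random weights are placeholders for parameters a trained network would learn.

```python
import numpy as np

def se_reweight(feature_maps, w_down, w_up):
    """Squeeze-and-excitation: GAP -> ReLU bottleneck -> sigmoid channel gates."""
    z = feature_maps.mean(axis=(1, 2))          # squeeze: (C,) global descriptor
    h = np.maximum(w_down @ z, 0.0)             # excitation: 8-D ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w_up @ h)))   # sigmoid-activated (C,) gates
    return feature_maps * gates[:, None, None]  # per-channel rescaling

rng = np.random.default_rng(0)
C, H, W, r = 32, 14, 14, 8                      # 32 channels, 8-D bottleneck
maps = rng.standard_normal((C, H, W))
w_down = rng.standard_normal((r, C)) * 0.1      # placeholder learned weights
w_up = rng.standard_normal((C, r)) * 0.1
out = se_reweight(maps, w_down, w_up)
```

Because each gate lies in (0, 1), the block can only attenuate channels relative to one another; in the trained model this is what lets firmness-correlated bands such as 3–4 Hz dominate the downstream classification.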