Cross-species identification of maize seed storage year by hyperspectral combination of physicochemical parameters
Abstract
Aging and deterioration of corn seeds are two key factors affecting their vitality. Identifying the storage time of seeds is therefore important for preserving germplasm resources and assessing seed quality. This study aims to identify the storage year of corn seeds using machine learning. A total of 103 corn varieties were selected for correlation analysis between physicochemical parameters and hyperspectral bands. Hyperspectral imaging was used to acquire spectral data from both the embryo side and the endosperm side of each seed. A near-infrared component analyzer (IM9500) and a portable ultraviolet-visible fluorescence spectrometer (Multi-plex) were used to measure the physicochemical parameters of the seeds. Variance analysis and linear discriminant analysis were conducted to determine how the physicochemical parameters changed as storage time increased and how they affected the discrimination of storage time. The parameters that influenced the discrimination of corn seed storage time were then selected. The hyperspectral data were subjected to black-and-white correction. Threshold segmentation was used to separate the background and obtain the region of interest (ROI), and the average spectrum of all pixels in the ROI served as the spectral data of that ROI. Five preprocessing methods, namely Savitzky-Golay (SG) smoothing, standard normal variate transformation (SNV), multiplicative scattering correction (MSC), first derivative (1-Der), and second derivative (2-Der), were applied to the spectral data to eliminate interference introduced during spectral acquisition, such as background noise, baseline drift, and stray light. Because hyperspectral images contain many bands and much redundant information, feature wavelengths were selected from the full spectrum.
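The ROI averaging and two of the five preprocessing steps described above (SNV and the first derivative) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the single-band threshold rule, the use of the sample standard deviation in SNV, and the adjacent-band difference used as the derivative are all choices made for the example.

```python
from statistics import mean, stdev


def roi_mean_spectrum(pixels, threshold, ref_band):
    """Average the spectra of all pixels kept by a simple threshold.

    pixels: list of per-pixel spectra (one list of reflectance values per pixel).
    A pixel belongs to the ROI when its value at ref_band exceeds threshold
    (hypothetical segmentation rule chosen for this sketch).
    """
    roi = [px for px in pixels if px[ref_band] > threshold]
    if not roi:
        raise ValueError("no pixel passed the threshold")
    n_bands = len(roi[0])
    return [mean(px[b] for px in roi) for b in range(n_bands)]


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and sample standard deviation (assumption: n - 1 denominator)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("a flat spectrum cannot be scaled")
    return [(v - m) / s for v in spectrum]


def first_derivative(spectrum):
    """First derivative approximated by the difference between adjacent bands."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]
```

In practice the derivative is often computed together with SG smoothing rather than by a plain difference; the plain difference is used here only to keep the sketch dependency-free.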
Competitive adaptive reweighted sampling (CARS) and uninformative variable elimination (UVE) were used to select the feature wavelengths. Support vector machine (SVM), back propagation neural network (BPNN), and convolutional neural network (CNN) classification models were developed on the preprocessed spectral data from the embryo and endosperm surfaces. The spectral data served as the model input, and the classification results of the different models were compared. The results indicated that spectral data from the embryo side gave better discrimination of corn seed storage time. Modeling with the preprocessed and feature-selected spectral data significantly outperformed modeling with the raw data. Pearson correlation analysis was conducted between the hyperspectral bands and the physicochemical parameters that significantly affected seed age discrimination. Bands highly correlated with these parameters were selected as feature bands for modeling. Modeling accuracy was then compared among three band selection strategies: feature wavelength selection, correlation-coefficient selection, and the combination of the two. The model was validated across multiple corn varieties, indicating good generalization. The BPNN classification model built on the combination of correlation-coefficient and feature-wavelength selection achieved the highest accuracy, with a single-kernel prediction accuracy of 92.3% and a colony prediction accuracy of 94.4%. These findings provide a theoretical basis and practical guidance for the precise management of corn seed storage in the seed industry.
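The correlation-based band selection described above can be illustrated with a short sketch. It assumes one physicochemical parameter, a hypothetical cutoff `min_abs_r` on the absolute Pearson coefficient, and a set union as the way of combining correlated bands with bands chosen by a wavelength selection method such as CARS or UVE. The abstract does not state the cutoff or the combination rule, so both are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence is treated as uncorrelated here
    return sxy / sqrt(sxx * syy)


def correlated_bands(spectra, parameter, min_abs_r):
    """Return indices of bands whose |r| with the parameter reaches min_abs_r.

    spectra: one spectrum (list of band values) per seed.
    parameter: one physicochemical measurement per seed, in the same order.
    min_abs_r: hypothetical cutoff, not taken from the study.
    """
    n_bands = len(spectra[0])
    selected = []
    for b in range(n_bands):
        band_values = [s[b] for s in spectra]
        if abs(pearson_r(band_values, parameter)) >= min_abs_r:
            selected.append(b)
    return selected


def combine_bands(correlated, wavelength_selected):
    """Combine both selections by union (assumed combination rule)."""
    return sorted(set(correlated) | set(wavelength_selected))
```

The combined index list would then be used to subset each preprocessed spectrum before it is passed to a classifier such as the BPNN.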