Abstract:
Accurate, rapid detection of foodborne pathogens with low reagent dependency is needed, because conventional detection often involves complex and time-consuming sample preprocessing. In this study, qualitative detection was conducted on Escherichia coli, Salmonella, and
Staphylococcus aureus within a mixed bacterial system originating from mutton, using hyperspectral imaging (HSI) combined with minimum redundancy maximum relevance (mRMR)-gradient boosting decision tree (GBDT) series feature selection. Firstly, the microbial community composition of fresh mutton from common retail scenarios was identified, and the structure of the mutton-origin symbiotic microbiota was determined in order to construct a multi-bacterial mixed system that reflects actual detection conditions. Secondly, HSI data were acquired from multi-bacterial co-cultures of the mutton-origin symbiotic microbiota and the target foodborne pathogens. One-dimensional average spectra were extracted after masks were generated using morphological methods. These average spectra were then preprocessed to reduce noise, scattering, and baseline drift, thereby improving the usability and reliability of the data. Specifically, after Savitzky-Golay (SG) smoothing, the spectral data were subjected to first derivative (1D), second derivative (2D), and standard normal variate (SNV) transformations, and the effects of the different preprocessing approaches on model performance were compared to identify the optimal one. Using the optimally preprocessed spectral data (SG combined with 2D), the mRMR-GBDT series feature selection methods, specifically mRMR-XGBoost, mRMR-LightGBM, and mRMR-CatBoost, were employed to screen for feature wavelengths relevant to foodborne pathogen detection. Finally, the feature selection methods and the foodborne pathogen classification models were systematically compared to determine the optimal combination of feature extraction and classification model. The results indicated that the total bacterial count in the mutton samples ranged from 10 to 10⁵ CFU/g, with 90.5% of samples within the 10² to 10⁴ CFU/g range. The most prevalent count was 10³ CFU/g (37.5%), followed by 10² CFU/g (29.5%) and 10⁴ CFU/g (23.5%). Counts of 10 and 10⁵ CFU/g accounted for 3.0% and 6.5%, respectively. Further analysis revealed that the core symbiotic microbiota in the mutton primarily consisted of
Pseudomonas, Acinetobacter, and Proteus. The dominant core populations were Pseudomonas and Acinetobacter, with maximum relative abundances of 35.67% and 35.56%, respectively. The sub-dominant populations were Proteus (21.77%) and Bacillus (20.32%). The relative abundances of other genera, such as
Klebsiella, were all below 10%, and these genera were classified as relatively rare members of the mutton-origin symbiotic microbiota. The mRMR-GBDT series methods selected 17 (mRMR-XGBoost), 21 (mRMR-LightGBM), and 12 (mRMR-CatBoost) feature wavelengths, respectively. Among them, mRMR-CatBoost achieved the best balance, yielding feature wavelengths with high spectral relevance and minimal redundancy. The 12 features selected by mRMR-CatBoost were evenly distributed over the full spectral range. Seven of these features were associated with overtone and combination vibrational absorption bands of functional groups such as C-H, O-H, and N-H. The synergistic mechanisms among wavelengths were effectively preserved. Support vector machine (SVM), light gradient boosting machine (LightGBM), and back propagation neural network (BPNN) classification models were constructed using the optimal feature subset. Comparative analysis showed that the BPNN model performed best, with accuracies of 97.63% and 96.49% on the validation and test sets, respectively. Compared with the SVM and LightGBM models, the accuracy of BPNN was higher by 9.17 and 7.25 percentage points on the validation set and by 1.64 and 5.77 percentage points on the test set, respectively. HSI combined with the mRMR-CatBoost-BPNN framework achieved highly efficient detection of foodborne pathogens within a mixed bacterial system derived from authentic mutton samples. These findings can provide a theoretical basis and technical reference for the efficient and accurate detection of foodborne pathogens in animal-derived foods such as mutton.
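
The preprocessing and wavelength screening steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the SG window length and polynomial order, the function names, and the synthetic spectra are all assumptions made for illustration. The selection routine is a generic greedy mRMR in its difference form, with absolute Pearson correlation as the redundancy term and relevance scores supplied by the caller; how relevance is coupled to XGBoost, LightGBM, or CatBoost in the study is not specified here, so placeholder scores are used.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_transform(spectra, deriv=0, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing (deriv=0) or smoothed 1st/2nd derivative.

    Rows are spectra, columns are wavelengths. The window length and
    polynomial order are illustrative defaults, not values from the study.
    """
    return savgol_filter(np.asarray(spectra, dtype=float),
                         window_length=window_length,
                         polyorder=polyorder, deriv=deriv, axis=1)


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def mrmr_select(X, relevance, k):
    """Greedy mRMR (difference form): pick k columns of X that maximise
    relevance minus mean absolute correlation with already chosen columns."""
    X = np.asarray(X, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]  # start from the most relevant band
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        scores = relevance[remaining] - redundancy
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Synthetic stand-in data: 6 "spectra" with 60 bands (not the study's data).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(6, 60)).cumsum(axis=1)

# The preprocessing variants compared in the abstract.
variants = {
    "SG": sg_transform(spectra),
    "SG+1D": sg_transform(spectra, deriv=1),
    "SG+2D": sg_transform(spectra, deriv=2),
    "SG+SNV": snv(sg_transform(spectra)),
}

# Placeholder relevance scores (assumption); a real pipeline would derive
# them from the class labels, for example via a trained model.
relevance = rng.random(spectra.shape[1])
selected_bands = mrmr_select(variants["SG+2D"], relevance, k=12)
```

In this sketch each derivative is computed directly by the SG filter, so smoothing and differentiation happen in one pass. The call with `k=12` only mirrors the size of the mRMR-CatBoost subset reported above; the bands it returns carry no physical meaning.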