Detection of foodborne pathogens in animal-derived mixed bacterial systems by mRMR-GBDT combined with hyperspectral imaging

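    • Abstract: Existing methods for detecting foodborne pathogens involve complex, time-consuming sample pretreatment and depend heavily on reagents. To address these problems, this study combined hyperspectral imaging (HSI) with the minimum redundancy maximum relevance - gradient boosting decision tree (mRMR-GBDT) series of feature selection methods to qualitatively detect Escherichia coli, Salmonella, and Staphylococcus aureus in a mixed bacterial system derived from mutton. First, the bacterial community composition of fresh mutton was analyzed to determine the structure of the mutton-origin symbiotic microbiota, and a multi-bacterial mixed system consistent with actual detection conditions was constructed. Second, HSI data were acquired for the mixed systems composed of the mutton-origin symbiotic microbiota and the foodborne pathogens, and one-dimensional spectra were extracted by generating masks and applying morphological methods. Then, the mRMR-GBDT series methods (mRMR-XGBoost, mRMR-LightGBM, and mRMR-CatBoost) were applied to the preprocessed spectra to screen pathogen-related feature wavelengths. Finally, the performance of the different feature selection methods and pathogen classification models was systematically compared to determine the optimal feature extraction method and classification model. The results showed that the total bacterial count of 90.5% of the mutton samples fell in the range of 10² to 10⁴ CFU/g, and the relative abundances of the core dominant genera were 35.67% (Pseudomonas) and 35.56% (Acinetobacter). The mRMR-GBDT series methods selected 17, 21, and 12 feature wavelengths, respectively, and the 12 wavelengths selected by mRMR-CatBoost combined the best spectral relevance with the lowest redundancy. Support vector machine, light gradient boosting machine, and back propagation neural network (BPNN) classification models were built on this feature subset. Comparative analysis showed that the BPNN model performed best, with accuracies of 97.63% and 96.49% on the validation and test sets, respectively. These results indicate that HSI combined with mRMR-CatBoost-BPNN can efficiently detect foodborne pathogens in mixed bacterial systems from real mutton samples. This study can provide a theoretical basis and technical reference for the efficient and accurate detection of pathogens in mutton and other animal-derived foods.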
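The abstract states that one-dimensional spectra were obtained by generating masks and applying morphological methods, but it does not describe how the masks were built. The Python sketch below uses NumPy and SciPy and assumes a simple single-band intensity threshold, followed by morphological opening and closing. The colony pixels are then averaged band by band. `extract_mean_spectrum`, `band_index`, and `threshold` are hypothetical names and parameters, not the authors' code.

```python
import numpy as np
from scipy import ndimage


def extract_mean_spectrum(cube: np.ndarray, band_index: int, threshold: float) -> np.ndarray:
    """Return the mean spectrum of the foreground (colony) region of a hyperspectral cube.

    cube: reflectance array of shape (rows, cols, bands).
    band_index, threshold: hypothetical choices for building the mask; the
    abstract does not say which band or threshold the study used.
    """
    # 1. Threshold one band to get a rough foreground mask.
    mask = cube[:, :, band_index] > threshold
    # 2. Opening (erosion then dilation) removes isolated noisy pixels;
    #    closing (dilation then erosion) fills small gaps in the colony region.
    structure = np.ones((3, 3), dtype=bool)
    mask = ndimage.binary_opening(mask, structure=structure)
    mask = ndimage.binary_closing(mask, structure=structure)
    # 3. Fill any holes that remain fully enclosed by foreground.
    mask = ndimage.binary_fill_holes(mask)
    if not mask.any():
        raise ValueError("mask is empty; adjust band_index or threshold")
    # 4. cube[mask] has shape (n_pixels, bands); average over pixels -> 1-D spectrum.
    return cube[mask].mean(axis=0)
```

The opening step matters here. A single bright outlier pixel that passes the threshold is erased before averaging, so it cannot bias the mean spectrum.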

       

      Abstract: Existing methods for detecting foodborne pathogens involve complex, time-consuming sample preprocessing and a high dependence on reagents, so accurate, rapid, and low-reagent alternatives are needed. In this study, hyperspectral imaging (HSI) was combined with the minimum redundancy maximum relevance (mRMR) - gradient boosting decision tree (GBDT) series of feature selection methods to qualitatively detect Escherichia coli, Salmonella, and Staphylococcus aureus within a mixed bacterial system originating from mutton. Firstly, the microbial community composition of fresh mutton from common retail scenarios was identified. The structure of the mutton-origin symbiotic microbiota was then used to construct a multi-bacterial mixed system that matched actual detection conditions. Secondly, HSI data were acquired for multi-bacterial co-cultures of the mutton-origin symbiotic microbiota and the target foodborne pathogens. Masks were generated using morphological methods, and one-dimensional average spectra were extracted from the masked regions. These average spectra were then preprocessed to reduce noise, scattering effects, and baseline drift, which improved data usability and reliability. After Savitzky-Golay (SG) smoothing, the spectra were subjected to first-derivative (1D), second-derivative (2D), and standard normal variate (SNV) transformations. The impacts of these preprocessing approaches on model performance were compared to identify the optimal one. On the optimally preprocessed spectra, namely the SG- and 2D-preprocessed data, the mRMR-GBDT series of feature selection methods (mRMR-XGBoost, mRMR-LightGBM, and mRMR-CatBoost) was employed to screen feature wavelengths relevant to foodborne pathogen detection.
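The abstract lists SG smoothing, 1D, 2D, and SNV but gives no window length, polynomial order, or wavelength step. Below is a minimal Python sketch built on SciPy's `savgol_filter`. Several choices are assumptions:

- The window length of 11 and polynomial order of 2.
- Reading "SG and 2D" as smoothing followed by a second derivative.
- The helper names, which are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (rows are spectra)."""
    return savgol_filter(spectra, window_length, polyorder, axis=1)


def sg_derivative(spectra, order, window_length=11, polyorder=2, delta=1.0):
    """SG derivative of the given order (1 -> 1D, 2 -> 2D).

    polyorder must be >= order; delta is the spacing between bands
    (1.0 means derivatives are taken per band index, an assumption).
    """
    return savgol_filter(spectra, window_length, polyorder, deriv=order, delta=delta, axis=1)


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def preprocess_sg_2d(spectra, window_length=11, polyorder=2):
    """Assumed 'SG + 2D' chain: smooth first, then take the SG second derivative."""
    smoothed = sg_smooth(spectra, window_length, polyorder)
    return sg_derivative(smoothed, order=2, window_length=window_length, polyorder=polyorder)
```

A Savitzky-Golay filter reproduces polynomials up to `polyorder` exactly. As a result, applying `preprocess_sg_2d` to a quadratic spectrum 0.5·x² returns a constant second derivative of 1 in every band, which makes a convenient sanity check.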
Finally, the feature selection methods and foodborne pathogen classification models were systematically compared to determine the optimal feature extraction method and classification model. The results indicated that the total bacterial count in the mutton samples ranged from 10¹ to 10⁵ CFU/g, with 90.5% of samples within 10² to 10⁴ CFU/g. The most prevalent count was 10³ CFU/g (37.5%), followed by 10² CFU/g (29.5%) and 10⁴ CFU/g (23.5%). Counts of 10¹ and 10⁵ CFU/g accounted for 3.0% and 6.5%, respectively. Further analysis revealed that the core symbiotic microbiota in the mutton consisted primarily of Pseudomonas, Acinetobacter, and Proteus. The dominant core genera were Pseudomonas and Acinetobacter, with maximum relative abundances of 35.67% and 35.56%, respectively. The sub-dominant genera were Proteus (21.77%) and Bacillus (20.32%). Other genera, such as Klebsiella, all had relative abundances below 10% and were classified as relatively rare members of the mutton-origin symbiotic microbiota. The mRMR-GBDT series selected 17 (mRMR-XGBoost), 21 (mRMR-LightGBM), and 12 (mRMR-CatBoost) feature wavelengths. Among them, the 12 wavelengths selected by mRMR-CatBoost offered the best balance between high spectral relevance and minimal redundancy. These 12 features were evenly distributed over the full spectral range. Seven of them were associated with overtone and combination absorption bands of functional groups such as C-H, O-H, and N-H. The synergistic effects among these wavelengths were effectively preserved. Support vector machine (SVM), light gradient boosting machine (LightGBM), and back propagation neural network (BPNN) classification models were constructed using this optimal feature subset. Comparative analysis showed that the BPNN model performed best, with accuracies of 97.63% and 96.49% on the validation and test sets, respectively.
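The abstract names the mRMR-GBDT methods but does not say how the mRMR and GBDT stages are chained or how the BPNN was configured. The Python sketch below assumes one plausible arrangement:

1. A greedy mRMR ranking proposes candidate wavelengths.
2. A gradient-boosted tree model keeps the most important of them.
3. A small multilayer perceptron trained by backpropagation classifies the samples.

scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, LightGBM, or CatBoost, and `MLPClassifier` stands in for the BPNN. The function names, the correlation-based redundancy term, and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def mrmr_rank(X, y, n_candidates, random_state=0):
    """Greedy mRMR ranking of spectral features (columns of X).

    Relevance: mutual information between each feature and the class label.
    Redundancy: mean absolute Pearson correlation with features already chosen.
    The two terms live on different scales; the difference score is a simplification.
    """
    n_candidates = min(n_candidates, X.shape[1])
    relevance = mutual_info_classif(X, y, random_state=random_state)
    abs_corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]  # start from the most relevant feature
    while len(selected) < n_candidates:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - abs_corr[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


def gbdt_refine(X, y, candidates, n_keep, random_state=0):
    """Fit a GBDT on the mRMR candidates and keep the n_keep most important ones."""
    gbdt = GradientBoostingClassifier(random_state=random_state)
    gbdt.fit(X[:, candidates], y)
    top = np.argsort(gbdt.feature_importances_)[::-1][:n_keep]
    return [candidates[i] for i in top]


def train_bpnn(X, y, features, random_state=0):
    """Train a backpropagation MLP on the selected features; return (model, test accuracy)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, features], y, test_size=0.2, stratify=y, random_state=random_state
    )
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=random_state),
    )
    model.fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)
```

The functions also accept multiclass labels, such as one label per pathogen, because `mutual_info_classif`, `GradientBoostingClassifier`, and `MLPClassifier` all handle more than two classes. The sketch uses a single train/test split, whereas the study reports separate validation and test accuracies.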
Compared with the SVM and LightGBM models, the accuracy of BPNN was higher by 9.17 and 7.25 percentage points, respectively, on the validation set, and by 1.64 and 5.77 percentage points, respectively, on the test set. HSI combined with the mRMR-CatBoost-BPNN framework therefore achieved efficient detection of foodborne pathogens within a mixed bacterial system derived from authentic mutton samples. These findings can provide a theoretical basis and technical reference for the efficient and accurate detection of foodborne pathogens in animal-derived foods such as mutton.
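Subtracting the stated percentage-point gaps from the BPNN accuracies gives the implied accuracies of the two comparison models. These values are derived by simple arithmetic and are not stated directly in the abstract:

```latex
\begin{aligned}
\mathrm{Acc}^{\mathrm{val}}_{\mathrm{SVM}}       &= 97.63\% - 9.17\,\mathrm{pp} = 88.46\%, &
\mathrm{Acc}^{\mathrm{val}}_{\mathrm{LightGBM}}  &= 97.63\% - 7.25\,\mathrm{pp} = 90.38\%,\\
\mathrm{Acc}^{\mathrm{test}}_{\mathrm{SVM}}      &= 96.49\% - 1.64\,\mathrm{pp} = 94.85\%, &
\mathrm{Acc}^{\mathrm{test}}_{\mathrm{LightGBM}} &= 96.49\% - 5.77\,\mathrm{pp} = 90.72\%.
\end{aligned}
```

Read this way, SVM comes close to BPNN on the test set but trails it clearly on validation. LightGBM's gap to BPNN is of similar size on both sets.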

       
