Origin identification of Coptis chinensis via electronic nose feature optimization based on an MI-HSIC hybrid weighted index

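    • Abstract: Coptis chinensis (Huanglian) is an important genuine regional (daodi) medicinal herb in China, and its clinical efficacy is closely related to its geographical origin. To address the expensive equipment and complex operation of conventional Huanglian testing, this study proposes an origin identification method based on electronic nose technology. To improve the detection performance of the electronic nose, a feature optimization algorithm for the gas sensor array is proposed, based on a mutual information-Hilbert-Schmidt independence criterion (MI-HSIC) hybrid weighted index. An improved Bayesian optimization algorithm, which uses a Bayesian neural network (BNN) surrogate model in place of the Gaussian process model of conventional Bayesian optimization, adaptively adjusts the weights of MI and HSIC in the feature importance evaluation, so that the key gas-sensing features are screened adaptively. Support vector machine, K-nearest neighbors, and random forest classifiers were used for origin traceability analysis of sliced and powdered Huanglian from six major genuine producing regions. The results show that the three classifiers needed only 6, 13, and 20 features, respectively, to reach test-set accuracies of 96.25%, 93.33%, and 94.58%. This is 2.91%~4.58% higher than the control group using all 70 features, with the number of features reduced by 71.43%~91.43%. Compared with conventional feature optimization based on MI or HSIC alone, the proposed MI-HSIC hybrid weighted index method improved test-set classification accuracy while effectively reducing feature dimensionality, showing better overall performance. In addition, compared with conventional Bayesian optimization based on a Gaussian process model, the proposed Bayesian optimization with a BNN surrogate model shortened the time per iteration by 19.75%~36.01%. The proposed electronic nose feature optimization method based on the MI-HSIC hybrid weighted index can effectively improve electronic nose system performance and provides a feasible technical means for accurate, efficient, and low-cost identification of the genuineness of Huanglian.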

       

      Abstract: Coptis chinensis (Huanglian) is one of the most important Chinese medicinal herbs. Its clinical efficacy is closely associated with its geographical origin, which influences its chemical composition, pharmacological activity, and market value. However, conventional analytical techniques for verifying the authenticity and origin of Huanglian, such as high-performance liquid chromatography (HPLC) and mass spectrometry, involve high labor and instrument costs as well as complex sample preparation and operation, which makes them poorly suited to the rapid, on-site quality assessment that is often required. In this study, a cost-effective approach was proposed to discriminate the geographical origins of Coptis chinensis using an electronic nose (e-nose) system. Volatile organic compound (VOC) profiles were captured as fingerprint signatures of the regional growing conditions. A feature optimization algorithm was developed for the gas sensor array to enhance the discrimination ability and efficiency of the e-nose. It is built on a hybrid weighted index, termed MI-HSIC, that combines mutual information (MI) with the Hilbert-Schmidt independence criterion (HSIC). MI captures nonlinear dependencies, while HSIC measures statistical dependence in reproducing kernel Hilbert spaces; the hybrid index exploits these complementary strengths for a more comprehensive evaluation of feature relevance and redundancy. To make the feature selection adaptive, an improved Bayesian optimization algorithm was employed in which a Bayesian neural network (BNN) replaced the conventional Gaussian process as the surrogate model. The BNN-based optimizer dynamically adjusted the relative weights of MI and HSIC during the iterative optimization, so that the critical gas-sensing features were screened adaptively while the uncertainty of the model predictions was taken into account through probabilistic inference.
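For orientation, the two dependence measures named above have standard textbook forms, written here for a feature X and the origin label Y. The abstract does not state how the two are combined, so the last line is only an assumed illustration: a convex combination with a single weight w, applied to scores that have been min-max normalized across features (the symbols S_j, w, and the tilde notation are hypothetical, not taken from the study).

$$
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
$$

$$
\mathrm{HSIC}(X,Y) = \frac{1}{(n-1)^{2}}\,\operatorname{tr}(KHLH), \qquad H = I_{n} - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \quad K_{ij} = k(x_i, x_j), \quad L_{ij} = l(y_i, y_j)
$$

$$
S_j(w) = w\,\tilde{I}_j + (1-w)\,\tilde{H}_j, \qquad w \in [0,1]
$$

Under this assumed form, w = 1 reduces to ranking by MI alone and w = 0 to ranking by HSIC alone, which are the two single-index baselines the abstract compares against.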
Subsequently, three machine learning classifiers, namely support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF), were applied to the geographical traceability analysis. Both sliced and powdered Coptis chinensis samples were collected from six major genuine producing regions in China. The experimental results show that the MI-HSIC hybrid weighting method improved the classification accuracy while substantially reducing the feature dimensionality. Specifically, the SVM, KNN, and RF classifiers achieved test accuracies of 96.25%, 93.33%, and 94.58%, respectively, using only 6, 13, and 20 optimized features. Compared with the baseline models using all 70 original sensor features, the accuracy improved by 2.91%~4.58%, while the number of features was reduced by 71.43%~91.43%, which benefits computational efficiency and model interpretability. Furthermore, the MI-HSIC approach outperformed feature optimization based on either MI or HSIC alone, achieving higher classification accuracy with fewer features. In addition, the BNN-based Bayesian optimization was computationally efficient: its single-iteration time was 19.75%~36.01% shorter than that of the traditional Gaussian process-based Bayesian optimization, which points to more efficient tuning and better scalability. The robustness of the model was further validated on the independent test dataset, indicating that it generalizes across the different physical forms of Coptis chinensis. Combined with the BNN-enhanced Bayesian optimization, the MI-HSIC hybrid weighted index reduced the effective feature dimensionality while maintaining high e-nose performance. Overall, the approach offers a robust, accurate, and cost-effective solution for the geographical authentication of genuine Coptis chinensis, and a promising alternative to conventional analytical techniques.
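The reported feature-reduction range can be checked directly from the feature counts given above (70 original features; 6, 13, and 20 retained). The short calculation below is only an arithmetic check; the variable and function names are illustrative.

```python
# Arithmetic check of the reported feature reduction (counts taken from the abstract).
TOTAL_FEATURES = 70
selected = {"SVM": 6, "KNN": 13, "RF": 20}


def reduction_percent(kept, total=TOTAL_FEATURES):
    """Percentage of features removed, rounded to two decimals."""
    return round(100.0 * (total - kept) / total, 2)


reductions = {name: reduction_percent(k) for name, k in selected.items()}
print(reductions)  # {'SVM': 91.43, 'KNN': 81.43, 'RF': 71.43}
```

The extremes, 71.43% for RF and 91.43% for SVM, match the stated 71.43%~91.43% range.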
This work can contribute to intelligent sensing technologies for the quality control of Chinese medicines, particularly for rapid, non-destructive, on-site herb authentication in practical applications.
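As a rough illustration of how a hybrid MI-HSIC index could rank sensor features, the sketch below scores each feature with a histogram estimate of MI and the biased empirical HSIC estimator, min-max normalizes both across features, and mixes them with a weight w. It is a minimal sketch under stated assumptions, not the authors' implementation: the convex-combination form, the fixed w, the bin count, the kernel choices, and the synthetic data are all hypothetical, and the BNN-based Bayesian optimization that tunes the weights in the study is not reproduced here.

```python
import math
import random


def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X;Y) in nats for continuous x and discrete labels y."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    xb = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(xb, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def center(M):
    """Double-center a square matrix, i.e. compute H M H with H = I - 11^T / n."""
    n = len(M)
    row = [sum(r) / n for r in M]
    col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[M[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def hsic(x, y):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2.

    Assumed kernels: Gaussian on x (median-heuristic bandwidth), delta on labels y.
    """
    n = len(x)
    d2 = sorted((a - b) ** 2 for a in x for b in x if a != b)
    sigma2 = d2[len(d2) // 2] if d2 else 1.0
    K = [[math.exp(-(a - b) ** 2 / (2 * sigma2)) for b in x] for a in x]
    L = [[1.0 if a == b else 0.0 for b in y] for a in y]
    Lc = center(L)
    # tr(K H L H) equals the elementwise sum of K * (H L H) for symmetric matrices.
    return sum(K[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def minmax(v):
    """Scale a list of scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(v), max(v)
    return [(a - lo) / (hi - lo) if hi > lo else 0.0 for a in v]


def hybrid_scores(features, labels, w):
    """Assumed hybrid index: w * normalized MI + (1 - w) * normalized HSIC."""
    mi = minmax([mutual_information(f, labels) for f in features])
    hs = minmax([hsic(f, labels) for f in features])
    return [w * m + (1 - w) * h for m, h in zip(mi, hs)]


def make_demo_data(seed=0, n_per_class=20):
    """Synthetic stand-in for sensor features: informative, pure noise, weakly informative."""
    rng = random.Random(seed)
    labels = [c for c in range(3) for _ in range(n_per_class)]
    informative = [3.0 * c + rng.gauss(0, 0.3) for c in labels]
    noise = [rng.gauss(0, 1) for _ in labels]
    weak = [c + rng.gauss(0, 1.0) for c in labels]
    return [informative, noise, weak], labels


features, labels = make_demo_data()
scores = hybrid_scores(features, labels, w=0.5)  # w fixed by hand in this sketch
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

In a full pipeline the top-ranked features would be passed to a classifier such as SVM, KNN, or RF, and w would be chosen by an outer optimizer against validation accuracy rather than set manually.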

       
