Abstract:
Coptis chinensis (Huanglian) is one of the most important Chinese medicinal herbs. Its clinical efficacy is closely associated with its geographical origin, which affects its chemical composition, pharmacological activity, and market value. However, conventional analytical techniques for verifying the authenticity and origin of Huanglian, such as high-performance liquid chromatography (HPLC) and mass spectrometry, are poorly suited to routine use because of their high labor and instrument costs and their complex sample preparation and operation, whereas rapid, on-site quality assessment is often required. In this study, a cost-effective approach was proposed to discriminate the geographical origins of Coptis chinensis using an electronic nose (e-nose) system, in which volatile organic compound (VOC) profiles served as fingerprint signatures of regional growing conditions. To improve the discriminative power and efficiency of the e-nose, a feature optimization strategy was developed for the gas sensor array. A hybrid weighting index, termed MI-HSIC, was constructed by integrating mutual information (MI) and the Hilbert-Schmidt independence criterion (HSIC). The two measures are complementary: MI captures nonlinear dependencies, whereas HSIC measures statistical independence in reproducing kernel Hilbert spaces, so the hybrid index provides a more comprehensive evaluation of feature relevance and redundancy. To make the feature selection adaptive, an improved Bayesian optimization was employed in which a Bayesian neural network (BNN) replaced the conventional Gaussian process as the surrogate model. The BNN-based optimizer dynamically adjusted the relative weights of MI and HSIC during iterative optimization, and because probabilistic inference quantifies the uncertainty in the surrogate's predictions, the critical gas-sensing features were screened adaptively while accounting for that uncertainty. Three machine learning classifiers, support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF), were then applied for geographical traceability analysis. Both sliced and powdered Coptis chinensis samples were collected from six major authentic producing regions in China. The experimental results show that the MI-HSIC hybrid weighting method substantially improved classification accuracy while greatly reducing feature dimensionality. Specifically, the SVM, KNN, and RF classifiers achieved test accuracies of 96.25%, 93.33%, and 94.58%, respectively, using only 6, 13, and 20 optimal features. Compared with the baseline models using all 70 original sensor features, accuracy improved by 2.91% to 4.58%, while the number of features was reduced by 71.43% to 91.43%, which improves computational efficiency and model interpretability. Furthermore, after feature optimization, MI-HSIC outperformed either MI or HSIC alone, achieving higher classification accuracy with fewer features. In addition, the BNN-based Bayesian optimization was computationally efficient: its single-iteration time was 19.75% to 36.01% shorter than that of traditional Gaussian process-based Bayesian optimization, indicating efficient hyperparameter tuning and good scalability. The robustness of the model was further validated on an independent test dataset, demonstrating generalization across different physical forms of Coptis chinensis. Together, the MI-HSIC hybrid weighting index and the BNN-enhanced Bayesian optimization reduced the effective dimensionality of the e-nose data while maintaining high performance. Overall, the approach provides a robust, accurate, and cost-effective solution for the geographical authentication of genuine Coptis chinensis and a promising alternative to conventional analytical techniques. This work contributes to intelligent sensing technologies for the quality control of Chinese medicines, particularly rapid, non-destructive, on-site herb authentication in practical applications.
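The abstract does not state the exact form of the MI-HSIC index. As a minimal sketch, assume it is a convex combination of min-max-normalized per-feature MI and HSIC scores against the class labels, with weight alpha on MI. The estimators here are illustrative choices, not the authors' implementation: MI is a plug-in estimate on equal-frequency bins, and HSIC is the biased empirical estimator with a Gaussian kernel (median-heuristic bandwidth) on the feature and a delta kernel on the labels. The redundancy term between features is omitted for brevity, and the names discrete_mi, hsic, minmax, and mi_hsic_scores are hypothetical.

```python
import numpy as np


def discrete_mi(x, y, n_bins=8):
    """Plug-in MI (nats) between feature x, discretized into equal-frequency bins, and labels y."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    xb = np.digitize(x, edges)                      # bin index in 0..n_bins-1
    classes = np.unique(y)
    joint = np.zeros((n_bins, classes.size))
    for i, c in enumerate(classes):
        joint[:, i] = np.bincount(xb[y == c], minlength=n_bins)
    joint /= joint.sum()                            # joint distribution p(bin, class)
    indep = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / indep[nz])))


def hsic(x, y):
    """Biased empirical HSIC: Gaussian kernel on x, delta kernel on labels y."""
    n = x.size
    d2 = (x[:, None] - x[None, :]) ** 2
    sigma2 = 0.5 * np.median(d2[d2 > 0])           # median-heuristic bandwidth
    K = np.exp(-d2 / (2.0 * sigma2))
    L = (y[:, None] == y[None, :]).astype(float)
    H = np.eye(n) - 1.0 / n                         # centering matrix I - (1/n) 11^T
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


def minmax(v):
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def mi_hsic_scores(X, y, alpha=0.5):
    """Hypothetical MI-HSIC relevance index: alpha * MI + (1 - alpha) * HSIC, each normalized."""
    mi = np.array([discrete_mi(X[:, j], y) for j in range(X.shape[1])])
    hs = np.array([hsic(X[:, j], y) for j in range(X.shape[1])])
    return alpha * minmax(mi) + (1.0 - alpha) * minmax(hs)


# Synthetic example: 6 hypothetical origins x 40 samples, 70 sensor features,
# of which only features 0-2 depend on the class label.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 40)
X = rng.normal(size=(y.size, 70))
X[:, :3] += 1.5 * y[:, None]
scores = mi_hsic_scores(X, y, alpha=0.5)
top = np.argsort(scores)[::-1][:6]                  # indices of the 6 highest-scoring features
print(sorted(top[:3].tolist()))
```

On this synthetic data the three label-dependent features should rank highest. In this sketch alpha is the quantity a Bayesian optimizer would tune.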
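The BNN surrogate is not specified beyond the abstract's description. The following sketch substitutes a deliberately simplified Bayesian-last-layer network: a fixed random tanh hidden layer with Bayesian linear regression on the output weights, in the spirit of DNGO. It then uses expected improvement to choose the next MI weight alpha. The objective toy_accuracy is a hypothetical stand-in for the cross-validated classifier accuracy obtained with MI-HSIC-ranked features, and all names and hyperparameters are assumptions. One point of general background: fitting such a surrogate costs O(n h^2 + h^3) for n evaluations and h hidden units, versus O(n^3) for an exact Gaussian process. That difference illustrates why network-based surrogates can scale better as evaluations accumulate.

```python
import math

import numpy as np


class BayesianLastLayerNet:
    """Simplified BNN stand-in: fixed random tanh hidden layer + Bayesian linear output layer."""

    def __init__(self, n_hidden=50, prior_var=1.0, noise_var=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=5.0, size=(1, n_hidden))   # random input-to-hidden weights
        self.b = rng.uniform(-5.0, 5.0, size=n_hidden)
        self.prior_var, self.noise_var = prior_var, noise_var

    def _features(self, x):
        return np.tanh(x.reshape(-1, 1) @ self.W + self.b)

    def fit(self, x, y):
        P = self._features(x)
        self.y_mean = y.mean()
        A = P.T @ P / self.noise_var + np.eye(P.shape[1]) / self.prior_var
        self.S = np.linalg.inv(A)                                    # posterior covariance
        self.m = self.S @ P.T @ (y - self.y_mean) / self.noise_var   # posterior mean
        return self

    def predict(self, x):
        P = self._features(x)
        mu = P @ self.m + self.y_mean
        var = np.maximum(np.einsum("ij,jk,ik->i", P, self.S, P), 0.0) + self.noise_var
        return mu, np.sqrt(var)


def expected_improvement(mu, sd, best, xi=1e-3):
    """EI for maximization under a Gaussian predictive distribution."""
    imp = mu - best - xi
    z = imp / sd
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sd * pdf


def toy_accuracy(alpha):
    """Hypothetical stand-in for CV accuracy versus the MI weight alpha (peak at 0.65)."""
    return 0.90 + 0.06 * np.exp(-((alpha - 0.65) ** 2) / 0.02)


def bayes_opt(objective, n_init=4, n_iter=12, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)               # candidate values of alpha
    xs = rng.uniform(0.0, 1.0, n_init)              # random initial design
    ys = objective(xs)
    for _ in range(n_iter):
        model = BayesianLastLayerNet(seed=seed).fit(xs, ys)
        mu, sd = model.predict(grid)
        x_next = grid[np.argmax(expected_improvement(mu, sd, ys.max()))]
        xs = np.append(xs, x_next)
        ys = np.append(ys, objective(np.array([x_next])))
    best = int(np.argmax(ys))
    return xs[best], ys[best], xs, ys


a_best, acc_best, xs, ys = bayes_opt(toy_accuracy)
print(f"best alpha = {a_best:.3f}, accuracy = {acc_best:.4f}")
```

Replacing toy_accuracy with a real cross-validation routine would turn this into a (still simplified) weight-tuning loop. A full BNN would additionally infer the hidden-layer weights rather than fixing them at random.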
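The reported range of feature reduction follows directly from the selected feature counts relative to the 70 original sensor features, computed as (70 - k) / 70 for k selected features:

```python
TOTAL_FEATURES = 70                              # original sensor features
selected = {"SVM": 6, "KNN": 13, "RF": 20}       # optimal feature counts reported in the abstract
reduction = {name: round(100 * (TOTAL_FEATURES - k) / TOTAL_FEATURES, 2)
             for name, k in selected.items()}
print(reduction)  # {'SVM': 91.43, 'KNN': 81.43, 'RF': 71.43}
```

The SVM and RF values give the endpoints of the 71.43% to 91.43% range, and the KNN value falls between them.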