Citation: Wu Zhengmin, Cao Chengmao, Wang Errui, Luo Kun, Zhang Jinyan, Sun Yan. Tea selection method based on morphology feature parameters[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(11): 315-321. DOI: 10.11975/j.issn.1002-6819.2019.11.036

    Tea selection method based on morphology feature parameters

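    • Abstract: In summer and autumn tea, stalks and leaves differ little in color, so conventional color sorters struggle to separate them. This paper proposes a multi-feature-vector sorting method based on the morphological characteristics of tea, with the aim of enabling rapid modeling of the tea selection algorithm and improving sorting accuracy. Images of tea were captured while the tea was falling, and an image-processing-based feature extraction program was developed to extract morphological feature parameters automatically from multiple groups of tea samples. The random forest algorithm was used to determine feature weights and select features. Three classification algorithms (logistic regression, decision tree, and support vector machine) were then built to classify the samples, verify the separability of the features, and analyze how the choice of classifier affects the classification of complex tea samples. The results showed that: 1) among the morphological feature parameters, circularity E had the largest importance weight, 0.467; with the importance threshold set to 0.05, five morphological features (circularity E, rectangularity R, linearity Len, perimeter C, and compactness J) were selected to build the dataset; 2) on the test dataset, the average accuracy of logistic regression (LR), decision tree (DT), and support vector machine (SVM) was 0.924, indicating that the selected features are clearly separable; 3) according to the output confusion matrices, SVM performed best of the three classifiers, with an accuracy of 93.8% and an F1 score (the harmonic mean of precision and recall) of 94.7%. The method can be applied quickly to the selection of other types of tea and to actual tea production, effectively improving tea quality.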
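    The accuracy and F1 scores reported above are derived from a confusion matrix. As a minimal sketch of that calculation for a two-class problem, the snippet below computes both from the four cell counts. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for a 2x2 confusion matrix.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only; NOT results from the paper.
example = binary_metrics(tp=8, fp=2, fn=1, tn=9)
print(example)
```

    Because F1 combines precision and recall, it can differ noticeably from accuracy when the two classes are unevenly represented, which is presumably why the abstract reports both scores.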

       

      Abstract: In summer and autumn tea, stalks and leaves are similar in color, so conventional color sorters, which rely on optical characteristics, struggle to separate them. To enable rapid modeling of the tea selection algorithm and improve sorting accuracy, this paper introduces a method that separates acceptable from defective tea using multi-feature vectors based on morphological characteristics. First, Wuyishan Dahongpao tea was selected as the test sample, and images were collected while the tea was falling. The blue channel of each image was extracted, and the binary image and edge of each individual sample were obtained by connected-region analysis of the whole image. Then, a feature extraction program based on image processing algorithms was developed to extract the morphological feature parameters of the tea samples automatically. Four simple shape descriptors were extracted: the sample perimeter, the area, and the length and width of the minimum bounding rectangle. From these, eight complex shape descriptors were calculated: circularity, rectangularity, linearity, slightness, diameter, diagonal of the minimum bounding rectangle, compactness, and centroid. Next, the random forest algorithm was used to determine the weights of these features, and features were selected according to a weight threshold. Finally, three classification algorithms, logistic regression (LR), decision tree (DT), and support vector machine (SVM), were built to classify the samples, verify the validity of the features, and analyze how the choice of classifier affects tea classification. The original data were normalized and randomly split, with 80% used for training and 20% for testing. Ten-fold cross-validation was used to select the optimal parameters of each classification model: the training dataset was randomly divided into 10 parts, of which 9 were used for training and the remaining 1 for validation. This parameter optimization process yielded the optimal logistic regression, decision tree, and support vector machine models, and the final evaluation results were computed on the test dataset. The test results showed that: 1) circularity had the highest weight, 0.467, and with a weight threshold of 0.05 five features were finally selected: circularity, rectangularity, linearity, perimeter, and compactness; 2) on the test dataset, the average accuracy of the three classification algorithms was 0.924, suggesting that the morphological feature descriptors are separable; 3) on the test dataset, LR achieved an accuracy of 91.7% and an F1 score of 92.9%, while SVM achieved an accuracy of 93.8% and an F1 score of 94.7%, the best recognition performance of the three algorithms; 4) judging from the deviation between the evaluation scores of the three algorithms, logistic regression generalized better, while the decision tree carried a greater risk of overfitting. Logistic regression had the lowest accuracy and F1 score and SVM the highest, so when evaluating the separability of feature vectors, several algorithms can be used and their results averaged as the final basis for evaluation. Because the images were acquired dynamically, in line with the actual working conditions of tea selection, the method can be extended to actual tea processing and production.
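      The data partitioning described above (a random 80/20 train/test split, then 10-fold cross-validation on the training portion) can be sketched on sample indices alone. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the fixed seed, and the sample count of 500 are all assumptions made for the example.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=10):
    """Yield (fit, validation) index lists; each fold validates exactly once.

    train_indices is assumed to be shuffled already, so taking every
    k-th element produces k random, near-equal parts.
    """
    folds = [train_indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Fit on the other k-1 folds, validate on the held-out fold.
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# Hypothetical sample count, chosen only to make the arithmetic easy to follow.
train, test = train_test_split_indices(n_samples=500)
splits = list(k_fold_indices(train, k=10))
print(len(train), len(test), len(splits))
```

      In this scheme the test indices never enter cross-validation, so parameter selection on the 10 folds does not see the data used for the final evaluation.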

       
