Citation: Suo Junfeng, Liu Yong. Semantic similarity algorithm based on agricultural ontology and its application on crop ontology[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(16): 175-182. DOI: 10.11975/j.issn.1002-6819.2016.16.024

    Semantic similarity algorithm based on agricultural ontology and its application on crop ontology

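    • Summary: Semantic similarity computation is a key problem in information processing. Existing similarity models for domain ontologies consider too few factors, are strongly subjective, and have low accuracy. To address these problems, this paper first proposes a new method for computing node density based on the structural characteristics of the ontology model, analyzes the components of concept similarity in terms of relationship type, node density, and node depth, and assigns them different weights to compute the structural similarity between concept pairs. Next, drawing on the literature and empirical knowledge, a property grid of ontology concept pairs is constructed to obtain the property similarity of concepts. Then the B-U probability of root and leaf nodes is computed from the hierarchical network structure of the ontology and used to derive the semantic information content; this method does not depend on expert experience and is therefore objective. Further, combining ontology structure, information content, and properties, a comprehensive algorithm for computing the semantic similarity between concepts is proposed; because the influencing factors differ in importance, the algorithm assigns different weights to the relations between concept pairs in the agricultural ontology. Finally, the algorithm is validated on the semantic similarity of a crop ontology in the agricultural domain, and the detailed computation of the semantic similarity between "sweet corn" and "waxy maize" is given. Experiments show that the result (0.8206) and standard deviation (0.0565) of the proposed algorithm are closer to intuitive human judgment and expert opinion than those of three other semantic similarity algorithms, and that the algorithm effectively improves the accuracy and validity of semantic similarity computation.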

       

      Abstract: An ontology represents a structured view of a domain and carries rich semantic meaning, so it plays an important role in many knowledge-intensive applications. A domain ontology contains complete information about concepts and extensive links between them. Constructing an agricultural domain ontology can provide the knowledge-organization foundation for vertical agricultural search engines, promote agricultural informatization, and enable cooperative agricultural information services. Semantic similarity measures play an important role in ontology-based information retrieval and information integration. However, traditional semantic similarity algorithms for domain ontologies focus on a single influencing factor, which leads to poor convergence, low accuracy, strong subjectivity, and other defects. This paper proposes a weighted semantic similarity algorithm based on an agricultural domain ontology. Given the structural characteristics of ontologies, the major factors that influence similarity are the structural factor, the properties of the concept, and the information content; the structural factor is in turn affected by relationship type, node density, node depth, and other factors. First, based on the structural characteristics of the ontology model, a new method for calculating node density was proposed. At the same time, an integrated structural similarity model based on relationship type, node density, node depth, and semantic distance was given; these are called the structural factors. Second, drawing on the literature and empirical knowledge, a property grid of ontology concept pairs was constructed to obtain the property similarity of the concepts. Third, based on the hierarchical network of the ontology, the B-U probability of root and leaf nodes was calculated and used to compute the semantic information content; this method does not rely on expert experience and is objective. Fourth, combining the structural, information, and property factors, an integrated semantic similarity algorithm was proposed; because the influencing factors differ in importance for semantic similarity, the algorithm assigns different weights to the relations between concept pairs in the agricultural ontology. Finally, taking part of an agricultural crop ontology as an example, the calculation of the semantic similarity between "sweet corn" and "waxy maize" was worked through in detail. The calculation result (0.8206) and standard deviation (0.0565) of the proposed algorithm were closer to intuitive cognition and expert opinion than those of the other algorithms, indicating that the algorithm can effectively improve the accuracy and validity of semantic similarity computation. We computed semantic similarity values by studying the relationships between concept pairs at different depths of the hierarchical structure of agricultural ontologies. We evaluated the accuracy of four algorithms (the algorithm in this paper, an algorithm based on information content, an algorithm based on distance, and an algorithm using the standard deviation commonly applied in statistics). The results show that, with proper selection of parameters and a comprehensive similarity measure, the difficulty of distinguishing weakly correlated concepts can be significantly reduced. This study provides a deeper understanding of how semantic similarity applies to agricultural ontologies and shows how to choose appropriate semantic similarity measures for agricultural information retrieval.
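
The weighted combination of structural, information, and property factors described in the abstract can be illustrated with a short sketch. This is not the authors' formula: the abstract gives neither the combination rule nor the weights, so the linear form, the names `Weights` and `combined_similarity`, and every numeric value below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights for the three factors named in the abstract."""
    structure: float
    information: float
    prop: float  # property factor; named `prop` to avoid the `property` builtin


def combined_similarity(sim_structure: float,
                        sim_information: float,
                        sim_property: float,
                        weights: Weights) -> float:
    """Linear weighted sum of three component similarities.

    Assumption: each component lies in [0, 1] and the weights sum to 1,
    so the result also lies in [0, 1]. The paper's actual rule may differ.
    """
    components = (sim_structure, sim_information, sim_property)
    if any(not 0.0 <= s <= 1.0 for s in components):
        raise ValueError("each component similarity must lie in [0, 1]")

    total = weights.structure + weights.information + weights.prop
    if abs(total - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    return (weights.structure * sim_structure
            + weights.information * sim_information
            + weights.prop * sim_property)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    w = Weights(structure=0.5, information=0.3, prop=0.2)
    print(combined_similarity(0.9, 0.8, 0.7, w))
```

Keeping the weights in a separate object makes it easy to test how sensitive the final score is to each factor, which is the kind of parameter selection the abstract says affects how well weakly correlated concepts are distinguished.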

       
