Abstract:
Agricultural product standards have increasingly been used in recent years to support the safety and supervision of agricultural products. However, the terms defined in these standards are currently scattered and isolated from one another, with no systematic correlation or reuse. A knowledge graph can connect various types of information into a network, so that problems can be analyzed from a "relational" perspective. In this study, ontology rules for agricultural standard information were designed using the drafting specifications for standardized documents and relevant Baidu Encyclopedia entries. For the semi-structured data, a regular-expression wrapper was designed, which extracted the standard document information with accuracy and F1 values both above 95%. For the unstructured data, an open relation extraction model for the agricultural products field (OREM-AF) was established on the basis of dependency parsing. The model first learned the dependency structure between the entity pairs of the labeled triples in the training corpus, and from it generated logical expressions of entity relation extraction paradigms. After the whole training corpus had been learned, the test corpus was analyzed by dependency parsing to obtain its core vocabulary chain. Each subtree rooted at a core word was then matched against the learned set of entity relation dependency paradigms to obtain the corresponding entity pair and relation, that is, the triple. In this way, triples of agricultural product information were extracted automatically. The experimental results show that OREM-AF achieved 74.22% accuracy and a 75.12% F1 value on the agricultural product data set, and 84.51% accuracy and a 75.43% F1 value on the common data set.
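The paradigm learning and matching steps described above can be illustrated with a minimal sketch. It is not the OREM-AF implementation: the `Token` type, the function names, the hand-written parses, and the Universal Dependencies style labels (`nsubj`, `obj`, `amod`) are all assumptions made for illustration, and sibling substitution and active learning are left out. A paradigm here is simply the pair of dependency-label paths that lead from each entity up to the relation word.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    head: int    # index of the governing token, -1 for the root
    deprel: str  # dependency label linking this token to its head


def path_to(tokens, start, target):
    """Dependency labels met while climbing from `start` up to `target`.

    Returns None when `target` is not an ancestor of `start`.
    """
    labels, i = [], start
    while i != target:
        if i == -1:
            return None
        labels.append(tokens[i].deprel)
        i = tokens[i].head
    return tuple(labels)


def learn_paradigm(tokens, triple):
    """Turn one labeled (subject, relation, object) triple into a paradigm."""
    words = [t.word for t in tokens]
    subj, rel, obj = (words.index(w) for w in triple)
    s_path, o_path = path_to(tokens, subj, rel), path_to(tokens, obj, rel)
    if s_path is None or o_path is None:
        return None  # the relation word does not govern both entities
    return (s_path, o_path)


def extract(tokens, paradigms):
    """Match each learned paradigm against the subtree under every token."""
    triples = []
    for core in range(len(tokens)):
        others = [i for i in range(len(tokens)) if i != core]
        for s_path, o_path in paradigms:
            subjects = [i for i in others if path_to(tokens, i, core) == s_path]
            objects = [i for i in others if path_to(tokens, i, core) == o_path]
            for s in subjects:
                for o in objects:
                    if s != o:
                        triples.append(
                            (tokens[s].word, tokens[core].word, tokens[o].word)
                        )
    return triples


# Hand-written parses (hypothetical; a real pipeline would call a parser).
TRAIN = [Token("Rice", 1, "nsubj"), Token("contains", -1, "root"),
         Token("starch", 1, "obj")]
TEST = [Token("Fresh", 1, "amod"), Token("milk", 2, "nsubj"),
        Token("requires", -1, "root"), Token("cold", 4, "amod"),
        Token("storage", 2, "obj")]

paradigm = learn_paradigm(TRAIN, ("Rice", "contains", "starch"))
print(extract(TEST, [paradigm]))  # [('milk', 'requires', 'storage')]
```

Because the paradigm stores only dependency labels and no words, a structure learned from one sentence can match sentences with entirely different vocabulary, which is one plausible reading of why such a model transfers across data sets.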
Compared with the other models, OREM-AF extracted relations better with dependency parsing, owing to active learning and fine-grained sibling substitution. This suggests that the active learning capability gave the model strong transferability. A knowledge graph for the field of agricultural standards was then constructed and stored in the Neo4j graph database. It clearly and quickly surfaces the links between the pieces of information to be retrieved, thus providing supplementary analytical support for the regulation of agricultural products. Community mining was carried out on the network of agricultural standards using the Leiden algorithm. The GB 2762 and GB 2763 standards were found in the same community, both belonging to the National Food Safety Standards, indicating that great importance is attached to pesticide and contaminant residues in agricultural products. Most GB 5009 series standards fell into one community. They are mainly hygiene inspection methods for the physical and chemical indicators of agricultural products, among which the most frequently referenced indicators were total mercury and organic mercury, total arsenic and inorganic arsenic, total lead, and organophosphorus pesticide residues. Most of the standards that reference GB 14881 were local standards, indicating that local standards are drafted with reference to its guidelines on raw material purchase, processing, packaging, and storage in the production of agricultural products.
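The reference analysis behind the last two findings can be sketched in a few lines. This is only an illustration under stated assumptions: the citation edges and the citing standard numbers below are invented placeholders rather than data from the study, the function names are hypothetical, and the sketch covers reference counting only, not Leiden community detection or Neo4j storage. The prefix mapping relies on the convention that GB marks national standards and DB marks local standards.

```python
from collections import Counter

# Toy (citing standard, cited standard) edges; invented for illustration only.
CITATIONS = [
    ("DB 00/001", "GB 14881"),
    ("DB 00/002", "GB 14881"),
    ("NY/T 0000", "GB 14881"),
    ("DB 00/001", "GB 2762"),
    ("NY/T 0000", "GB 2763"),
]


def level(standard_id):
    """Coarse level of a standard, read from the prefix of its number."""
    if standard_id.startswith("GB"):
        return "national"
    if standard_id.startswith("DB"):
        return "local"
    return "other"


def reference_counts(edges):
    """How many times each standard is cited (its in-degree)."""
    return Counter(cited for _, cited in edges)


def citer_levels(edges, target):
    """Levels of the standards that cite `target`."""
    return Counter(level(citing) for citing, cited in edges if cited == target)


print(reference_counts(CITATIONS).most_common(1))  # [('GB 14881', 3)]
print(citer_levels(CITATIONS, "GB 14881"))
```

On real data, the same two counts would show which standards are referenced most often and whether the standards citing a given one are mostly local.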