Abstract
Existing expert systems for rice pests and diseases face two critical bottlenecks: 1) knowledge graph (KG) construction is inefficient and prone to entity omission and relationship misjudgment, owing to the complex ontologies and overlapping semantics of agricultural texts; 2) question answering (QA) generalizes poorly, especially for semantically ambiguous queries (e.g., symptom-based disease diagnosis), where conventional retrieval-augmented generation (RAG) or natural language-to-Cypher (NL2Cypher) methods can fail to generate effective queries. In this study, an integrated framework combining Large Language Models (LLMs) and KGs was developed to enhance fine-grained agricultural knowledge extraction, dynamic KG updating, and precise QA reasoning. Two models were proposed for automated KG construction. 1) The Progressive Prompt Extraction (PPE) model decomposed entity attribute and triple extraction into three sequential polling stages: corpus polling, which converted unstructured text from the rice pest and disease knowledge base (RDP-KB) into indexable metadata to focus the knowledge; entity attribute polling, which used entity attribute ontologies and explanation instructions from an ontology registration module to guide LLM-based attribute extraction; and triple polling, which reused the previously extracted entity attributes to enhance relationship extraction. 2) The Incremental Entity Update (IEU) model addressed redundancy and conflicts in multi-source data by classifying attributes into "primary attributes" (stable, high-credibility knowledge in the structured database) and "secondary attributes" (newly extracted data from PPE). It prioritized high-reliability sources (e.g., academic literature and professional books over web encyclopedias) and retained the primary attributes in case of conflict. For QA, the AgenticGraphRAG architecture was designed.
The ReAct paradigm was adopted to decompose complex user queries into subtasks, and a dual-retrieval system combining vector task retrieval and Cypher task retrieval was deployed to ensure complementary and accurate knowledge recall. On the RDP-KB (227,562 characters, constructed via OCR of professional books, web crawling, and manual annotation of 910 entities and 1,602 triples), the framework constructed a KG with 901 entities and 1,554 triples, consuming 3,084,874 tokens and taking 3.38 h. Ablation experiments showed that when Qwen2.5-72B-Instruct was paired with PPE, its F1-scores for entity attribute and triple extraction reached 89.5% and 87.1%, respectively, increases of 11.6 and 10.5 percentage points over the same model without PPE. Even the smaller Qwen2.5-32B-Instruct with PPE outperformed Qwen2.5-72B-Instruct without PPE: its entity attribute F1-score was 85.1% (vs. 77.9% for Qwen2.5-72B-Instruct without PPE), and its triple F1-score was 81.4% (vs. 76.6%). Comparative experiments against four baselines (NaiveRAG, GraphRAG, HippoRAG2, and LightRAG) showed that, on the RDP-QA-Symptom dataset (100 real farmer queries + 300 LLM-simulated queries), the framework achieved diagnostic accuracies of 86% (Human subset) and 89.33% (LLM subset), outperforming HippoRAG2 (73% and 77.33%) and LightRAG (68% and 70.67%); on the RDP-QA-Web dataset (400 comprehensive QA pairs from agricultural platforms), it achieved an LLM-Metric score of 90.32% (vs. 58.16% for NaiveRAG, 65.72% for GraphRAG, 79.56% for HippoRAG2, and 82.08% for LightRAG), with notable improvements in accuracy (17.76 vs. 14.96 for LightRAG) and comprehensiveness (18.56 vs. 16.04 for LightRAG). The framework also achieved the highest Customer Satisfaction Score (CSAT), 88%, among all tested systems, as evaluated by two human experts. The framework thus effectively addressed the key challenges in expert systems for rice pests and diseases.
PPE and IEU significantly improved KG construction efficiency, data consistency, and extraction accuracy, even enabling smaller LLMs with PPE to outperform larger ones without it. AgenticGraphRAG enhanced QA accuracy in semantically ambiguous scenarios through dual retrieval. These findings can provide a reusable technical paradigm for vertical-domain expert systems (e.g., adaptable to other agricultural crops) and a basis for future optimizations, such as multi-modal interaction (speech and image). The lower application threshold and the automated KG updating also suit the maintenance of pest and disease knowledge, further advancing agricultural informatization.
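
The IEU conflict rule summarized above (primary attributes are retained, and more reliable sources take priority) can be illustrated with a minimal sketch. The abstract does not give the actual implementation, so the `Attribute` type, the `merge_attributes` function, the numeric source ranks, and the sample values are all hypothetical and serve only to show one plausible reading of the rule.

```python
from dataclasses import dataclass

# Hypothetical reliability ranks (higher = more reliable). The abstract only states
# that academic literature and professional books outrank web encyclopedias.
SOURCE_RANK = {"academic_literature": 3, "professional_book": 3, "web_encyclopedia": 1}


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    source: str
    primary: bool = False  # True for stable knowledge already in the structured database


def merge_attributes(existing, extracted):
    """Merge newly extracted (secondary) attributes into an entity's existing ones.

    Assumed rules: a primary attribute is never overwritten; between two
    secondary attributes with the same name, the higher-ranked source wins.
    """
    merged = {attr.name: attr for attr in existing}
    for new in extracted:
        old = merged.get(new.name)
        if old is None:
            merged[new.name] = new  # no conflict: add the new attribute
        elif old.primary:
            continue  # conflict with primary knowledge: keep the primary value
        elif SOURCE_RANK.get(new.source, 0) > SOURCE_RANK.get(old.source, 0):
            merged[new.name] = new  # more reliable secondary source replaces the old one
    return list(merged.values())


# Illustrative values only; they are not taken from the RDP-KB.
existing = [
    Attribute("pathogen", "Magnaporthe oryzae", "professional_book", primary=True),
    Attribute("peak_season", "summer", "web_encyclopedia"),
]
extracted = [
    Attribute("pathogen", "unknown fungus", "web_encyclopedia"),
    Attribute("peak_season", "late spring to summer", "academic_literature"),
    Attribute("symptom", "spindle-shaped lesions", "professional_book"),
]
result = {a.name: a.value for a in merge_attributes(existing, extracted)}
```

In this sketch, the conflicting `pathogen` value from the web encyclopedia is discarded because the existing attribute is primary, the `peak_season` value is replaced by the one from the higher-ranked source, and the new `symptom` attribute is simply added.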