Citation: Lyu Dongdong, Chen Junhua, Mao Dianhui, Zhang Qingchuan, Zhao Min, Hao Zhihao. Entity relationship extraction and correlation analysis of agricultural product standard domain knowledge graph[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(9): 315-323. DOI: 10.11975/j.issn.1002-6819.2022.09.035

    Entity relationship extraction and correlation analysis of agricultural product standard domain knowledge graph

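    • Abstract: Agricultural product standards are both a yardstick for agricultural product safety and an important basis for its supervision, yet the information in these standards has not been systematically linked, classified, or reused. To address this, the study designed ontology rules for agricultural product standard information following the drafting rules for standardization documents. Working from existing agricultural product standard documents and related entry data, it designed a regular-expression wrapper for semi-structured data and, for unstructured text, proposed an open relation extraction model for the agricultural products field based on dependency parsing (Open Relation Extraction Model In Agricultural Products Field, OREM-AF), achieving automatic extraction of knowledge in this domain. The results show that the wrapper reached accuracy and F1 values above 95% when extracting information from semi-structured data. OREM-AF reached 74.22% accuracy and 75.12% F1 on the agricultural product corpus, and 84.51% accuracy and 75.43% F1 on a general-domain corpus, outperforming the other dependency-parsing-based models in both cases. A knowledge graph for the agricultural product standard domain was built from the extracted data, and standard communities were mined from the graph's cross-reference network; the resulting knowledge about related standards can provide auxiliary analytical support for the supervision of agricultural product standards.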
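    The regular-expression wrapper for semi-structured data can be illustrated with a minimal sketch. The pattern below, the function name `extract_standard_ids`, and the field names are assumptions made for illustration; the wrapper rules actually used in the study are not given in the abstract.

```python
import re

# Hypothetical pattern for standard identifiers such as "GB 2762-2012" or
# "GB/T 5009.11-2014". This is an illustrative assumption, not the paper's rule set.
STANDARD_ID = re.compile(
    r"(?<![A-Za-z])"              # do not start in the middle of a Latin word
    r"(?P<prefix>GB|NY|DB\d{2})"  # national, agricultural-industry, or local prefix
    r"(?P<rec>/T)?"               # optional "/T" marks a recommended standard
    r"\s?(?P<number>\d+(?:\.\d+)*)"
    r"-(?P<year>\d{4})(?!\d)"
)


def extract_standard_ids(text):
    """Return one dict per standard identifier found in the text."""
    results = []
    for m in STANDARD_ID.finditer(text):
        results.append({
            "prefix": m.group("prefix"),
            "recommended": m.group("rec") is not None,
            "number": m.group("number"),
            "year": int(m.group("year")),
        })
    return results


sample = "This standard replaces GB 2762-2012 and cites GB/T 5009.11-2014."
for record in extract_standard_ids(sample):
    print(record)
```

    Named groups keep each extracted field addressable, which makes it straightforward to map a match onto ontology properties such as standard number and year.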

       

      Abstract: Agricultural product standards are a measure of agricultural product safety and an important basis for its supervision. However, the information in these standards is currently scattered and isolated, with no systematic correlation or reuse. A knowledge graph can connect various types of information into a network, allowing analysis from a "relational" perspective. In this study, ontology rules for agricultural standard information were designed using the drafting specifications of standardized documents and relevant Baidu encyclopedia entry data. A regular wrapper was designed for the semi-structured data, and it extracted standard document information with accuracy and F1 both above 95%. For the unstructured data, an open relationship extraction model in the agricultural products field (OREM-AF) was established using dependency parsing. The model first learns the dependency structure between entity pairs from the labeled triples of the training corpus and generates logical expressions of entity relationship extraction paradigms. After the whole training corpus has been learned, the test corpus is analyzed by dependency parsing to obtain its core vocabulary chain. Then, the subtree rooted at each core word is matched against the learned set of entity relationship dependency paradigms to obtain the corresponding entity pairs and relationships as triples. In this way, triples of agricultural product information were extracted automatically. The experimental results show that OREM-AF achieved 74.22% accuracy and 75.12% F1 on the agricultural product data set, and 84.51% accuracy and 75.43% F1 on the common data set.
Owing to active learning and fine-grained sibling substitution, OREM-AF extracted better than the other models based on dependency parsing. This suggests that the active learning capability gives the model strong transferability across corpora. Using the Neo4j graph database for storage, a knowledge graph was constructed for the field of agricultural standards. It captures the links between pieces of information to be retrieved clearly and quickly, thus providing supplementary analytical support for the regulation of agricultural products. Community mining was carried out on the network of agricultural standards using the Leiden algorithm. The GB 2762 and GB 2763 standards, both National Food Safety Standards, were found in the same community, indicating that the agricultural field attaches great importance to pesticide and contaminant residues in agricultural products. Most standards in the GB 5009 series fell into a single community and mainly concern hygiene inspection methods for physical and chemical indicators of agricultural products; the most frequently referenced indicators were total mercury and organic mercury, total arsenic and inorganic arsenic, total lead, and organophosphorus pesticide residues. Most of the standards referencing GB 14881 were local standards, indicating that local standards are prepared with reference to its guidelines on raw material purchase, processing, packaging, and storage in the production of agricultural products.
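
The paradigm learning and matching steps described in the abstract can be sketched on hand-parsed toy sentences. This is a simplified illustration under stated assumptions, not the OREM-AF implementation: the `Token` structure, the function names, and the LTP-style labels `SBV`, `VOB`, and `HED` are assumptions, the parses are written by hand rather than produced by a parser, and active learning and sibling substitution are not modeled.

```python
from collections import namedtuple

# head is the index of the governing token, -1 for the sentence root.
Token = namedtuple("Token", "idx word head dep")


def learn_paradigm(tokens, triple):
    """Record the dependency labels linking a labeled (subject, relation, object)."""
    subj, rel, obj = triple
    by_word = {t.word: t for t in tokens}
    s, r, o = by_word[subj], by_word[rel], by_word[obj]
    # Simplification: only handle entities attached directly to the relation word.
    if s.head == r.idx and o.head == r.idx:
        return (s.dep, o.dep)
    return None


def extract(tokens, paradigms):
    """Match every candidate relation word against the learned paradigms."""
    triples = []
    for r in tokens:
        children = [t for t in tokens if t.head == r.idx]
        for s_dep, o_dep in sorted(paradigms):
            for s in (c for c in children if c.dep == s_dep):
                for o in (c for c in children if c.dep == o_dep):
                    triples.append((s.word, r.word, o.word))
    return triples


# Hand-written toy parses (assumed, not produced by a real parser).
train = [Token(0, "GB2763", 1, "SBV"), Token(1, "specifies", -1, "HED"),
         Token(2, "limits", 1, "VOB")]
test = [Token(0, "Rice", 1, "SBV"), Token(1, "contains", -1, "HED"),
        Token(2, "starch", 1, "VOB")]

paradigms = {learn_paradigm(train, ("GB2763", "specifies", "limits"))}
print(extract(test, paradigms))
```

Because the learned paradigm stores only dependency labels, not words, it applies to a sentence with entirely different vocabulary, which is the property that lets patterns learned on one corpus carry over to another.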

       
