Construction of a Visualized Knowledge Graph of Crop Diseases and Pests Based on Deep Learning

Wu Saisai; Zhou Ailian; Xie Nengfu; Liang Xiaohe; Wang Huijuan; Li Xiaoyu; Chen Guipeng

doi:10.11975/j.issn.1002-6819.2020.24.021

Construction of visualization domain-specific knowledge graph of crop diseases and pests based on deep learning

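Abstract

Abstract: The crop disease and pest domain suffers from cross-linked entity relations, poor aggregation of multi-source heterogeneous data, and difficulty in sharing knowledge. To address these problems, this study exploits the ability of knowledge graphs to describe complex relations between entities in a structured form and proposes a deep-learning-based method for constructing a crop disease and pest knowledge graph. Building on a domain ontology, the method jointly extracts entities and relations using a new annotation scheme adapted to the domain corpus. Entity and relation extraction is cast as a sequence labeling problem in which entities and relations are annotated simultaneously, which effectively improves annotation efficiency. To handle overlapping relations, triples are modeled directly rather than modeling entities and relations separately, so triple data can be obtained simply by tag matching and mapping. Experiments were run with an end-to-end model combining Bidirectional Encoder Representations from Transformers (BERT), a Bi-directional Long Short-Term Memory network (BiLSTM), and a Conditional Random Field (CRF). The results show that it outperforms the pipeline method based on conventional annotation as well as classic joint-learning models such as Convolutional Neural Networks (CNN)+BiLSTM+CRF and BiLSTM+CRF, with an F1 score of 91.34%. Finally, the extracted knowledge was stored in the Neo4j graph database, which directly reflects the internal structure of the knowledge graph and supports knowledge visualization and knowledge reasoning. The resulting knowledge graph can provide a high-quality knowledge base for downstream applications such as intelligent question answering systems, recommendation systems, and intelligent search for crop diseases and pests.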

Abstract: A knowledge graph describes the concepts and entities of the objective world, and the relationships among them, in a structured form. It offers a strong ability to organize, manage, and understand massive amounts of information, can structure heterogeneous domain knowledge, and is widely used in fields such as medicine, biology, and finance. In the field of crop diseases and insect pests, a single entity may participate in relations with multiple other entities, the data are multi-source and heterogeneous, aggregation ability is poor, utilization is low, and knowledge sharing is difficult. Combining Natural Language Processing (NLP) and text mining technologies, this study investigated the construction of a crop disease and insect pest knowledge graph based on deep learning, focusing on data acquisition, ontology construction, knowledge extraction, and knowledge storage. First, the Scrapy crawler framework for Python was used to crawl data from web pages related to crop diseases and insect pests, and the data were cleaned and supplemented during preprocessing. Second, according to the characteristics of the domain corpus, the Protégé ontology construction tool was used to build the crop disease and insect pest ontology semi-automatically, predefining the set of properties and relations and setting their domains and ranges. Then, based on the ontology, rule-based methods were used to extract semi-structured knowledge, and deep learning was used to extract unstructured knowledge. For unstructured knowledge extraction, a text annotation mode adapted to the domain corpus, "Main_Entity+Relation+BIESO" (ME+R+BIESO), was also proposed.
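As a rough illustration of how predefined domains and ranges can constrain extraction, the following Python sketch checks candidate triples against a small relation table. The class names (`Disease`, `Pest`, `Crop`, `Symptom`), the relation names, and the helper `is_valid_triple` are hypothetical and are not taken from the study's ontology.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RelationDef:
    """One ontology relation with its allowed subject and object classes."""
    name: str
    domain_cls: str  # class the head entity must belong to
    range_cls: str   # class the tail entity must belong to


# Hypothetical relation table; the study's actual ontology is not reproduced here.
RELATIONS: Dict[str, RelationDef] = {
    r.name: r
    for r in (
        RelationDef("hasSymptom", "Disease", "Symptom"),
        RelationDef("infects", "Disease", "Crop"),
        RelationDef("damages", "Pest", "Crop"),
    )
}


def is_valid_triple(triple: Tuple[str, str, str], entity_class: Dict[str, str]) -> bool:
    """Return True if the relation is predefined and head/tail match its domain/range."""
    head, relation, tail = triple
    rel = RELATIONS.get(relation)
    if rel is None:
        return False  # relation is not part of the predefined set
    return (
        entity_class.get(head) == rel.domain_cls
        and entity_class.get(tail) == rel.range_cls
    )


# Illustrative entity typing, also an assumption.
entity_class = {"rice blast": "Disease", "rice": "Crop", "leaf spots": "Symptom"}
print(is_valid_triple(("rice blast", "infects", "rice"), entity_class))
print(is_valid_triple(("rice", "infects", "rice blast"), entity_class))
```

A check of this kind would reject triples whose head or tail falls outside the declared domain or range, which is one way an ontology can filter noisy extractions.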
Based on a predefined set of relations, entities and relations were annotated simultaneously, so each tag carried both entity and relation information, and triples were modeled directly instead of modeling entities and relations separately. The corresponding triples could then be obtained directly by parsing the tags, which not only saved at least half of the labeling cost but also realized joint extraction of entities and relations and solved the problem of overlapping relation extraction. This study used the end-to-end Bidirectional Encoder Representations from Transformers (BERT)-Bi-directional Long Short-Term Memory (BiLSTM)+Conditional Random Field (CRF) model for experiments on the crop disease and insect pest dataset. First, the BERT pre-trained language model encoded the input text and extracted text features, and the generated vectors served as the input of the BiLSTM layer. The BiLSTM layer then integrated contextual information through bidirectional encoding to predict label sequences effectively. Finally, the CRF module decoded the BiLSTM output; label transition probabilities and constraints were learned during training, and the entity label of each character was obtained. The experimental results showed a precision of 94.06%, a recall of 89.02%, and an F1 value of 91.34%, outperforming the pipeline method and classic joint extraction models such as BiLSTM+CRF and Convolutional Neural Networks (CNN)+BiLSTM+CRF. Joint extraction of entities and relations based on this annotation mode not only improved the efficiency and accuracy of annotation but also solved the problem of overlapping relations in the corpus. Finally, the extracted knowledge was stored in the graph database to realize visual display of the knowledge graph as well as deep knowledge mining and reasoning.
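The idea of reading triples directly off the tag sequence can be sketched in Python as follows. The abstract does not spell out the tag format, so this sketch assumes tags of the form `ME-B`/`ME-I`/`ME-E`/`ME-S` for the main entity, `<Relation>-B/I/E/S` for entities related to it, and `O` elsewhere; the relation name `DamagedPart`, the English tokens, and both function names are hypothetical.

```python
from typing import List, Tuple


def decode_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Collect (label, text) spans from BIESO-style tags such as 'ME-B' or 'DamagedPart-S'."""
    spans: List[Tuple[str, str]] = []
    buf: List[str] = []
    label = None
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            buf, label = [], None
            continue
        name, pos = tag.rsplit("-", 1)
        if pos == "S":                      # single-token entity
            spans.append((name, tok))
            buf, label = [], None
        elif pos == "B":                    # start of a multi-token entity
            buf, label = [tok], name
        elif pos in ("I", "E") and label == name and buf:
            buf.append(tok)
            if pos == "E":                  # end of the entity: emit the span
                spans.append((name, sep.join(buf)))
                buf, label = [], None
        else:                               # malformed sequence: drop the partial span
            buf, label = [], None
    return spans


def spans_to_triples(spans: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Pair the single main entity (label 'ME') with every relation-labeled entity."""
    mains = [text for name, text in spans if name == "ME"]
    if len(mains) != 1:
        return []  # this sketch assumes exactly one main entity per sentence
    head = mains[0]
    return [(head, name, text) for name, text in spans if name != "ME"]


tokens = ["Rice", "blast", "damages", "leaves", "and", "panicles"]
tags = ["ME-B", "ME-E", "O", "DamagedPart-S", "O", "DamagedPart-S"]
triples = spans_to_triples(decode_spans(tokens, tags))
print(triples)
# [('Rice blast', 'DamagedPart', 'leaves'), ('Rice blast', 'DamagedPart', 'panicles')]
```

Because every related entity is tagged with its relation to the main entity, one main entity can appear in several triples, which is how a scheme of this kind can accommodate overlapping relations. For a character-level Chinese corpus, `sep=""` would be the natural choice.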
Combining deep learning technology to realize semi-automatic construction of the knowledge graph is of great significance for crop disease and insect pest detection, forecasting and early warning, and the establishment of prevention models in intelligent production systems. The knowledge graph can provide a high-quality knowledge base for crop disease and insect pest question answering systems, recommendation systems, search engines, and other applications, and can be applied to crop variety selection, pest prevention and control, and fertilization and irrigation.
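For the storage step, one possible way to turn an extracted triple into a graph-database write is sketched below in Python. It only builds a parameterized Cypher `MERGE` statement and does not connect to a database; the node label `Entity`, the property `name`, and the function `triple_to_cypher` are assumptions made for illustration, not the study's schema.

```python
import re
from typing import Dict, Tuple

# Cypher does not allow relationship types to be passed as query parameters,
# so the type is validated here before being placed into the query text.
_REL_TYPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def triple_to_cypher(head: str, relation: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build an idempotent Cypher statement and its parameters for one triple."""
    if not _REL_TYPE.fullmatch(relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = triple_to_cypher("Rice blast", "DamagedPart", "leaves")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes or edges, which matters when the same entity appears in many triples.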
