Construction and verification of a visual knowledge graph of aquatic diseases based on deep learning

姜丽华; 赵瑞雪; 董春岩; 常晓燕; 马娟娟; 谢能付; 方松

doi:10.11975/j.issn.1002-6819.202304111

Construction and verification of the visual knowledge map of aquatic diseases based on deep learning

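Abstract

A knowledge graph is essentially a graph-based semantic network that represents the relations between entities, and it plays a vital role in fields such as knowledge question answering and semantic retrieval. To address current problems in the aquatic disease domain, including cross-linked entity relations, poor aggregation of multi-source heterogeneous data, low data utilization, and difficulty in sharing knowledge, this study draws on natural language processing and text mining to propose a method for constructing a domain-specific knowledge graph of aquatic diseases based on a deep neural network model, and validates it experimentally. First, an ontology for the aquatic disease domain is built, and the sets of entity types, attributes, and relations are predefined to fix the boundary of knowledge extraction. Second, on the basis of the ontology, rule-based methods and deep learning methods are used to extract semi-structured and unstructured knowledge, respectively. For unstructured knowledge, a text annotation scheme of "aquatic disease + relation + BMES" is proposed. It folds relation extraction into the named entity recognition task and models triples directly, turning entity-relation extraction into a sequence labeling problem; this not only improves annotation efficiency but also achieves joint extraction of entities and relations. In addition, RDF data is obtained by modeling triples through label matching and mapping, which solves the problem of overlapping relation extraction. Experiments with the end-to-end BERT-BiLSTM+CRF model show that this triple extraction method achieves a high recall (89.64%), accuracy (94.04%), and F₁ score (91.34%), outperforming models such as CNN+BiLSTM+CRF and BiLSTM+CRF, a marked improvement in extraction performance. The extracted knowledge is stored in the Neo4j graph database to enable visual knowledge management and knowledge reasoning analysis. The aquatic disease knowledge graph constructed in this study is highly precise and fine-grained; it can help machines understand data, explain phenomena, and reason over knowledge, thereby uncovering deep relations and enabling smart search and intelligent interaction.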
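To make the idea of reading triples directly off a tagged sequence concrete, the following is a minimal Python sketch. The tag format (`LABEL-B/M/E/S`, with `O` for tokens outside any span), the label names `DISEASE`, `SYMPTOM`, and `SITE`, the helper names, and the toy sentence are all assumptions made for illustration; the abstract does not specify the paper's actual label set or matching rules. The sketch treats the single disease mention as the head entity and pairs it with every other labeled span, which is one simple way a single entity can take part in several triples.

```python
from typing import List, Tuple

Span = Tuple[str, str]          # (label, text)
Triple = Tuple[str, str, str]   # (head entity, relation, tail entity)


def decode_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Span]:
    """Collect (label, text) spans from BMES tags such as 'SYMPTOM-B'."""
    spans: List[Span] = []
    buffer: List[str] = []
    current = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            buffer, current = [], None
            continue
        label, pos = tag.rsplit("-", 1)
        if pos == "S":                      # single-token span
            spans.append((label, token))
            buffer, current = [], None
        elif pos == "B":                    # span begins
            buffer, current = [token], label
        elif pos in ("M", "E") and current == label and buffer:
            buffer.append(token)
            if pos == "E":                  # span ends
                spans.append((label, sep.join(buffer)))
                buffer, current = [], None
        else:                               # malformed sequence: drop it
            buffer, current = [], None
    return spans


def build_triples(tokens: List[str], tags: List[str],
                  head_label: str = "DISEASE", sep: str = " ") -> List[Triple]:
    """Pair the first head-entity span with every relation-labeled span."""
    spans = decode_spans(tokens, tags, sep)
    heads = [text for label, text in spans if label == head_label]
    if not heads:
        return []
    head = heads[0]
    return [(head, label, text) for label, text in spans if label != head_label]


# Toy sentence with hypothetical tags (not taken from the paper's corpus).
tokens = ["white", "spot", "disease", "shows", "white", "cysts", "on", "the", "skin"]
tags = ["DISEASE-B", "DISEASE-M", "DISEASE-E", "O",
        "SYMPTOM-B", "SYMPTOM-E", "O", "O", "SITE-S"]

for triple in build_triples(tokens, tags):
    print(triple)
```

For character-level Chinese text, passing `sep=""` would join the characters of a span without spaces.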

Abstract: Aquatic diseases have occurred frequently in recent years as the aquaculture industry has developed rapidly, and once such a disease spreads it poses a serious risk to aquaculture. At the same time, online data on aquatic diseases has become highly dispersed, multi-source, and heterogeneous. With the explosive growth of network data, users need to obtain the required information quickly and accurately, yet traditional search engines cannot fully meet this need: keyword retrieval or shallow semantic analysis returns a large number of related web links, so the answers are vague and redundant. An intelligent Q&A system, by contrast, accepts natural language input, captures the user's intent, and returns concise and accurate answers. Knowledge graphs provide a high-quality knowledge base for such systems and have promoted their application in various fields. Knowledge graph construction here consists of four main steps: data acquisition, ontology construction, knowledge extraction, and storage. Firstly, crawler technology is used to obtain aquatic disease data, which is then preprocessed through data cleaning and analysis. Secondly, an aquatic disease ontology is constructed from the content and representation characteristics of the data, predefining the relation and property types between entities and thereby fixing the boundaries of knowledge extraction. Thirdly, rules are used to extract knowledge from semi-structured data, and joint entity and relation extraction is used for unstructured data. Finally, the extracted triples are stored in the Neo4j graph database to support visual management of the knowledge graph and a certain degree of knowledge reasoning.

For unstructured knowledge, this study proposes a new text annotation system of "aquatic disease + relationship + BMES". Relation extraction is integrated into the named entity recognition task and triples are modeled directly, which transforms entity-relation extraction into a sequence labeling problem, at least doubles annotation efficiency, and achieves joint extraction of entities and relations. Triple data is then obtained by a triple-building module using label matching and mapping, which addresses overlapping relation extraction. The end-to-end BERT-BiLSTM+CRF model was used in the experiments. The results showed that the triple extraction achieved a high recall (89.64%), accuracy (94.04%), and F₁ score (91.34%), significantly better than the CNN+BiLSTM+CRF and BiLSTM+CRF models. The extracted knowledge was stored in the Neo4j graph database for knowledge visualization management and knowledge reasoning analysis. The resulting aquatic disease knowledge graph has high precision and fine granularity. These findings offer a new approach for intelligent Q&A, and the semi-automatic construction of a knowledge graph can also provide technical support for recommendation systems, search, and knowledge base construction and application.
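The storage step can be illustrated with a short Python sketch that turns one extracted triple into a parameterized Cypher `MERGE` statement. The node label `Entity`, the `name` property, the function name `to_cypher`, and the example triple are assumptions for illustration only; the abstract does not describe the graph schema actually used. Cypher does not accept a relationship type as a query parameter, so the sketch sanitizes the relation name and embeds it in the query text inside backticks, while entity names are passed as parameters.

```python
import re
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher MERGE query and its parameters for one triple."""
    head, relation, tail = triple
    # Keep only word characters so the relation is safe to embed in the query.
    rel_type = re.sub(r"\W+", "_", relation).strip("_").upper()
    if not rel_type:
        raise ValueError(f"relation {relation!r} has no usable characters")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:`{rel_type}`]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical triple, not taken from the paper's data.
query, params = to_cypher(("white spot disease", "has symptom", "white cysts"))
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps repeated triples from producing duplicate nodes or relationships. With the official `neo4j` Python driver, the returned pair could be passed to `session.run(query, params)` on an open session; the connection details depend on the deployment and are not shown here.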
