Abstract:
Abstract: The knowledge graph describes the concepts, entities, and their relationships in the objective world in a structured form. It has a better ability to organize, manage, and understand massive amounts of information, and can structure heterogeneous knowledge in the field. It can be widely used in medical, biological, financial, etc. In view of the current situation in the field of crop diseases and insect pests, there are multiple relationship pairs between the same entity and multiple entities, multi-source heterogeneous data, poor aggregation ability, low utilization, and the possibility of knowledge sharing. Combining Natural Language Processing (NLP) and text mining technologies, this study focused on data acquisition, ontology construction, knowledge extraction, and knowledge storage, researched on the construction of crops diseases and insect pests knowledge graph based on deep learning. Firstly, this study used the Scrapy crawler framework of the Python programming language to crawl data from web pages related to crop diseases and insect pests, and performed data cleaning and data supplementation through data preprocessing methods. Secondly, according to the characteristics of the domain corpus, the Protégé ontology construction tool was used to complete the semi-automatic construction of the crop diseases and insect pests ontology predefined the set of properties and relations and set the corresponding domains and ranges. Then, based on the ontology, the rule method was used to extract semi-structured knowledge, and the deep learning method was used to extract unstructured knowledge. In the process of unstructured knowledge extraction, a text annotation mode "Main_Entity+Relation+BIESO" (ME+R+BIESO) adapted to the domain corpus was also proposed. Based on a predefined set of relationships, entities and relationships were simultaneously annotated, it contained entity and relationship information at the same time, and directly modeling the triples instead of separately modeling entities and relationships. The corresponding triples were also directly obtained through analysis, which not only saved at least half of the cost of labeling but also realized the joint extraction of entity relations and solved the problem of overlapping relation extraction. And this study used the Bidirectional Encoder Representation from Transformers (BERT)- Bi-directional Long-Short Term Memory (BiLSTM)+ Conditional Random Field (CRF) end-to-end model to experiment on the crop diseases and insect pests dataset. First, this study used the BERT pre-training language model to encode words, extracted text features, and used the generated vector as the input of the BiLSTM layer; BiLSTM integrated contextual information into the model at the same time, and performed bidirectional encoding to achieve effective prediction of label sequences; finally, this study used the CRF module to decode the output result of BiLSTM, and the label transition probability and constraint conditions were obtained through training and learning, and the entity label category of each character was obtained. The experimental results showed that the precision was 94.06%, the recall was 89.02%, and the F1 value reached 91.34%, which was much better than the pipeline method and classic models such as BiLSTM+CRF and Convolutional Neural Networks (CNN)+BiLSTM+CRF in the joint extraction method. The joint extraction of entity relations based on this annotation mode not only improved the efficiency and accuracy of annotation but also solved the problem of overlapping relations in the corpus. Finally, the extracted knowledge was stored in the graph database to realize the visual display of the knowledge graph and deep knowledge mining and reasoning. Combined the deep learning technology to realize the semi-automatic construction of the knowledge graph, which was of great significance for the detection of crop diseases and insect pests, forecasting and early warning, and the establishment of prevention models in the intelligent production system. It could provide a high-quality knowledge base for crop diseases and insect pests question answering systems, recommendation systems, search engines, and other applications, which could be effectively applied to crop variety selection, pest prevention and control, and fertilization and irrigation.