Abstract:
Aquatic diseases have occurred frequently in recent years, particularly with the rapid development of the aquaculture industry. These diseases can spread very rapidly and pose a serious risk to aquaculture. Meanwhile, with the development of Internet technology, online data on aquatic diseases have become highly dispersed, multi-source, and heterogeneous. Given the explosive growth of online data, there is strong demand for obtaining the required information rapidly and accurately. However, traditional search engines cannot fully meet this demand: retrieval based on keywords or shallow semantic analysis returns a large number of related web links, leading to vague and redundant answers. An intelligent question-answering (Q&A) system instead accepts natural language input, accurately captures the user's intent, and returns concise and accurate answers. The emergence and rapid development of knowledge graphs provide a high-quality knowledge base for such systems and promote their application in various fields. Knowledge graph construction is divided into four main steps: data acquisition, ontology construction, knowledge extraction, and knowledge storage. Firstly, crawler technology is used to obtain aquatic disease data, which are then preprocessed through data cleaning and analysis. Secondly, an aquatic disease ontology is constructed from the content and representation characteristics of the data, predefining the types of relations and properties between entities and thereby clarifying the boundaries of knowledge extraction. Thirdly, rule-based logic is used to extract knowledge from semi-structured data, and joint entity and relation extraction is used for unstructured data.
Finally, the extracted triples are stored in the Neo4j graph database to support visual management of the knowledge graph and a certain degree of knowledge reasoning. For unstructured knowledge, this study proposes a new text annotation scheme, "aquatic disease + relationship + BMES", which integrates relation extraction into the named entity recognition task. Triples are modeled directly, so that entity and relation extraction is transformed into a sequence labeling problem, which at least doubles annotation efficiency for joint entity and relation extraction. A triple-building module then obtains the triples through label matching and mapping, which also addresses overlapping relation extraction. A BERT-BiLSTM+CRF end-to-end model was used in the test. The results showed that triple extraction achieved a recall of 89.64%, an accuracy of 94.04%, and an F1 score of 91.34%, significantly better than the CNN+BiLSTM+CRF and BiLSTM+CRF models. The extracted knowledge was stored in the Neo4j graph database, enabling knowledge visualization management and knowledge reasoning analysis. The resulting aquatic disease knowledge graph showed high precision and fine granularity. The findings can provide a new idea for the field of intelligent Q&A. The semi-automatic construction of a knowledge graph can also offer technical support for recommendation systems, search applications, and knowledge base construction.
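The label matching and mapping step described above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: each token is tagged `O` or `<position>-<label>`, where the position is one of B, M, E, S; the label `DIS` marks the aquatic disease (the head entity); any other label (the hypothetical `SYMPTOM` and `HOST` below) names the relation that links the tagged span to that disease; and each sentence contains a single disease. The function names `decode_spans` and `build_triples` are illustrative, not taken from the study.

```python
from typing import List, Tuple

Span = Tuple[str, str]          # (label, text)
Triple = Tuple[str, str, str]   # (head, relation, tail)


def decode_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Span]:
    """Collect (label, text) spans from BMES-style tags such as 'B-DIS' or 'S-HOST'."""
    spans: List[Span] = []
    buffer: List[str] = []
    current = None  # label of the span being built, if any
    for token, tag in zip(tokens, tags):
        position, _, label = tag.partition("-")
        if position == "S" and label:
            # Single-token entity.
            spans.append((label, token))
            buffer, current = [], None
        elif position == "B" and label:
            # Start of a multi-token entity.
            buffer, current = [token], label
        elif position in ("M", "E") and label and label == current and buffer:
            buffer.append(token)
            if position == "E":
                spans.append((label, sep.join(buffer)))
                buffer, current = [], None
        else:
            # 'O' or a malformed tag sequence: drop any partial span.
            buffer, current = [], None
    return spans


def build_triples(tokens: List[str], tags: List[str],
                  subject_label: str = "DIS", sep: str = " ") -> List[Triple]:
    """Map every relation-labelled span to the disease found in the same sentence."""
    spans = decode_spans(tokens, tags, sep)
    subjects = [text for label, text in spans if label == subject_label]
    if not subjects:
        return []
    head = subjects[0]  # assumption: one disease per sentence
    return [(head, label, text) for label, text in spans if label != subject_label]


if __name__ == "__main__":
    tokens = ["white", "spot", "disease", "causes", "white", "spots", "in", "shrimp"]
    tags = ["B-DIS", "M-DIS", "E-DIS", "O", "B-SYMPTOM", "E-SYMPTOM", "O", "S-HOST"]
    for triple in build_triples(tokens, tags):
        print(triple)
```

Under these assumptions, a single disease span can serve as the head of several triples, since every relation-labelled span in the sentence is mapped back to it. For character-level Chinese input, passing `sep=""` joins the characters without spaces.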