Abstract:
A domain knowledge graph can store data in a structured, fine-grained form and model the real world as triples. Dispersed knowledge can thus be effectively organized and then widely used in fields such as healthcare, finance, and the Internet. Meanwhile, the grape is one of the most economically important fruits in agriculture. However, a large amount of knowledge in the grape domain remains unstructured, limiting its use in downstream data-driven tasks, and knowledge graphs are still rare in the agricultural domain. It is therefore necessary to construct a knowledge graph in the grape domain, particularly for knowledge storage and sharing. Furthermore, when constructing domain knowledge graphs, key information is often implicit in complex contexts. The character-vector semantic representations of existing named entity recognition (NER) models are relatively homogeneous, leading to a low recognition rate for domain-specific entities and ultimately affecting the efficiency and quality of knowledge graph construction. In this study, a NER model (termed BBNER-MRS) was proposed that fuses Bidirectional Encoder Representations from Transformers (BERT) with residual structures (RS). First, the raw text was mapped into character vectors by BERT, in which the input sentences were embedded using token, segment, and position embeddings. A multi-head attention mechanism then computed the correlation between each character and the other characters in the sentence and adjusted their weights accordingly, endowing the BERT character vectors with global features. A Bi-directional Long Short-Term Memory network (BiLSTM) then extracted deep local features from the BERT character vectors in both the forward and backward directions.
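As a rough, hypothetical sketch (not the authors' implementation; the vocabulary size, dimensions, head count, and layer choices below are all illustrative assumptions), the embedding and feature-extraction stages could look like the following in PyTorch:

```python
import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    """Hypothetical sketch of the BBNER-MRS front end: a BERT-style
    embedding (token + segment + position), multi-head self-attention
    for global features, then a BiLSTM for deep local features."""

    def __init__(self, vocab_size=100, d_model=64, n_heads=4, lstm_hidden=32):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # token embedding
        self.seg = nn.Embedding(2, d_model)            # segment embedding
        self.pos = nn.Embedding(512, d_model)          # position embedding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bilstm = nn.LSTM(d_model, lstm_hidden,
                              bidirectional=True, batch_first=True)

    def forward(self, tokens, segments):
        positions = torch.arange(tokens.size(1)).unsqueeze(0).expand_as(tokens)
        # Sum of the three embeddings, as in BERT's input representation.
        x = self.tok(tokens) + self.seg(segments) + self.pos(positions)
        # Self-attention weighs each character against every other
        # character in the sentence, producing global features.
        g, _ = self.attn(x, x, x)
        # The BiLSTM scans forward and backward for local features.
        h, _ = self.bilstm(g)
        return g, h

enc = CharEncoder()
toks = torch.randint(0, 100, (2, 10))               # 2 sentences, 10 chars
segs = torch.zeros(2, 10, dtype=torch.long)
g, h = enc(toks, segs)
print(g.shape, h.shape)  # torch.Size([2, 10, 64]) torch.Size([2, 10, 64])
```

Here the BiLSTM output is 64-dimensional (2 directions x 32 hidden units), matching the attention output only by construction; the real model's dimensions are not given in the abstract.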
Two simple but effective residual structures were designed to optimize the global features provided by BERT and the deep local features provided by BiLSTM. The mapping residual structure mapped the BERT feature vectors into a reduced dimension, preserving as much of the original BERT information as possible, while the convolution residual structure convolved the feature vectors twice to extract additional information. Finally, the feature vectors were decoded by a Conditional Random Field (CRF). Compared with other NER models, the proposed BBNER-MRS model performed better overall, with
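The two residual structures might be sketched as follows. This is a hypothetical illustration only: the layer sizes, kernel width, and the way the two branches are combined are assumptions, not the paper's exact design; the fused output would then be passed to the CRF decoder.

```python
import torch
import torch.nn as nn

class MappingResidual(nn.Module):
    """Sketch of the mapping residual: linearly project the BERT features
    (optionally into a reduced dimension) and add them to the fused
    features, so the original BERT information is preserved."""

    def __init__(self, d_bert=64, d_out=64):
        super().__init__()
        self.proj = nn.Linear(d_bert, d_out)

    def forward(self, bert_feats, fused):
        return fused + self.proj(bert_feats)

class ConvResidual(nn.Module):
    """Sketch of the convolution residual: convolve the feature sequence
    twice and add the result back to the input."""

    def __init__(self, d=64, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.conv1 = nn.Conv1d(d, d, kernel, padding=pad)
        self.conv2 = nn.Conv1d(d, d, kernel, padding=pad)

    def forward(self, x):                  # x: (batch, seq, d)
        y = x.transpose(1, 2)              # Conv1d expects (batch, d, seq)
        y = torch.relu(self.conv1(y))
        y = self.conv2(y).transpose(1, 2)
        return x + y

bert_feats = torch.randn(2, 10, 64)   # global features from BERT
lstm_feats = torch.randn(2, 10, 64)   # deep local features from BiLSTM
fused = MappingResidual()(bert_feats, lstm_feats)
fused = ConvResidual()(fused)
print(fused.shape)  # torch.Size([2, 10, 64])
```

Both branches are shape-preserving here so they compose cleanly; in the actual model the mapping residual reduces dimensionality, which this sketch only hints at via the `d_out` parameter.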
F1 scores of 89.89%, 95.02%, 83.21%, 96.15%, and 72.51% on the Grape, People's Daily, BOSON, RESUME, and Weibo datasets, respectively. A two-stage deep learning-based approach to domain knowledge graph construction was also proposed: in the first stage, a domain ontology was constructed; in the second stage, a deep learning model extracted knowledge under the constraints of the ontology and built triples. BBNER-MRS performed best when constructing triples from unstructured text, achieving an
F1 score of 86.44%. Finally, BBNER-MRS was used to successfully construct a grape knowledge graph. This research can provide technical and data support for the standardization and sharing of domain data.