Abstract:
Food is the first necessity of the people, and food safety has always been a central concern for people's livelihood. Building knowledge graphs for food safety and food events has attracted considerable scholarly attention in recent years. A key technology for building a large structured knowledge graph is the extraction of triples from unstructured text, where a triple typically consists of a pair of entities and the semantic relation between them. Chinese food inspection announcements often contain long sentences with a large number of food names and food types. When current deep learning models learn features from such sentences, the strong dependencies between words are weakened by the long distances between them, an issue that has received little attention in entity and relation extraction for the food safety field. To address this problem, this paper proposes a joint entity and relation extraction model (TAG-JE) based on an improved Chinese dependency syntax tree and multi-feature fusion. TAG-JE improves the representation of the Chinese dependency syntax tree by adding a "CLS" tag before the natural sentence and establishing dependency relations between the "CLS" tag and each word in the sentence. The improved tree is used to describe the syntactic structure of the text and is converted into a character adjacency graph so that the syntax tree matches the character-level processing of Chinese BERT encoding. Syntactic structure features are then learned from the encoded character adjacency graph by a graph convolutional network (GCN) and fused with the context features extracted by BERT. Together, the context features and the structural features describe the linear and spatial representations of a sentence, and the fusion weights are adjusted automatically by a gate network.
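To make the graph construction concrete, the following is a minimal sketch of how a word-level dependency tree with an added "CLS" node might be expanded into a character adjacency graph. It is not the paper's implementation. The function name `char_adjacency`, the self-loops, the undirected edges, the links between characters of the same word, and the rule that every character of a dependent word is linked to every character of its head word are all assumptions made for illustration; the example sentence and its parse are likewise hypothetical.

```python
from typing import List, Tuple


def char_adjacency(words: List[str], heads: List[int]) -> Tuple[List[str], List[List[int]]]:
    """Expand a word-level dependency tree into a character adjacency matrix.

    words: the segmented words of one sentence.
    heads: heads[i] is the index of the head word of words[i], or -1 for the root.
    Node 0 is the added "CLS" tag; the characters follow in sentence order.
    """
    nodes = ["CLS"]
    spans = []  # spans[i] holds the node indices of the characters in words[i]
    for word in words:
        start = len(nodes)
        nodes.extend(word)  # one node per character
        spans.append(list(range(start, len(nodes))))

    n = len(nodes)
    adj = [[0] * n for _ in range(n)]

    def link(a: int, b: int) -> None:
        # Assumption: edges are undirected.
        adj[a][b] = 1
        adj[b][a] = 1

    for i in range(n):
        link(i, i)  # assumption: self-loops, as is common for GCN inputs
    for i in range(1, n):
        link(0, i)  # "CLS" depends on every word, hence on every character
    for span in spans:
        for a in span:
            for b in span:
                link(a, b)  # assumption: characters of one word are connected
    for child, head in enumerate(heads):
        if head >= 0:
            for a in spans[child]:
                for b in spans[head]:
                    link(a, b)  # a dependency arc links all character pairs
    return nodes, adj


# Hypothetical parse of "该 食品 合格": 该 -> 食品 -> 合格 (root).
nodes, adj = char_adjacency(["该", "食品", "合格"], [1, 2, -1])
for name, row in zip(nodes, adj):
    print(name, row)
```

Under these assumptions the matrix has one row per BERT token position, so it can be passed to a graph layer alongside the BERT character encodings without any realignment between words and characters.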
After these steps, fused features are obtained for the unstructured text that more fully represent the food inspection announcement. Entities and relations are extracted jointly because, compared with extracting them separately, joint extraction reduces error accumulation. TAG-JE also adds a relation selector to relation analysis; the selector optimizes the embedding features of relations and is trained by reinforcement learning to improve the accuracy of relation extraction. To verify the effectiveness of TAG-JE, this paper compares its extraction performance with mainstream deep learning baselines built on the BiLSTM+CRF framework, which is combined with IDCNN, BERT, and attention, respectively, to obtain three models. In addition to these classic models, TAG-JE is also compared with the mainstream models GraphRel and CasRel. The results show that the precision, recall, and F1 score of TAG-JE reach 90.86%, 90.50%, and 90.68%, respectively, a marked improvement over the other models that demonstrates its knowledge mining ability on Chinese food inspection documents. In experiments on Chinese public datasets, TAG-JE also achieves better results than GraphRel and CasRel, indicating that it generalizes well. These results provide technical support and a reference for the construction of food safety knowledge graphs.
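As a quick check on how the reported scores relate to one another, the sketch below computes precision, recall, and F1 over sets of extracted triples, treating the first reported figure (90.86%) as precision. Exact matching of whole (head, relation, tail) triples is an assumption, since the abstract does not state the matching criterion; the function names and the toy triples are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def triple_prf1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Score predicted triples against gold triples.

    Assumption: a predicted triple counts as correct only if the head,
    relation, and tail all match a gold triple exactly.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy data, invented purely for illustration.
gold = {("酱油", "属于", "调味品"), ("酱油", "检验结果", "合格"), ("饼干", "属于", "糕点")}
predicted = {("酱油", "属于", "调味品"), ("酱油", "检验结果", "合格"), ("饼干", "属于", "调味品")}
print(triple_prf1(predicted, gold))

# Consistency of the reported figures: F1 from 90.86% and 90.50%.
print(round(f1_score(90.86, 90.50), 2))  # 90.68
```

The harmonic mean of 90.86% and 90.50% rounds to 90.68%, which agrees with the F1 value reported for TAG-JE.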