Joint Extraction of Entities and Relations from Food Inspection Announcements Based on a Chinese Character Adjacency Graph

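    • Abstract: Chinese food inspection announcements often contain long sentences with many food names and food categories, which makes entity and relation extraction difficult. Existing deep learning models encode text only as a sequence of characters or words. When a sentence is long and related words are far apart, these models learn the strong dependencies between words less effectively. This problem has received little attention in entity-relation extraction for the food domain, so this study proposes TAG-JE, a joint entity-relation extraction model based on an improved Chinese dependency syntax tree and multi-feature fusion. TAG-JE builds a relation graph from the strong inter-word dependencies given by the syntactic dependency tree, converts this graph into a character adjacency graph to match the character-level processing of Chinese BERT, and uses a graph neural network to learn the structural features of the character adjacency graph. These structural features are fused with the contextual features extracted by BERT, with the fusion weights adjusted automatically by a gate network, to obtain a multi-feature representation of the announcement text. The fused features are passed to a mainstream joint extraction model to extract entities and relations; during relation classification, a relation selector trained with reinforcement learning optimizes the relation embeddings to improve the accuracy of relation decisions. To evaluate TAG-JE, its extraction performance was compared with mainstream deep learning models on a self-built dataset of unstructured food inspection announcements. TAG-JE achieved a precision, recall, and F1 score of 90.86%, 90.50%, and 90.68%, respectively, a substantial improvement over the other baseline models, demonstrating its ability to mine knowledge from Chinese food inspection documents. In experiments on a Chinese public dataset, the model also outperformed classic joint extraction models such as GraphRel and CasRel, indicating good generalization. The results can serve as a technical reference for building Chinese knowledge graphs for food safety.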
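The abstract does not specify how word-level dependency arcs are mapped onto individual characters. The Python sketch below shows one plausible construction, under stated assumptions: a virtual [CLS] node (graph position 0) is linked to every word, as in the improved tree; characters of the same word are fully connected; a word-level arc connects every character of the head word to every character of the dependent word; and self-loops are added for GCN input. The function name `char_adjacency` and the example parse are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def char_adjacency(words: List[str], arcs: List[Tuple[int, int]]) -> List[List[int]]:
    """Build a symmetric character-level adjacency matrix (hypothetical scheme).

    words: a segmented sentence, e.g. ["某店", "销售", "饼干"].
    arcs:  word-level dependency arcs (head, dependent) with 1-based word
           indices; index 0 stands for the virtual [CLS] node.
    """
    # Graph position 0 is [CLS]; each word then occupies one position per
    # character, mirroring the character-by-character input of Chinese BERT.
    spans: Dict[int, List[int]] = {0: [0]}
    pos = 1
    for i, word in enumerate(words, start=1):
        spans[i] = list(range(pos, pos + len(word)))
        pos += len(word)
    n = pos
    adj = [[0] * n for _ in range(n)]

    def link(a: int, b: int) -> None:
        adj[a][b] = adj[b][a] = 1

    # Self-loops so each node keeps its own features in a GCN (assumption).
    for k in range(n):
        link(k, k)
    # Characters belonging to the same word are fully connected (assumption).
    for i in range(1, len(words) + 1):
        for a in spans[i]:
            for b in spans[i]:
                link(a, b)
    # Improved tree: [CLS] is linked to every word of the sentence.
    all_arcs = list(arcs) + [(0, i) for i in range(1, len(words) + 1)]
    # A word-level arc links every character of the head word to every
    # character of the dependent word (assumption).
    for head, dep in all_arcs:
        for a in spans[head]:
            for b in spans[dep]:
                link(a, b)
    return adj


# Hypothetical parse of "某店销售饼干" ("a shop sells biscuits"):
# 销售 is the head of both 某店 and 饼干.
adj = char_adjacency(["某店", "销售", "饼干"], [(2, 1), (2, 3)])
print(len(adj))     # 7 nodes: [CLS] + 6 characters
print(sum(adj[3]))  # 7: "销" is linked to every node
print(adj[1][5])    # 0: "某" and "饼" are not directly linked
```

Because [CLS] is connected to every character, any two characters are at most two hops apart in this graph, which is one way the improved tree can shorten long-distance dependencies.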

       

      Abstract: Food is the people's top priority, and food safety has always been a key concern for people's livelihood. Building knowledge graphs of food safety and food events has attracted growing scholarly attention in recent years. A key technology for building a large structured knowledge graph is the extraction of triples from unstructured text; a triple typically consists of a pair of entities and the semantic relation between them. Chinese food inspection announcements often contain long sentences with many food names and food types. When current deep learning models learn features from such sentences, the strong dependencies between words are weakened by the long distances between them, an issue that has received little attention in entity and relation extraction for the food safety field. To capture these inter-word dependencies, this paper proposes TAG-JE, a joint entity-relation extraction model based on an improved Chinese dependency syntax tree and multi-feature fusion. TAG-JE improves the Chinese dependency syntax tree by adding a "CLS" tag before the sentence and establishing a dependency relation between this tag and every word in the sentence. The improved tree describes the syntactic structure of the text and is converted into a character adjacency graph so that it matches the character-level processing of Chinese BERT. A graph convolutional network (GCN) then learns syntactic structure features from the character adjacency graph, and these are fused with the contextual features extracted by BERT; the contextual and structural features describe the linear and spatial representations of the sentence, respectively. The fusion weights are adjusted automatically by a gate network.
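A minimal sketch of the structure-plus-context fusion step, assuming a single GCN layer with the standard symmetric normalization D^(-1/2) A D^(-1/2) and a per-dimension sigmoid gate g = sigmoid(W_g [h_ctx; h_struct] + b_g) that mixes the two feature sets as g * h_ctx + (1 - g) * h_struct. The abstract does not give TAG-JE's exact gate form, number of GCN layers, or dimensions; random vectors stand in for BERT outputs, and the names `gcn_layer` and `gated_fusion` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gcn_layer(adj: List[List[int]], h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 A D^-1/2 H W); A must already contain self-loops."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def gated_fusion(h_ctx: Matrix, h_struct: Matrix, w_g: Matrix, b_g: List[float]) -> Matrix:
    """Assumed gate: g = sigmoid(W_g [h_ctx; h_struct] + b_g), per dimension;
    output is g * h_ctx + (1 - g) * h_struct."""
    fused = []
    for c, s in zip(h_ctx, h_struct):
        z = matmul([c + s], w_g)[0]  # concatenate, then project back to d dims
        g = [1.0 / (1.0 + math.exp(-(zi + bi))) for zi, bi in zip(z, b_g)]
        fused.append([gi * ci + (1.0 - gi) * si for gi, ci, si in zip(g, c, s)])
    return fused


random.seed(0)
d = 4
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # toy 3-character graph with self-loops
h_bert = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]  # stand-in for BERT output
w = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]
w_g = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(2 * d)]
h_gcn = gcn_layer(adj, h_bert, w)
fused = gated_fusion(h_bert, h_gcn, w_g, [0.0] * d)
print(len(fused), len(fused[0]))  # 3 4
```

Because the gate output lies in (0, 1), each fused value is a convex combination of the contextual and structural values for that dimension, so the network can learn per dimension how much syntactic structure to use.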
After these steps, the model obtains fused features that fully represent the unstructured food inspection announcement text. Entities and relations are extracted jointly, because joint extraction reduces error accumulation compared with extracting entities and relations separately. TAG-JE also adds a relation selector to relation analysis; the selector optimizes the relation embedding features and is trained by reinforcement learning to improve the accuracy of relation extraction. To verify the effectiveness of TAG-JE, this paper compared its extraction performance with mainstream deep learning models: BiLSTM+CRF and three models obtained by combining the BiLSTM+CRF framework with IDCNN, BERT, and attention. Beyond these classic models, TAG-JE was also compared with the mainstream joint extraction models GraphRel and CasRel. TAG-JE achieved a precision, recall, and F1 score of 90.86%, 90.50%, and 90.68%, respectively, a significant improvement over the other models that demonstrates its knowledge mining ability on Chinese food inspection documents. In experiments on Chinese public datasets, TAG-JE also outperformed GraphRel and CasRel, indicating good generalization. The research results provide a reference for the construction of food safety knowledge graphs.
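F1 is the harmonic mean of precision and recall, so the three reported figures can be cross-checked against each other. The short Python check below recomputes F1 from 90.86% and 90.50%; it reproduces the reported 90.68%, which is consistent with the first figure being precision, since accuracy does not enter the F1 formula. The helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported for TAG-JE on the self-built food inspection dataset.
precision, recall = 0.9086, 0.9050
print(f"F1 = {f1_score(precision, recall):.2%}")  # F1 = 90.68%
```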

       
