Citation: Yang He, Yu Hong, Sun Zhetao, Liu Jusheng, Yang Huining, Zhang Sijia, Sun Hua, Jiang Xin, Yu Yingnan. Fishery standard entity relation extraction using dual attention mechanism[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(14): 204-212. DOI: 10.11975/j.issn.1002-6819.2021.14.023

    Fishery standard entity relation extraction using dual attention mechanism

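    • Abstract: To address the poor performance caused by overlapping relations in fishery standard entity relation extraction, an entity relation extraction method based on a dual attention mechanism was proposed. First, a sentence-pattern classification and labeling strategy was proposed to solve the difficulty of labeling overlapping relations in fishery standard texts. Second, a fishery standard entity relation extraction model combining the dual attention mechanism with BERT-BiLSTM-CRF (Bidirectional Encoder Representations from Transformers-Bi-directional Long Short-Term Memory-Conditional Random Field) was proposed; it uses a character-level attention mechanism and a sentence-level attention mechanism to optimize weight allocation and eliminate noise, thereby improving the accuracy of relation extraction. Finally, comparative experiments were designed to verify the effectiveness of the proposed method. The results show that the dual-attention entity relation extraction method achieved a precision, recall, and F1 score of 92.67%, 92.31%, and 92.49%, respectively, on the DLOU-FSI (Fishery Standard Interaction) dataset (360,000 characters). The study shows that the method can effectively solve the overlapping relation problem in fishery standard relation extraction and improves the overall performance of fishery standard entity relation extraction, providing a reference for building fishery standard knowledge graphs.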
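
Overlapping relations are what make labeling hard. The abstract does not list the sentence-pattern categories used by the labeling strategy, so the sketch below assumes the three-way taxonomy common in overlapping relation extraction work (Normal, SingleEntityOverlap, EntityPairOverlap) and classifies a sentence from its gold (head, relation, tail) triples. The function name `classify_sentence`, the category names, and the entity strings are hypothetical illustrations, not the authors' scheme.

```python
from typing import List, Tuple

# (head entity, relation, tail entity); the overlap taxonomy used here is an
# assumption, not the paper's published labeling scheme
Triple = Tuple[str, str, str]


def classify_sentence(triples: List[Triple]) -> str:
    """Hypothetical helper: label a sentence by how its gold triples overlap."""
    # Entity pairs are compared order-insensitively
    pairs = [frozenset((head, tail)) for head, _, tail in triples]
    # EntityPairOverlap: one entity pair carries two or more relations
    if len(set(pairs)) < len(pairs):
        return "EntityPairOverlap"
    # SingleEntityOverlap: distinct pairs that still share an entity
    seen = set()
    for head, _, tail in triples:
        if head in seen or tail in seen:
            return "SingleEntityOverlap"
        seen.update((head, tail))
    return "Normal"


# Hypothetical fishery-standard examples
print(classify_sentence([("Standard A", "proposal", "Org X"),
                         ("Standard A", "centralization", "Org X")]))  # EntityPairOverlap
print(classify_sentence([("Standard A", "drafting", "Org Y"),
                         ("Standard A", "release", "Org Z")]))  # SingleEntityOverlap
print(classify_sentence([("Standard A", "drafting", "Org Y")]))  # Normal
```

A sentence that is both pair-overlapping and entity-overlapping is reported as EntityPairOverlap here; that priority is a design choice of the sketch.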


      Abstract: Entity relation extraction is a fundamental task that detects triplets, each consisting of two entities and the semantic relation between them. Overlapping relations have caused poor performance in fishery standard entity relation extraction in recent years. In this study, a novel entity relation extraction method for fishery standards was proposed using a dual attention mechanism. First, a sentence-pattern classification and labeling strategy was proposed to address the difficulty of labeling overlapping relations in fishery standard texts. Second, a fishery standard entity relation extraction model was established by combining dual attention with Bidirectional Encoder Representations from Transformers-Bi-directional Long Short-Term Memory-Conditional Random Field (BERT-BiLSTM-CRF). The model consists of five components, from bottom to top: a BERT layer, a BiLSTM layer, a word-level attention layer, a sentence-level attention layer, and a CRF output layer. The pre-trained BERT layer uses bidirectional Transformer encoders to automatically learn sentence features as vector representations. The BiLSTM layer learns the contextual features of target entities from the BERT output. The word-level and sentence-level attention layers optimize the weights of target words and sentences in a paragraph, removing noise and thereby improving the accuracy of relation extraction. The CRF decoder converts the output of the attention layers into a sequence of tags. Finally, comparative experiments were designed to verify the effectiveness of the model. The results show that the dual attention method achieved better overall entity relation extraction performance on the Fishery Standard Interaction (DLOU-FSI) dataset, with a precision, recall, and F1 score of 92.67%, 92.31%, and 92.49%, respectively. The method effectively handles overlapping relations in fishery standard relation extraction.
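
The word-level and sentence-level attention layers both reduce to the same operation: score a set of vectors, normalize the scores with softmax, and take a weighted sum, so that low-scoring (noisy) words or sentences contribute less. The sketch below shows that two-level weighting in plain Python under stated assumptions: dot-product scoring against a query vector, with `word_query` and `sent_query` as hypothetical stand-ins for parameters a real model would learn, and toy 2-dimensional vectors in place of BiLSTM outputs. The abstract does not give the paper's exact scoring function, and in the described model the attention output feeds a CRF layer rather than being pooled into a single paragraph vector.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Dot-product attention: weight each vector by softmax(v . query); return weights and weighted sum."""
    weights = softmax([sum(x * q for x, q in zip(v, query)) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled


def dual_attention(paragraph: List[List[Vector]], word_query: Vector,
                   sent_query: Vector) -> Tuple[List[float], Vector]:
    """Word-level attention builds one vector per sentence; sentence-level attention weights the sentences."""
    sentence_vecs = [attention_pool(tokens, word_query)[1] for tokens in paragraph]
    return attention_pool(sentence_vecs, sent_query)


# Toy paragraph: two sentences of 2-d token vectors (stand-ins for BiLSTM outputs)
paragraph = [
    [[1.0, 0.0], [0.0, 1.0]],  # sentence 1 contains a token aligned with the query
    [[0.0, 1.0], [0.0, 1.0]],  # sentence 2 does not
]
sent_weights, paragraph_vec = dual_attention(paragraph, [1.0, 0.0], [1.0, 0.0])
print(sent_weights)  # sentence 1 receives the larger weight
```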
In addition, the recognition precision for all seven relation categories was higher than that of the other models. For the quotation, regulation, release, proposal, drafting, and centralization relations, the precision, recall, and F1 score all exceeded 90%. Nevertheless, the recall of the improved model dropped slightly for the comparison relation, because sentence samples of this category are sparsely distributed: in the DLOU-FSI corpus, each fishery standard text contains only 0-3 comparison relation triples, accounting for less than 1% of all relation triples. Comprehensive relation features therefore need to be learned to achieve higher recognition performance in the relation extraction task. Corpus quality also strongly affects relation extraction from fishery standard texts: the more data a deep learning model is trained on, the higher the quality and accuracy of its recognition, up to a critical point beyond which further data no longer improves the model substantially. When adapting the model, the dataset can be expanded to increase the number and diversity of samples and thus improve the overall relation extraction performance. These findings lay a foundation for constructing fishery standard knowledge graphs.
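
The reported F1 value is consistent with the harmonic mean of the reported precision and recall, F1 = 2PR/(P+R). The sketch below checks that arithmetic and computes per-relation scores over (head, relation, tail) triples, assuming exact-match evaluation (the abstract does not state the matching criterion). The helper names and triples are hypothetical. With only 0-3 comparison triples per standard text, each missed comparison triple moves that relation's recall by a large step, which is one way to see why a sparse category is hard to score well.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_relation_scores(gold: Set[Triple], pred: Set[Triple]) -> Dict[str, Tuple[float, float, float]]:
    """Exact-match precision, recall, and F1 for each relation type (assumed criterion)."""
    counts = defaultdict(lambda: [0, 0, 0])  # relation -> [true positives, predicted, gold]
    for _, rel, _ in pred:
        counts[rel][1] += 1
    for triple in gold:
        counts[triple[1]][2] += 1
        if triple in pred:
            counts[triple[1]][0] += 1
    scores = {}
    for rel, (tp, n_pred, n_gold) in counts.items():
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        scores[rel] = (p, r, f1(p, r))
    return scores


print(round(f1(92.67, 92.31), 2))  # 92.49

# Hypothetical gold and predicted triples
gold = {("Standard A", "drafting", "Org X"), ("Standard A", "release", "Org Y"),
        ("Standard A", "comparison", "Standard B")}
pred = {("Standard A", "drafting", "Org X"), ("Standard A", "release", "Org Z")}
for rel, (p, r, f) in sorted(per_relation_scores(gold, pred).items()):
    print(f"{rel}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```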

