Abstract:
Entity relation extraction is a fundamental task that detects triplets, each consisting of two entities and the semantic relation between them. In recent years, overlapping relations have limited the performance of entity relation extraction on fishery standard texts. In this study, a novel entity relation extraction method for fishery standards was proposed using a dual attention mechanism. First, a sentence classification and labeling strategy was adopted to address the difficulty of labeling overlapping relations in fishery standard texts. Second, an entity relation extraction model for fishery standards was established by combining dual attention with Bidirectional Encoder Representations from Transformers, Bidirectional Long Short-Term Memory, and Conditional Random Field (BERT-BiLSTM-CRF). The model comprised five components from bottom to top: the BERT layer, the BiLSTM layer, the word-level attention layer, the sentence-level attention layer, and the CRF output layer. In the pre-trained BERT layer, bidirectional Transformer encoding was used to automatically learn sentence features for the vector representation. In the BiLSTM layer, the context features of the target entity were learned from the BERT output. Word-level and sentence-level attention were used to optimize the weights of target words and sentences in the paragraph, which reduced noise and improved the accuracy of relation extraction. The CRF decoder was used to convert the output of the attention layer into sequence tags. Finally, a comparative experiment was designed to verify the effectiveness of the model. The results show that the dual-attention model achieved better overall performance in entity relation extraction on the fishery standard interaction (DLOU-FSI) dataset, with accuracy, recall, and F1 value of 92.67%, 92.31%, and 92.49%, respectively. The overlapping relation problem in fishery standard relation extraction was effectively addressed.
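The word-level and sentence-level attention described above can be pictured with a minimal sketch. It is a generic hierarchical attention written in plain Python, not the authors' exact formulation: the dot-product scoring, the function names (`attend`, `dual_attention`), and the query vectors `word_query` and `sentence_query` are all assumptions introduced for illustration, and the toy input vectors merely stand in for BiLSTM outputs. The sketch also pools each sentence into a single vector, whereas the model in the abstract passes the attention output on to a CRF layer for sequence tagging.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(vectors, query):
    """Score each vector against the query and return (weights, weighted sum)."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, pooled


def dual_attention(paragraph, word_query, sentence_query):
    """Hypothetical two-level attention over a paragraph.

    paragraph: list of sentences, each a list of word vectors
               (assumed here to be BiLSTM outputs).
    Returns the sentence weights and the pooled paragraph vector.
    """
    # Level 1: word attention pools each sentence into one vector.
    sentence_vectors = [attend(sentence, word_query)[1] for sentence in paragraph]
    # Level 2: sentence attention weights the sentences in the paragraph.
    return attend(sentence_vectors, sentence_query)


# Toy paragraph: two sentences with 2-dimensional word vectors.
toy_paragraph = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
sentence_weights, paragraph_vector = dual_attention(
    toy_paragraph, word_query=[1.0, 0.0], sentence_query=[0.0, 1.0]
)
```

Under these assumptions, sentences whose pooled vectors align poorly with the sentence query receive small weights, which is one way to read the noise reduction mentioned in the abstract.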
In addition, the recognition accuracy in all seven relation categories was higher than that of the compared models. The accuracy, recall, and F1 value were greatly improved, each exceeding 90%, for the quotation, regulation, release, proposal, drafting, and centralization relations. Nevertheless, the recall of the improved model dropped slightly for the comparison relation. The reason was the sparse distribution of sentence samples in the comparison relation category. In the DLOU-FSI corpus, each fishery standard text contained only 0-3 comparison relation triples, less than 1% of the total number of relation triples. Comprehensive relation features therefore need to be learned to achieve higher recognition performance in the relation extraction task. In addition, the quality of the corpus was a determining factor in relation extraction from fishery standard texts. The more data the deep learning model was trained on, the higher the quality and accuracy of its recognition. There was also a critical amount of data beyond which the model could not be greatly improved. When the model is adapted, the test data can be expanded to increase the number and diversity of samples, improving the overall effect of relation extraction. These findings can provide a foundation for the construction of fishery standard knowledge graphs.
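The overall scores reported above can be checked for internal consistency with a short calculation. The sketch below assumes that the reported "accuracy" plays the role of precision in the usual F1 definition (the harmonic mean of precision and recall); the abstract does not define the term, so this reading is an assumption, and the function name `f1_score` is introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the DLOU-FSI dataset, in percent.
reported_accuracy = 92.67  # assumed to denote precision
reported_recall = 92.31
reported_f1 = 92.49

computed_f1 = f1_score(reported_accuracy, reported_recall)
print(round(computed_f1, 2))  # prints 92.49
```

Under that assumption the computed value matches the reported F1 value to two decimal places.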