Relationship extraction in the field of food safety based on BERT and improved PCNN model
Abstract
A knowledge graph (semantic network) organizes real-world entities and the relationships between them in a graph database. Relationship extraction is one of the most important steps in the automatic construction of knowledge graphs. However, there is currently no public dataset related to knowledge graphs in the food safety field, and existing relationship extraction models are confined to open standard datasets, so most of them cannot extract information from this specific domain. In this study, a domain-specific dataset was constructed for relationship extraction in the food safety field, and a model based on Bidirectional Encoder Representations from Transformers (BERT) and an improved Piecewise Convolutional Neural Network (PCNN) was developed. The corpus was first collected, and the corresponding entities and relationship categories were annotated. A relationship extraction model, BERT-PCNN-ATT-Jieba (where ATT denotes the attention mechanism), was then proposed for the field of food safety. The pre-trained BERT model was selected to generate the input word vectors. The piecewise max pooling layer of the PCNN model was then used to capture the local information of sentences, and an attention mechanism was added between the piecewise max pooling layer and the classification layer to further extract high-level semantic features. In addition, Jieba word segmentation was applied to the Chinese corpus before the random masking of the BERT model, so that whole words rather than individual characters were masked when executing the Masked Language Model (MLM). In this way, the semantic loss of the sentences fed into the training model was reduced, enabling more effective relationship extraction. The performance of the BERT-PCNN-ATT-Jieba model was compared with the classical CNN and PCNN models, as well as the BERT-CNN, BERT-PCNN, BERT-PCNN-ATT, and BERT-PCNN-Jieba models, on the same dataset with consistent experimental parameters. Compared with PCNN, the BERT-PCNN model achieved slightly higher precision, recall, and F1 value, indicating that the vectors generated by BERT better capture the semantic features of the data. Compared with BERT-PCNN, the BERT-PCNN-ATT model assigned higher weights to the pooled high-level semantic features by adding the attention mechanism between the pooling layer and the classification layer, indicating that the attention mechanism can improve model performance. The F1 value of BERT-PCNN-Jieba was higher than that of BERT-PCNN, because the word segmentation preprocessing of the training sentences in the food safety domain weakened the influence of word length and better preserved the positional and logical information between words. Overall, the BERT-PCNN-ATT-Jieba model achieved the highest precision of 84.72%, recall of 81.78%, and F1 value of 83.22%, showing the best performance on the relationship extraction dataset in the field of food safety. These findings can provide a strong reference for knowledge extraction in the cost-effective and automatic construction of knowledge graphs in the field of food safety. The improved model can also lay a foundation for applications such as knowledge question answering, knowledge retrieval, data sharing, and intelligent supervision of food safety based on knowledge graphs.
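The core pipeline described above (BERT encoding, piecewise max pooling over the three sentence segments delimited by the two entities, and an attention step between pooling and classification) can be summarized in a minimal sketch. The PyTorch/Transformers code below is an illustrative assumption rather than the authors' implementation: the module name BertPCNNAtt, the sigmoid-gate form of the attention, the number of filters, and all other hyperparameters are chosen only for demonstration.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertPCNNAtt(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", num_relations=10,
                 num_filters=230, kernel_size=3):
        super().__init__()
        # pre-trained BERT produces contextual token vectors for the input sentence
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.conv = nn.Conv1d(hidden, num_filters, kernel_size, padding=1)
        # attention gate over the concatenated pooled features (assumed form)
        self.att = nn.Linear(3 * num_filters, 3 * num_filters)
        self.classifier = nn.Linear(3 * num_filters, num_relations)

    def forward(self, input_ids, attention_mask, seg_mask):
        # seg_mask: (batch, 3, seq_len) float 0/1 masks for the three sentence
        # segments delimited by the two entity positions (the "piecewise" part)
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        c = torch.relu(self.conv(h.transpose(1, 2)))      # (batch, filters, seq_len)
        pooled = []
        for i in range(3):
            m = seg_mask[:, i].unsqueeze(1)               # (batch, 1, seq_len)
            # piecewise max pooling: take the max only over tokens inside this segment
            pooled.append((c * m + (m - 1.0) * 1e4).max(dim=2).values)
        feat = torch.cat(pooled, dim=1)                   # (batch, 3 * filters)
        # attention between pooling and classification re-weights the pooled features
        feat = feat * torch.sigmoid(self.att(feat))
        return self.classifier(feat)                      # relation logits
```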
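The Jieba-based whole-word masking idea can be illustrated in the same spirit. The sketch below is a simplification under stated assumptions: the function name whole_word_mask, the 15% masking probability, and the literal [MASK] string are hypothetical choices for demonstration; in practice the masking is performed on tokenizer output during MLM training rather than on raw text.

```python
import random
import jieba

def whole_word_mask(sentence, mask_token="[MASK]", mask_prob=0.15):
    """Mask whole Jieba words instead of single characters (illustrative only)."""
    words = list(jieba.cut(sentence))      # word-level segmentation of the Chinese text
    masked = []
    for word in words:
        if random.random() < mask_prob:
            # mask every character of the selected word together,
            # so the MLM objective operates on word units rather than characters
            masked.append(mask_token * len(word))
        else:
            masked.append(word)
    return "".join(masked)

# Example: segment and mask a food-safety sentence (output varies per run)
print(whole_word_mask("食品安全监管部门发布了新的抽检结果"))
```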