A relation completion model for tea-planting knowledge graphs fusing relational context and paths

SHAN Yuanyuan; LI Shuqin

doi:10.11975/j.issn.1002-6819.202403017

Incorporating relational contexts and paths for tea-planting knowledge graph completion

Abstract

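Abstract: To address the incompleteness of tea-planting knowledge graphs, this study proposes a relation completion model for tea-planting knowledge graphs that fuses relational context and relational paths. The model consists of a TeaConAggr (tea context aggregate) aggregation module, a multi-layer relational message-passing mechanism, a relational path aggregation module, and a relational path learning module. First, the TeaConAggr module aggregates the relational context of each entity at every hop, and the multi-layer relational message-passing mechanism combines these per-hop contexts to obtain the relational context of an entity pair. Second, the relational path aggregation module aggregates the relational paths between entities, and the relational path learning module learns from these paths. Finally, an attention mechanism fuses the relational context and the relational paths to complete the relations between entities. Experiments on the self-built tea-planting knowledge graph dataset TPKGData show that the model reaches 85.40%, 80.95%, and 90.08% on mean reciprocal rank, Hits@1, and Hits@3, respectively, which are 2.56, 2.63, and 4.25 percentage points higher than the Shallom model. In addition, in comparative experiments against the Shallom model on the public datasets FB15K-237 and WN18RR, the mean reciprocal rank improved by 2.17 and 1.05 percentage points, respectively, further indicating that the model generalizes well.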
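
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how mean reciprocal rank and Hits@k are conventionally computed from the rank that a model assigns to the true relation of each test entity pair. The function names and the example ranks are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test cases (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test cases whose true relation is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the true relation for five test entity pairs.
example_ranks = [1, 1, 2, 1, 4]

mrr = mean_reciprocal_rank(example_ranks)
hits1 = hits_at_k(example_ranks, 1)
hits3 = hits_at_k(example_ranks, 3)
print(f"MRR={mrr:.4f} Hits@1={hits1:.4f} Hits@3={hits3:.4f}")
```

Both metrics lie in [0, 1] and are usually reported as percentages, so a difference between two models is stated in percentage points.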

Abstract: Tea planting has developed rapidly in China in recent years. With the growth of the internet, it is increasingly important to integrate the massive and complex knowledge of this field in a structured manner. Because tea-planting knowledge comes from diverse and scattered sources, it is fragmented. Knowledge graphs can greatly improve the manageability and accessibility of such data, but a complete knowledge graph of tea planting is still lacking, even though one would help promote the informatization and modernization of the entire tea industry. In this study, a systematic and structured knowledge graph of tea planting was constructed by organizing knowledge resources in the tea-planting field. A tea-planting knowledge graph completion (TPKGC) model that integrates relationship context and relationship paths was proposed to predict the missing relationships between entities in the tea-planting knowledge graph. The model consisted of three main parts: a relationship context layer, a relationship path layer, and a fusion output layer. In the relationship context layer, the TeaConAggr (tea context aggregate) module was first used to aggregate the relationship context of entities at each hop. A multi-layer relational message-passing mechanism then summarized the per-hop relationship contexts to obtain the relationship context of entity pairs. In the relationship path layer, the relationship path aggregation module aggregated the relationship paths between entities, and the relationship path learning module learned the paths. In the fusion output layer, multiplicative attention and the Softmax function were used to fuse the relationship context representation from the relationship context layer with the relationship path features from the relationship path layer, yielding the weight of each path. The weights were then combined with the relationship path features to obtain the aggregated representation of the relationship paths. Finally, additive attention was applied to the relationship context representation and the aggregated relationship path representation to obtain the predicted ranking of relationships between entity pairs, thereby completing the relationships in the knowledge graph. Data for constructing the knowledge graph were collected from two major types of resources: books and the internet. A total of 16 entity types and 16 relationship types were defined to represent the entities and their complex relationships in the tea-planting knowledge graph. The TPKGData dataset was also developed for the tea-planting knowledge graph. The experimental results showed that the model achieved 85.40%, 80.95%, and 90.08% on the Mean Reciprocal Rank, Hits@1, and Hits@3 metrics, respectively, which were 2.56, 2.63, and 4.25 percentage points higher than those of the Shallom model. These results demonstrated the effectiveness of the model in predicting the missing relationships between entities in the tea-planting knowledge graph. In addition, comparative experiments were conducted on the publicly available datasets FB15K-237 and WN18RR to further verify the generalization of the model, and the TPKGC model also achieved the best performance on these datasets. The model therefore performed well on both the domain-specific dataset and the public datasets, and its stable performance across different knowledge graph datasets indicated good generalization. The findings can also provide guidance for the construction of knowledge graphs.
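
The fusion output layer described above can be illustrated with a minimal sketch. It assumes that the context representation and each path feature are plain vectors with one dimension per relationship type, and it uses dot-product scoring as the multiplicative attention. The final additive-attention step is not specified in enough detail here, so the sketch substitutes a plain element-wise sum followed by Softmax as a stand-in. The function `fuse` and the toy vectors are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse(context: List[float], paths: List[List[float]]) -> Tuple[List[float], List[float]]:
    # Multiplicative (dot-product) attention: score each path against the context,
    # then normalize the scores into path weights with softmax.
    weights = softmax([dot(context, p) for p in paths])
    # The weighted sum of path features gives the aggregated path representation.
    aggregated = [
        sum(w * p[i] for w, p in zip(weights, paths)) for i in range(len(context))
    ]
    # ASSUMPTION: stand-in for the additive-attention step; an element-wise sum
    # followed by softmax yields a distribution over relationship types.
    relation_probs = softmax([c + a for c, a in zip(context, aggregated)])
    return weights, relation_probs


# Toy example with three relationship types and two candidate paths.
context = [2.0, 0.0, 0.0]
paths = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights, relation_probs = fuse(context, paths)
predicted = max(range(len(relation_probs)), key=relation_probs.__getitem__)
print(weights, relation_probs, predicted)
```

In this toy case the first path aligns with the context, so it receives the larger weight, and sorting `relation_probs` in descending order gives the predicted ranking of relationship types for the entity pair.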
