Abstract:
Answer selection is one of the most important tasks in natural language processing, underpinning downstream applications such as question-answering systems and search ranking. It aims to select the most relevant answer to a given question from a pool of candidate answers, and is usually treated as a relevance ranking task. However, current answer selection models cannot discover the deep semantic relationships between questions and answers from the limited information in the text of the question-answer pairs alone. A knowledge graph can serve as background knowledge to enrich the deep semantics of an answer selection model; nevertheless, existing models rely solely on their own textual information and still lack the support of multi-modal background knowledge. In this study, an answer selection model enhanced by a multi-modal knowledge graph was proposed, consisting of an embedding layer, a representation learning layer, a knowledge graph enhancement layer, and an output layer. In the embedding layer, the GloVe model was used to obtain word embeddings for the question and answer texts, a ComplEx-based method (complex embedding) was designed to learn entity embeddings for the multi-modal knowledge graph, and image feature representations of image entities were extracted using the Vision Transformer (ViT). In the representation learning layer, a bi-directional long short-term memory (Bi-LSTM) network was used to learn representations of the question and answer texts, and context-guided vector representations of the multi-modal knowledge graph for the question and answer were obtained using a context-guided attention mechanism. In the knowledge graph enhancement layer, an interactive attention mechanism was used to fuse the semantic representations of the question and answer texts with the background knowledge features provided by the multi-modal knowledge graph, yielding knowledge-enhanced feature representations of the question and answer. In the output layer, these knowledge-enhanced feature representations were concatenated with additional semantic features, and the softmax function was used to predict the probability distribution of answer labels for a given question. Taking grape planting as an example, multi-modal entity linking was realized using the longest common subsequence algorithm, and entity recognition for knowledge extraction was implemented using a BERT-LSTM-CRF framework built on the BERT pre-trained model. Reference knowledge was collected from the literature and domain experts, and a multi-modal knowledge graph for the grape planting domain was constructed. A grape planting question-answer dataset was also built using grape forums, smart agriculture platforms, Agricultural Manager, and Agricultural Benefit Network as data sources, followed by text cleaning and dataset expansion. Experimental results show that the model achieved better performance by drawing additional information from the multi-modal knowledge graph. Specifically, the mean reciprocal rank and mean average precision reached 85.02% and 84.21%, respectively, on the grape question-answering dataset, an increase of 2.57 and 3.96 percentage points, respectively. Enhancing answer selection with knowledge from a multi-modal knowledge graph can therefore be expected to improve model performance.
Embedding representations combined with attention mechanisms can be utilized to incorporate background knowledge from the multi-modal knowledge graph. These findings can provide a technical basis for downstream applications of multi-modal knowledge graphs, such as search and question answering.
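
For reference, the entity embeddings described above build on the standard ComplEx scoring function, in which each head entity h, relation r, and tail entity t is assigned a complex-valued vector and a triple is scored by the real part of a trilinear product (written here in LaTeX; the exact training objective used in this study is not stated in the abstract):

    \phi(h, r, t) = \mathrm{Re}\Big( \sum_{k=1}^{d} e_{h,k} \, w_{r,k} \, \overline{e_{t,k}} \Big),
    \qquad \mathbf{e}_h, \mathbf{w}_r, \mathbf{e}_t \in \mathbb{C}^{d},

where \overline{e_{t,k}} denotes the complex conjugate of the k-th component of the tail-entity embedding. The conjugate makes the score asymmetric in h and t, which allows ComplEx to model both symmetric and antisymmetric relations.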
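To make the layered architecture concrete, the following is a minimal PyTorch sketch of how the pieces fit together. All class names, dimensions, the scaled dot-product form of the attention, and the mean pooling are illustrative assumptions rather than the authors' implementation; the pre-trained GloVe, ComplEx, and ViT features are stood in for by plain tensors.

    import torch
    import torch.nn as nn


    class MKGAnswerSelector(nn.Module):
        """Illustrative sketch of the four-layer model: embedding, Bi-LSTM
        representation learning, knowledge graph enhancement via attention,
        and a softmax output layer. Names and sizes are assumptions."""

        def __init__(self, vocab_size, emb_dim=300, hidden=128,
                     kg_dim=256, extra_dim=4, n_labels=2):
            super().__init__()
            # Embedding layer: stands in for pre-trained GloVe vectors.
            self.word_emb = nn.Embedding(vocab_size, emb_dim)
            # Representation learning layer: Bi-LSTM over token sequences.
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                  bidirectional=True)
            # Project multi-modal KG entity features (ComplEx/ViT) into
            # the text hidden space.
            self.kg_proj = nn.Linear(kg_dim, 2 * hidden)
            # Output layer over concatenated question, answer, and
            # additional semantic features.
            self.out = nn.Linear(4 * hidden + extra_dim, n_labels)

        @staticmethod
        def attend(query, memory):
            # Scaled dot-product attention: query attends over memory.
            scores = query @ memory.transpose(1, 2) / memory.size(-1) ** 0.5
            return torch.softmax(scores, dim=-1) @ memory

        def encode(self, tokens, kg_entities):
            h, _ = self.bilstm(self.word_emb(tokens))   # (B, T, 2*hidden)
            kg = self.kg_proj(kg_entities)              # (B, E, 2*hidden)
            kg_ctx = self.attend(h, kg)       # context-guided attention over KG
            fused = h + self.attend(h, kg_ctx)  # interactive fusion with text
            return fused.mean(dim=1)          # pooled knowledge-enhanced vector

        def forward(self, q_tokens, a_tokens, q_kg, a_kg, extra_feats):
            q = self.encode(q_tokens, q_kg)
            a = self.encode(a_tokens, a_kg)
            logits = self.out(torch.cat([q, a, extra_feats], dim=-1))
            return torch.softmax(logits, dim=-1)  # probabilities over labels

With model = MKGAnswerSelector(vocab_size=5000), a forward pass on random inputs, for example model(torch.randint(0, 5000, (2, 12)), torch.randint(0, 5000, (2, 30)), torch.randn(2, 5, 256), torch.randn(2, 5, 256), torch.randn(2, 4)), returns a 2-by-2 matrix of label probabilities, one row per question-answer pair.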
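The longest common subsequence (LCS) matching used for multi-modal entity linking can likewise be sketched briefly. Normalizing the LCS length by the longer of the two strings is an assumption here, since the abstract does not specify how raw LCS scores are turned into a linking decision.

    def lcs_length(a: str, b: str) -> int:
        """Dynamic-programming length of the longest common subsequence."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, ca in enumerate(a, 1):
            for j, cb in enumerate(b, 1):
                dp[i][j] = (dp[i - 1][j - 1] + 1 if ca == cb
                            else max(dp[i - 1][j], dp[i][j - 1]))
        return dp[len(a)][len(b)]


    def link_entity(mention: str, kg_entities: list[str]) -> str:
        """Return the KG entity name with the highest length-normalized
        LCS overlap with the text mention (normalization is assumed)."""
        return max(kg_entities,
                   key=lambda e: lcs_length(mention, e)
                   / max(len(mention), len(e)))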