Named entity recognition for aquatic animal disease diagnosis and treatment, oriented to knowledge graph construction

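    • Summary: Disease diagnosis and treatment are an important support for the healthy farming of aquatic animals. A knowledge graph is an effective means of representing and applying knowledge about aquatic animal disease diagnosis and treatment, and named entity recognition is the key to constructing such a graph. To address the low recognition accuracy caused by polysemy and nested entities, this study proposes an entity recognition model that combines BERT (Bidirectional Encoder Representations from Transformers) with CaBiLSTM (Cascade Bi-directional Long Short-Term Memory). First, a dedicated corpus for aquatic animal disease diagnosis and treatment is built and used to train the model. Second, the CaBiLSTM model is designed on a "layered" principle to recognize nested entities: dimension-reduced inner-entity features are used to make outer entities easier to discriminate, and the BERT model is introduced to add entity position information. Finally, comparative experiments are carried out to verify the effectiveness of the proposed method. The results show that the combined BERT and CaBiLSTM model reaches a precision, recall, and F1 score of 93.07%, 92.85%, and 92.96%, respectively, on named entity recognition for aquatic animal disease diagnosis and treatment. The study shows that the model effectively addresses the low recognition accuracy caused by polysemy and nested entities, can improve the quality of knowledge graphs built for aquatic animal disease diagnosis and treatment, and can promote the development of healthy aquaculture.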
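
One way to picture the layered treatment of nested entities is as two tag sequences over the same tokens, one for inner entities and one for outer entities. The sketch below assumes BIO tagging and uses hypothetical label names (`SITE`, `SYMPTOM`) and an invented English example sentence; none of these come from the paper. It only decodes tags into spans and does not implement the recognition model itself.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, label)


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn one layer of BIO tags into labelled spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def nested_pairs(inner: List[Span], outer: List[Span]) -> List[Tuple[Span, Span]]:
    """Pair each outer span with every inner span it fully contains."""
    return [(o, i) for o in outer for i in inner if o[0] <= i[0] and i[1] <= o[1]]


# Hypothetical example: a clinical symptom that contains a disease site.
tokens = ["the", "gill", "filaments", "are", "swollen"]
inner_tags = ["O", "B-SITE", "I-SITE", "O", "O"]
outer_tags = ["O", "B-SYMPTOM", "I-SYMPTOM", "I-SYMPTOM", "I-SYMPTOM"]

inner_spans = decode_bio(inner_tags)
outer_spans = decode_bio(outer_tags)

for outer_span, inner_span in nested_pairs(inner_spans, outer_spans):
    outer_text = " ".join(tokens[outer_span[0]:outer_span[1]])
    inner_text = " ".join(tokens[inner_span[0]:inner_span[1]])
    print(f"{outer_span[2]} '{outer_text}' contains {inner_span[2]} '{inner_text}'")
```

Keeping the two layers as separate sequences means each layer can be handled by an ordinary flat sequence labeller, which is the property a cascade design relies on.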

       

      Abstract: Disease diagnosis and treatment are an important support for aquatic animal health in aquaculture. A knowledge graph is an effective way to express and apply knowledge about aquatic animal disease diagnosis and treatment, and named entity recognition is the key step in constructing such a graph. However, polysemy and entity nesting lower the recognition accuracy of named entities in this domain. In this study, a named entity recognition model for aquatic animal disease diagnosis and treatment was proposed using BERT+CaBiLSTM+CRF (Bidirectional Encoder Representations from Transformers + Cascade Bi-directional Long Short-Term Memory + Conditional Random Field). Firstly, the features produced by the BERT model contain position vector information, which helps with polysemy by distinguishing the different meanings that an entity expresses in different contexts. Secondly, the CaBiLSTM model was designed for nested named entity recognition using "hierarchical thinking", because the inner entity of a nested aquatic medicine entity contributes greatly to the recognition of the outer entity. A BiLSTM+CRF model was first used to identify the frequently occurring inner entities. The feature matrix of the identified inner entities was then reduced in dimension and concatenated with the outer entity feature matrix, so that the complete inner entity feature information was retained. After that, a BiLSTM+CRF model was used for outer entity recognition, where the added inner entity features improved the discrimination, and hence the recognition accuracy, of outer entities. Finally, a comparative experiment was designed to verify the effectiveness of the proposed model.
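
The step that links the two layers, reducing the inner entity features and joining them to the outer entity features, can be sketched at the level of matrix shapes. This is a minimal illustration and not the authors' implementation: the abstract does not say how the dimension reduction is done, so a plain linear projection is assumed here, the "outer entity feature matrix" is assumed to be the per-token features fed to the outer layer, and all names, sizes, and numbers are made up.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x k) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def concat_features(left: Matrix, right: Matrix) -> Matrix:
    """Concatenate two per-token feature matrices along the feature axis."""
    if len(left) != len(right):
        raise ValueError("both matrices need one row per token")
    return [left_row + right_row for left_row, right_row in zip(left, right)]


# Toy stand-ins for a 2-token sentence (assumed values, not real model outputs).
token_features = [[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6]]            # 2 tokens x 3 dims, e.g. encoder output
inner_features = [[1.0, 0.0, 2.0, 1.0],
                  [0.0, 1.0, 1.0, 3.0]]       # 2 tokens x 4 dims, inner-layer states
reduce_weights = [[0.5, 0.0],
                  [0.0, 0.5],
                  [0.5, 0.0],
                  [0.0, 0.5]]                 # assumed 4 -> 2 linear projection

reduced_inner = matmul(inner_features, reduce_weights)        # 2 tokens x 2 dims
outer_input = concat_features(token_features, reduced_inner)  # 2 tokens x 5 dims

print(len(outer_input), "tokens x", len(outer_input[0]), "features")
```

In a trained model the projection weights would be learned, and `outer_input` would be what the outer BiLSTM+CRF layer consumes.
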
The test results show that the precision, recall, and F1 score of the BERT+CaBiLSTM+CRF model on the aquatic medicine named entity recognition task reached 93.07%, 92.85%, and 92.96%, respectively. In terms of specific entity categories, the five types of non-nested entities (aquatic animal names, drug names, disease names, disease sites, and pathogens) have distinctive structural features. For example, most aquatic animal names contain radicals such as "worm" and "fish", drug names are mostly composed of chemical elements, and disease names mostly end with the word "disease". These categories were therefore recognized with higher accuracy than the nested entities. For nested named entities with a prominent nested structure, such as clinical symptoms, the model integrating BERT and CaBiLSTM, designed on the "hierarchical idea", also achieved higher recognition than before: precision, recall, and F1 score increased by 12.31, 12.76, and 12.53 percentage points, respectively. Therefore, the model can be expected to reduce the loss of recognition accuracy caused by polysemy and entity nesting in named entity recognition for aquatic animal disease diagnosis and treatment. The findings can support the construction of a knowledge graph for the fisheries field and further promote healthy aquaculture projects.
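
The three headline figures can be checked against each other. Assuming they are the usual precision, recall, and F1 of named entity recognition evaluation, with F1 defined as the harmonic mean of precision and recall, the reported F1 follows from the other two numbers. The function name below is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
reported_precision = 93.07
reported_recall = 92.85

print(round(f1_score(reported_precision, reported_recall), 2))  # 92.96
```

The result agrees with the reported F1 of 92.96%.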

       
