Zheng Limin, Ren Lele. Named entity recognition in human nutrition and health domain using rule and BERT-FLAT[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(20): 211-218. DOI: 10.11975/j.issn.1002-6819.2021.20.024

    Named entity recognition in human nutrition and health domain using rule and BERT-FLAT

    • Abstract: A nutritious and healthy diet can reduce the incidence of disease and improve body health after a disease occurs. In recent years, nutritional diet knowledge has been acquired mostly through the Internet. However, reliable and integrated information is difficult to discern from the huge amount of Internet data, and searching it is time-consuming. There is therefore an urgent need to integrate these complicated data and construct a knowledge graph of nutrition and health that provides timely and accurate feedback. A key step is to accurately identify entities in nutritional health texts, providing effective data support for the construction of such knowledge graphs. In this study, a BERT+BiLSTM+CRF (Bidirectional Encoder Representations from Transformers + Bi-directional Long Short-Term Memory + Conditional Random Field) model was first used with location information. The precision of this model was 86.56%, the recall rate was 91.01%, and the F1 score was 88.72%, improvements of 1.55, 0.20, and 0.32 percentage points, respectively, over the model without location information. A named entity recognition method combining rules with the BERT-FLAT (Bidirectional Encoder Representations from Transformers - Flat Lattice Transformer) model was then proposed to accurately obtain six types of entities in text from the field of human nutritional health: food, nutrients, population, location, disease, and efficacy. Firstly, character and vocabulary information were stitched together and pre-trained in the BERT model to improve the model's ability to recognize entity categories. Then, a position code was created for the head and tail position of each character and each vocabulary word, so that entity positions could be located with the help of a position vector, improving the recognition of entity boundaries. Long-distance dependencies were also captured using the Transformer model.
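The head/tail position coding of the flat lattice described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the example sentence, the toy lexicon, and the function name are all assumptions introduced for demonstration. Each character occupies a single position (head equals tail), while each matched lexicon word spans its head and tail character positions, so the downstream Transformer can use those spans to locate entity boundaries.

```python
def build_flat_lattice(chars, lexicon):
    """Build a FLAT-style flat lattice: the character sequence plus all
    lexicon-matched words, each tagged with head/tail character positions.
    (Illustrative sketch only; the paper's BERT-FLAT model is not reproduced.)"""
    # Each character spans one position: head == tail.
    lattice = [(c, i, i) for i, c in enumerate(chars)]
    # Matched words span [head, tail] and are appended after the characters,
    # flattening the lattice into a single sequence.
    n = len(chars)
    for head in range(n):
        for tail in range(head + 1, n):
            word = "".join(chars[head:tail + 1])
            if word in lexicon:
                lattice.append((word, head, tail))
    return lattice

# Hypothetical example: "维生素C促进铁吸收" ("Vitamin C promotes iron absorption")
chars = list("维生素C促进铁吸收")
lexicon = {"维生素", "维生素C", "吸收"}   # toy lexicon, not from the paper
for token, head, tail in build_flat_lattice(chars, lexicon):
    print(token, head, tail)
```

In the full model, each lattice token's embedding (from BERT for characters, plus word embeddings) is combined with relative position encodings derived from these head/tail indices before entering the Transformer layers.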
Specifically, the output of the BERT model was fed into the Transformer as the embedding of characters joined with words, realizing character-vocabulary fusion. The text prediction sequence was then obtained from the CRF layer. Finally, seven rules were formulated according to the text characteristics of the nutrition and health domain, and the prediction sequence was modified according to these rules. The experimental results showed that the F1 score of the BERT-FLAT model was 88.99%. The model combining BERT with word fusion performed best compared with the variants without BERT, indicating effective recognition performance. The named entity recognition model fusing rules with the BERT-FLAT model presented an accuracy rate of 95.00%, a recall rate of 88.88%, and an F1 score of 91.81% in the field of nutrition and health. The F1 score was 2.82 percentage points higher than that of the BERT-FLAT model alone. These findings can provide effective entity recognition in the field of human nutrition and health.
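The rule-based modification of the CRF prediction sequence can be illustrated with a sketch like the one below. The abstract does not list the seven rules, so the single rule shown here is a hypothetical example of the general pattern: a BIO tag sequence from the CRF layer is post-processed so that an I- tag which does not continue a same-type entity is corrected to a B- tag, repairing a broken entity boundary.

```python
def apply_boundary_rule(tags):
    """Illustrative post-processing rule (hypothetical; the paper's seven
    domain rules are not given in the abstract): an I- tag that does not
    continue an entity of the same type is corrected to a B- tag."""
    fixed = []
    prev_type = None  # entity type of the previous (already fixed) tag
    for tag in tags:
        if tag.startswith("I-"):
            cur_type = tag[2:]
            if prev_type != cur_type:
                tag = "B-" + cur_type   # broken boundary -> start a new entity
        prev_type = tag[2:] if tag != "O" else None
        fixed.append(tag)
    return fixed

# Toy CRF output with two boundary errors (entity labels are illustrative):
pred = ["O", "I-FOOD", "I-FOOD", "O", "I-DIS"]
print(apply_boundary_rule(pred))
# → ['O', 'B-FOOD', 'I-FOOD', 'O', 'B-DIS']
```

In the paper's pipeline, rules of this kind are applied after the CRF layer, so domain knowledge can correct systematic errors that the statistical model makes on nutrition and health texts.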