Prediction of wheat stripe rust by combining knowledge graph and bidirectional long short-term memory network

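Summary: Existing wheat stripe rust prediction methods do not exploit the semantic information among the factors that drive disease occurrence. As a result, prediction is difficult and accuracy is low. To address this, we draw on the complementary strengths of the Knowledge Graph (KG) and the Bi-directional Long Short-Term Memory network (Bi-LSTM) in handling multi-source, heterogeneous, complex data, and propose a wheat stripe rust prediction method that combines the two. First, a wheat stripe rust KG is constructed, and the environmental information related to disease occurrence is converted into feature vectors. Second, the feature vectors are used to train a Bi-LSTM model, yielding a Bi-LSTM-based wheat stripe rust prediction model. Finally, experiments are conducted on data from a wheat stripe rust database. The results show that the KG enriches the semantic information available for disease prediction and strengthens the Bi-LSTM's ability to extract high-level predictive features, thereby improving prediction accuracy. On the wheat stripe rust database, the prediction accuracy reaches 93.21%, 4.5 percentage points higher than that of the Bi-LSTM-based prediction method. The method predicts wheat stripe rust well and provides a scientific basis for forecasting, early warning, and integrated control of the disease.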
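The reported gain is an absolute difference in percentage points, not a relative improvement. Using only the two figures stated above, the implied accuracy of the Bi-LSTM-only baseline and the corresponding relative gain work out as:

```latex
\mathrm{Acc}_{\text{Bi-LSTM}} = 93.21\% - 4.5\ \text{pp} = 88.71\%,
\qquad
\frac{93.21 - 88.71}{88.71} \approx 0.0507 \;(\approx 5.07\%\ \text{relative improvement})
```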

       

      Abstract: Wheat stripe rust seriously threatens the yield and quality of wheat, and disease prediction is a key step in disease control. However, the complex and changing environmental conditions that accompany the disease make prediction difficult. Data for predicting actual crop diseases usually come from environmental factors measured during field crop growth, from the databases of agricultural research institutions and enterprises, or from text collected from agricultural website corpora. Because these data are large in volume, multi-source, heterogeneous, noisy, and redundant, and mix structured, semi-structured, and unstructured formats, traditional databases and crop prediction methods cannot manage and use them effectively. A Knowledge Graph (KG) can represent the big data and metadata of crop diseases owing to its strong data fusion, organization, and management capabilities. Deep Learning (DL)-based crop disease prediction methods can directly use structured data from traditional database systems, but cannot directly use large amounts of unstructured and semi-structured data. Moreover, most DL models ignore the context of the text and the relations between words, which limits the expressiveness of the features they extract or learn. A Long Short-Term Memory (LSTM) model processes a sequence from front to back, so it captures preceding context effectively but cannot capture the context that follows. The Bi-directional LSTM (Bi-LSTM) adds a reverse LSTM layer that passes information from back to front, so it can learn effective feature representations of the environmental-factor data of crop diseases. However, the Bi-LSTM model has low sensitivity to environmental factors, which makes its predictions difficult to interpret.
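The forward/reverse mechanism described above can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' model: the weights are random, and the hidden size and the three environmental features (normalized temperature, humidity, rainfall over five days) are hypothetical. No training or classification layer is included. The sketch only shows that the reverse LSTM lets the state at the first time step depend on later inputs, which a forward-only LSTM cannot do.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class LSTMCell:
    """Single-direction LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (c).
        self.w = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)]
            for g in "ifoc"
        }
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def _gate(self, name, z, act):
        return [act(a + b) for a, b in zip(matvec(self.w[name], z), self.b[name])]

    def step(self, x, h, c):
        z = x + h  # concatenate current input with previous hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("c", z, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(xs, fwd, bwd):
    """Concatenate forward (front-to-back) and backward (back-to-front) states per step."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # run on reversed input, then re-align to time order
    return [hf + hb for hf, hb in zip(forward, backward)]


rng = random.Random(0)
fwd = LSTMCell(3, 4, rng)
bwd = LSTMCell(3, 4, rng)
# Hypothetical 5-day sequence of normalized [temperature, humidity, rainfall]
seq = [[0.4, 0.8, 0.1], [0.5, 0.9, 0.3], [0.45, 0.85, 0.0], [0.6, 0.7, 0.2], [0.55, 0.95, 0.6]]
states = bilstm(seq, fwd, bwd)
print(len(states), len(states[0]))  # 5 8
```

Concatenating the forward and backward hidden states at each step is a common Bi-LSTM convention; a prediction head (for example a softmax layer over disease classes) would sit on top and is omitted here.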
In this study, a wheat stripe rust prediction method combining the Bi-LSTM model with a KG was proposed to enrich the semantic information of the original environmental factors and to improve the interpretability of the Bi-LSTM. In the knowledge-driven Bi-LSTM model, entity-linking disambiguation and KG embedding are used to extract structured disease knowledge from the disease KG. The vectors of the key words describing disease conditions in the text, together with the vectors of the corresponding knowledge entities, serve as multichannel input to the Bi-LSTM, so that different types of diseases are represented from both the semantic and the knowledge perspectives during feature extraction. First, a KG of wheat stripe rust was constructed, and the environmental information related to disease occurrence was transformed into feature vectors. Then, a wheat stripe rust prediction model was established based on the Bi-LSTM. The proposed method was validated on a wheat stripe rust disease dataset, where it reached a prediction accuracy of 93.21%, 4.5 percentage points higher than that of the method based on the Bi-LSTM alone. The proposed method predicts wheat stripe rust well and provides a scientific basis for forecasting, early warning, and integrated control of the disease.
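The multichannel input described above can be sketched as follows. The abstract does not say how the semantic and knowledge channels are fused, so this sketch assumes per-step concatenation of a word vector and its linked entity vector, with a zero vector when a token has no linked entity. All tokens, entity IDs, the entity-linking table, and the three-dimensional toy vectors are hypothetical; in a real system the entity vectors would come from a KG embedding model and the linking step would include disambiguation.

```python
# Hypothetical toy vectors; a real system would use trained word embeddings
# and KG entity embeddings.
WORD_VECS = {
    "high": [0.1, 0.3, -0.2],
    "humidity": [0.7, -0.1, 0.4],
    "favours": [0.0, 0.2, 0.1],
    "urediniospore": [-0.3, 0.8, 0.5],
    "germination": [0.2, 0.6, -0.4],
}
ENTITY_VECS = {
    "E:RelativeHumidity": [0.9, 0.1, -0.3],
    "E:Urediniospore": [-0.5, 0.7, 0.2],
}
# Hypothetical entity-linking result: token -> KG entity ID (already disambiguated).
ENTITY_LINKS = {
    "humidity": "E:RelativeHumidity",
    "urediniospore": "E:Urediniospore",
}


def build_channels(tokens, dim):
    """Return [semantic_channel, knowledge_channel], each a list of dim-length vectors."""
    zeros = [0.0] * dim
    word_channel = [WORD_VECS.get(t, zeros) for t in tokens]
    # Tokens without a linked KG entity get a zero vector in the knowledge channel.
    entity_channel = [
        ENTITY_VECS[ENTITY_LINKS[t]] if t in ENTITY_LINKS else zeros for t in tokens
    ]
    return [word_channel, entity_channel]


def to_step_features(channels):
    """Concatenate all channels at each time step into one input vector per token."""
    return [[v for vec in step for v in vec] for step in zip(*channels)]


tokens = ["high", "humidity", "favours", "urediniospore", "germination"]
channels = build_channels(tokens, dim=3)
features = to_step_features(channels)
print(len(features), len(features[0]))  # 5 6
```

Each row of `features` would then be one time step of the Bi-LSTM input sequence. Stacking the channels instead of concatenating them is an equally plausible reading of "multichannel input".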

       
