Abstract:
Stripe rust seriously threatens the yield and quality of wheat. Disease prediction is a key step in disease control. However, the complex and changing environmental conditions that accompany the occurrence of wheat disease make prediction difficult. The data used to predict actual crop diseases usually come from environmental factors of field crop growth, from the databases of agricultural research institutions and enterprises, or from text captured from agricultural websites. Because these data are large in volume, multi-source, heterogeneous, noisy, and redundant, and mix structured, unstructured, and semi-structured formats, traditional databases and crop prediction methods cannot manage and use them effectively. Owing to its powerful data fusion, organization, and management capabilities, a Knowledge Graph (KG) can represent both the big data and the metadata of crop diseases. Deep Learning (DL)-based crop disease prediction methods can directly use structured data from traditional database systems, but cannot directly use large amounts of unstructured and semi-structured data. Most DL models also fail to consider the context of text content and the relevance between words, which limits the expressiveness of the features they extract or learn. A Long Short-Term Memory (LSTM) model normally propagates its memory from front to back, so it remembers preceding context effectively but cannot capture context that comes later in the sequence. The Bi-directional LSTM (Bi-LSTM) model therefore adds a reverse LSTM layer that passes information from back to front, which allows it to learn effective feature representations of the environmental factor data of crop diseases. However, the Bi-LSTM model has low sensitivity to environmental factors, so its predictions can be difficult to interpret.
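The forward and backward reading of a sequence described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' model: the weights are random rather than trained, and the three daily input features (normalized temperature, relative humidity, rainfall), their values, and the helper names `LSTMCell`, `run_lstm`, and `bilstm` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with random (untrained) weights, for illustration only."""

    def __init__(self, input_size, hidden_size, seed):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight row per gate unit over [x; h_prev; 1] (the last column acts as the bias).
        cols = input_size + hidden_size + 1
        self.w = {name: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                         for _ in range(hidden_size)] for name in "ifog"}

    def step(self, x, h, c):
        z = list(x) + list(h) + [1.0]

        def gate(name, act):
            return [act(sum(wi * zi for wi, zi in zip(row, z))) for row in self.w[name]]

        i = gate("i", sigmoid)    # input gate
        f = gate("f", sigmoid)    # forget gate
        o = gate("o", sigmoid)    # output gate
        g = gate("g", math.tanh)  # candidate cell state
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell, xs):
    """Run one LSTM over the sequence and return the hidden state at every step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in xs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def bilstm(fwd, bwd, xs):
    forward = run_lstm(fwd, xs)               # reads front -> back
    backward = run_lstm(bwd, xs[::-1])[::-1]  # reads back -> front, realigned to time order
    return [f + b for f, b in zip(forward, backward)]  # concatenate both directions per step


# Hypothetical normalized daily factors: [temperature, relative humidity, rainfall].
days = [[0.45, 0.80, 0.10], [0.50, 0.85, 0.30], [0.40, 0.90, 0.00]]
fwd, bwd = LSTMCell(3, 4, seed=1), LSTMCell(3, 4, seed=2)
states = bilstm(fwd, bwd, days)
print(len(states), len(states[0]))  # 3 8
```

Each step's representation is 8-dimensional: the first 4 values depend only on the current and earlier days, while the last 4 also reflect later days, which is the extra context the reverse layer supplies.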
In this study, a prediction method for wheat rust disease was proposed that combines a Bi-LSTM model with a KG, in order to enrich the semantic information of the original environmental factors and improve the interpretability of the Bi-LSTM. In the knowledge-driven Bi-LSTM model, entity linking with disambiguation and KG embedding are used to extract structured disease knowledge from the disease KG. The key vectors describing disease conditions in the text, together with the corresponding knowledge entity vectors, serve as the multichannel input of the Bi-LSTM, so that different types of diseases are represented from both the semantic and the knowledge perspective during the convolution process. First, a KG of wheat diseases was constructed, and the environmental information related to the occurrence of wheat disease was transformed into feature vectors. Then, a Bi-LSTM-based prediction model of wheat stripe rust was established. The proposed method was validated on a wheat stripe rust disease dataset, where it achieved a prediction accuracy of 93.21%, 4.5 percentage points higher than that of a Bi-LSTM-only disease prediction method. The proposed algorithm predicts wheat disease well and can provide a scientific basis for the forecasting, early warning, and comprehensive control of wheat rust disease.
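How a knowledge channel might be paired with the environmental-factor channel can be illustrated with a small sketch. It is a simplification under explicit assumptions rather than the authors' pipeline: the entity embeddings, the alias table that stands in for entity linking and disambiguation, the averaging of linked-entity vectors, the example field notes, and the names `link_entities`, `knowledge_channel`, `multichannel_inputs`, and `flatten_channels` are all hypothetical.

```python
from typing import Dict, List

EMBED_DIM = 3

# Hypothetical KG entity embeddings; in the described method these would come
# from embedding the constructed wheat-disease KG.
ENTITY_EMBEDDINGS: Dict[str, List[float]] = {
    "stripe rust": [0.9, 0.1, 0.3],
    "high humidity": [0.2, 0.8, 0.5],
    "low temperature": [0.4, 0.6, 0.1],
}

# Hypothetical surface form -> canonical KG entity. This naive alias table is a
# stand-in for entity linking and disambiguation, not the authors' method.
ALIASES: Dict[str, str] = {
    "yellow rust": "stripe rust",
    "stripe rust": "stripe rust",
    "high humidity": "high humidity",
    "humid": "high humidity",
    "cool": "low temperature",
    "low temperature": "low temperature",
}


def link_entities(text: str) -> List[str]:
    """Return canonical KG entities whose aliases occur in the text (substring matching)."""
    lowered = text.lower()
    found: List[str] = []
    for alias, entity in ALIASES.items():
        if alias in lowered and entity not in found:
            found.append(entity)
    return found


def knowledge_channel(text: str) -> List[float]:
    """Average the embeddings of the linked entities; zeros if nothing is linked."""
    entities = link_entities(text)
    if not entities:
        return [0.0] * EMBED_DIM
    vectors = [ENTITY_EMBEDDINGS[e] for e in entities]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def multichannel_inputs(env_features, descriptions):
    """Pair each step's environmental feature vector with its knowledge vector."""
    return [[list(env), knowledge_channel(text)]
            for env, text in zip(env_features, descriptions)]


def flatten_channels(steps):
    """Concatenate the channels of each step into one vector, e.g. for a Bi-LSTM."""
    return [sum(channels, []) for channels in steps]


env = [[0.45, 0.80, 0.10], [0.50, 0.85, 0.30]]
notes = ["Cool and humid field conditions", "Yellow rust pustules observed on flag leaves"]
steps = multichannel_inputs(env, notes)
print(steps[1][1])  # [0.9, 0.1, 0.3]
```

Keeping the two channels separate until `flatten_channels` mirrors the idea of a multichannel input; a real system would replace the alias table with a proper entity linker and the hand-written vectors with learned KG embeddings.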