Abstract: Accurate prediction of the incidence rate is an important basis for responding to Panax notoginseng disease in advance and for improving yield and quality. This study used field meteorological data and disease incidence data collected from 2018 to 2019 at a Panax notoginseng planting base in Honghe Prefecture, Yunnan Province, and applied principal component analysis (PCA) to avoid multicollinearity. The weather data from May to September of each year served as the training and validation sets. The random forest (RF) algorithm was used as the base learner to build a preliminary prediction model, which was then optimized with the gradient descent (GD) algorithm. The results showed the following. 1) During the high-incidence period, the incidence of Panax notoginseng disease was mainly related to soil temperature, humidity in the shed, soil heat flux in the shed, and soil heat flux above the canopy. PCA avoided the multicollinearity problem, and the Pearson correlation coefficients between the indicators were obtained: soil temperature and humidity in the shed were positively correlated with the incidence rate, with Pearson correlation coefficients between 0.25 and 0.75, whereas soil heat flux in the shed and soil heat flux above the Panax notoginseng canopy were negatively correlated with the incidence rate, with coefficients between -0.75 and -0.25. 2) The random forest predicted that incidence rates of about 35% were relatively infrequent during the high-incidence period, whereas incidence rates between 60% and 80% were more frequent. This is consistent with the disease spreading to other plants at an exponential rate, and all predictions fell within the confidence interval. The root mean square error (RMSE) of the random forest predictions was 0.230, indicating that the predictions could be trusted.
3) After GD optimization, the value of the cost function at convergence was 241.03, the difference between the predicted and actual incidence rates of Panax notoginseng disease was 1.5%, and the weight of each meteorological factor's influence on the incidence rate during the high-incidence period was obtained. Soil temperature had the largest positive weight (21.686), and soil heat flux above the Panax notoginseng canopy had the largest negative weight (-13.834). 4) The final prediction model was compared with the PCA-based main effect analysis with respect to the influence of each meteorological factor on the incidence rate during the high-incidence period, and the two sets of results were consistent. The model shows reliable predictive capability for disease prediction and could provide a theoretical basis and technical support for facility environmental regulation and intelligent management aimed at reducing Panax notoginseng disease.
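
To make the quantities reported above concrete (Pearson correlation coefficients, gradient-descent weights for each factor, and RMSE), the following is a minimal sketch under stated assumptions. It is not the authors' model: the abstract does not specify how GD is combined with the random forest, so a plain linear model fitted by batch gradient descent stands in for the weighting step. The data are synthetic, and every name here (`soil_temp`, `heat_flux`, `fit_gd`, the coefficients 0.2 and -0.1) is hypothetical and chosen only for illustration. Only the Python standard library is used.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def predict(X, w, b):
    """Linear prediction w . x + b for every row of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]

def fit_gd(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the cost J = 1/(2n) * sum((w.x + b - y)^2)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [p - t for p, t in zip(predict(X, w, b), y)]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(d)]
        grad_b = sum(errs) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Synthetic, already-scaled stand-ins for two meteorological factors (assumption).
random.seed(0)
soil_temp = [random.uniform(-1, 1) for _ in range(200)]
heat_flux = [random.uniform(-1, 1) for _ in range(200)]

# Hypothetical ground truth: temperature raises incidence, heat flux lowers it.
incidence = [0.5 + 0.2 * t - 0.1 * f + random.gauss(0, 0.02)
             for t, f in zip(soil_temp, heat_flux)]

X = [[t, f] for t, f in zip(soil_temp, heat_flux)]
w, b = fit_gd(X, incidence)
preds = predict(X, w, b)

r_temp = pearson(soil_temp, incidence)
r_flux = pearson(heat_flux, incidence)
error = rmse(incidence, preds)

print("weights:", w, "bias:", b)
print("Pearson r (soil_temp, heat_flux):", r_temp, r_flux)
print("RMSE:", error)
```

In this sketch the sign of each fitted weight plays the role of the positive or negative influence described in result 3), and the Pearson coefficients play the role of the correlations in result 1). With real data, the factors would first need to be scaled, and the random forest stage would replace or precede the linear stand-in.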