Calculating the causal strength of stored-grain pest events using counterfactual data augmentation
-
Graphical Abstract
-
Abstract
Stored-grain pests have been among the most important factors affecting food security in recent years, so it is critical to identify stored-grain pest events and the causal relationships among them. Quantifying the causal strength among these events allows potential risks to be assessed more accurately and supports the formulation of prevention and control measures. However, because of data bias in the stored-grain pest domain, models often rely too heavily on surface features of the dataset and therefore perform poorly on generalized data. In this study, the causal strength among events was computed and quantified using counterfactually augmented data. A counterfactual data augmentation-event causal strength computation framework (CDA-ECS) was designed, in which a large language model (LLM) generated counterfactual instances. These instances extended the original data so that debiased causal knowledge was integrated into a pre-trained language model, which learned the causal relationships between sentences more deeply and generalized better. The framework comprised three stages. In the first stage, the premise sentence of each event pair was fed into a retriever to obtain the top-k sentences that were similar in style but opposite in meaning to the original sentence. In the second stage, a rule-based prompt template was built from the retrieved sentences; the LLM generated sentences that complied with the template, and these samples were used to adjust the labels of the original event-pair sentences. In the third stage, the original training instances and the newly generated instances were merged into a new corpus on which the pre-trained language model was trained. The model thereby learned the causal features of events, improved its reasoning accuracy on generalized data, and produced a causal strength score.
Experiments on public and domain-specific datasets showed that models trained on the augmented data were more robust, achieving 2.4 percentage points higher accuracy on the inference task on generalized data, and that the framework could be applied effectively to calculate the causal strength of stored-grain pest events. Counterfactual data augmentation was introduced to address the data bias in the stored-grain pest domain. The diversity and complexity of the augmented data helped the model capture the complex links between pest behavior and environmental factors, supporting risk analysis of stored-grain pest events. Nevertheless, the LLM-based generation of counterfactual data still lacked human intervention, particularly when labels were flipped. Future work can refine the quantification of causal relationships and optimize the generation process to further improve the quality of the counterfactual data. The findings can provide a reliable basis for quantifying the causal strength of events. In conclusion, the proposed approach offers an effective way to improve the performance of causal analysis models in the stored-grain pest domain, and it is expected to support more accurate decision-making in risk assessment and management.
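The three stages summarized in the abstract can be illustrated with a minimal sketch. It is not the CDA-ECS implementation: every name here (EventPair, retrieve_top_k, build_prompt, augment) is hypothetical, the token-overlap score and negation-cue check are crude stand-ins for the actual retriever, the `llm` argument is assumed to be any callable that maps a prompt string to generated text, and the label adjustment is assumed to be a simple flip of a binary causal label. The final training of the pre-trained language model on the merged corpus is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EventPair:
    premise: str     # sentence describing the candidate cause
    hypothesis: str  # sentence describing the candidate effect
    label: int       # assumed encoding: 1 = causal, 0 = not causal


# Assumption: a tiny cue list used only to imitate "opposite in meaning".
NEGATION_CUES = {"not", "no", "never", "without"}


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a crude proxy for 'similar in style'."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_opposite(a: str, b: str) -> bool:
    """Toy proxy for opposite semantics: exactly one sentence is negated."""
    def negated(s: str) -> bool:
        return bool(NEGATION_CUES & set(s.lower().split()))
    return negated(a) != negated(b)


def retrieve_top_k(premise: str, pool: List[str], k: int) -> List[str]:
    """Stage 1: keep 'opposite' sentences, rank them by overlap, return top k."""
    candidates = [s for s in pool if is_opposite(premise, s)]
    return sorted(candidates, key=lambda s: jaccard(premise, s), reverse=True)[:k]


def build_prompt(premise: str, retrieved: List[str]) -> str:
    """Stage 2a: fill a rule-based prompt template with the retrieved sentences."""
    references = "\n".join(f"- {s}" for s in retrieved)
    return (
        "Rewrite the premise in the same style but with the opposite meaning.\n"
        f"Premise: {premise}\n"
        f"Reference sentences:\n{references}\n"
        "Counterfactual premise:"
    )


def augment(corpus: List[EventPair], pool: List[str],
            llm: Callable[[str], str], k: int = 2) -> List[EventPair]:
    """Stages 2b and 3: generate counterfactuals, flip labels, merge corpora."""
    merged = list(corpus)  # original instances come first
    for pair in corpus:
        retrieved = retrieve_top_k(pair.premise, pool, k)
        if not retrieved:
            continue  # no usable references, so no counterfactual is generated
        new_premise = llm(build_prompt(pair.premise, retrieved)).strip()
        merged.append(EventPair(new_premise, pair.hypothesis, 1 - pair.label))
    return merged


# Usage with a placeholder in place of a real LLM call.
corpus = [EventPair("humidity in the silo was high",
                    "the weevil population increased", 1)]
pool = ["humidity in the silo was not high",
        "the grain was fumigated",
        "temperature was never recorded"]
augmented = augment(corpus, pool,
                    llm=lambda prompt: "humidity in the silo was not high")
for pair in augmented:
    print(pair)
```

Because the label is flipped automatically, a generated sentence that fails to reverse the premise would receive a wrong label; a human review or filtering step before merging would correspond to the missing human intervention noted above.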
-
-