YI Wenlong, ZHANG Li, LIU Muhua, et al. Weakly supervised analysis method for featured agricultural product sales evaluation big data[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(12): 183-192. DOI: 10.11975/j.issn.1002-6819.202401003

    Weakly supervised analysis method for featured agricultural product sales evaluation big data

    • Large-scale analysis of consumer reviews can inform the evaluation of featured agricultural products and help improve both the products and their marketing strategies. Few open-source, strongly labeled Chinese datasets exist, and strongly labeled data for this domain are especially hard to find. Manual labeling is also costly and time-consuming. This study proposes a weakly supervised deep learning method for evaluating big data on featured agricultural products along several dimensions. First, an incremental crawler collected consumer reviews of featured agricultural products from online sales platforms. Second, a topic model was used to identify the implicit topics and topic keywords in the review data. Third, pseudo-labels were generated for each evaluation dimension by combining syntactic dependency analysis with lexicon-based sentiment judgment. Finally, a multi-label, multi-class deep learning model was built and trained on the pseudo-labeled dataset, giving a weakly supervised framework that covers the different evaluation dimensions. The trained model can then be applied directly to evaluate agricultural products. Because the model has a multitask structure, a single model suffices to predict consumers' sentiment on every evaluation dimension. In the experiment, a large amount of store and review information was first collected with incremental crawlers from websites related to specialty agricultural products, forming a large multi-source heterogeneous dataset that was stored in a database. Compared with a single source, drawing on different websites made the dataset more representative and reduced the bias of individual user groups.
Heterogeneity here means that data from different platforms differ in focus and in structure. The heterogeneous multi-source data were transformed into a single large dataset of featured agricultural products. The keywords "Hongxin pomelo" and "Purple garlic" were then used to retrieve comments from the database, yielding the experimental datasets used for model prediction and comparative analysis. On the Hongxin pomelo dataset, the model achieved 89.2% accuracy and an F1-score of 80.3%, which is 7.1 percentage points higher in accuracy and 11.5 percentage points higher in F1-score than Random Forest. Compared with the Transformer model, accuracy increased by 5.6 percentage points and F1-score by 2 percentage points, while the number of parameters was reduced by 92%. The method efficiently extracted product evaluation dimensions and consumer concerns from massive numbers of reviews. The findings can provide data support for improving agricultural product quality and sales service.
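The pseudo-labeling step can be illustrated with a minimal sketch. It is not the authors' implementation: clause splitting stands in for the syntactic dependency analysis, and the dimension keywords, sentiment lexicon, English vocabulary, and the function name `pseudo_label` are all hypothetical. In the study the keywords come from a topic model and the reviews are in Chinese.

```python
import re
from typing import Dict, List

# Hypothetical dimension keywords (the study derives these from a topic model).
DIMENSION_KEYWORDS: Dict[str, List[str]] = {
    "taste": ["sweet", "sour", "juicy", "taste"],
    "logistics": ["delivery", "shipping", "package"],
}

# Hypothetical sentiment lexicon: word -> polarity (+1 positive, -1 negative).
SENTIMENT_LEXICON: Dict[str, int] = {
    "sweet": 1, "juicy": 1, "fast": 1, "fresh": 1,
    "sour": -1, "slow": -1, "damaged": -1, "rotten": -1,
}

NEGATIONS = {"not", "never", "no"}


def pseudo_label(review: str) -> Dict[str, str]:
    """Assign one sentiment pseudo-label per evaluation dimension."""
    # Crude stand-in for dependency parsing: treat each clause separately,
    # splitting on punctuation and on the contrastive word "but".
    clauses = [c for c in re.split(r"[,.;!?]|\bbut\b", review.lower()) if c.strip()]

    scores: Dict[str, int] = {}
    for clause in clauses:
        tokens = clause.split()
        polarity = 0
        for j, token in enumerate(tokens):
            p = SENTIMENT_LEXICON.get(token, 0)
            # Flip polarity when the sentiment word is directly negated.
            if p and j > 0 and tokens[j - 1] in NEGATIONS:
                p = -p
            polarity += p
        # Credit the clause's polarity to every dimension it mentions.
        for dim, keywords in DIMENSION_KEYWORDS.items():
            if any(token in keywords for token in tokens):
                scores[dim] = scores.get(dim, 0) + polarity

    labels: Dict[str, str] = {}
    for dim in DIMENSION_KEYWORDS:
        if dim not in scores:
            labels[dim] = "not_mentioned"
        elif scores[dim] > 0:
            labels[dim] = "positive"
        elif scores[dim] < 0:
            labels[dim] = "negative"
        else:
            labels[dim] = "neutral"
    return labels


if __name__ == "__main__":
    print(pseudo_label("The pomelo is sweet and juicy, but delivery was slow."))
```

Each review yields one label per dimension, which is the shape a multi-label, multi-class classifier would be trained on. The labels are noisy by construction, which is why the supervision is described as weak.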