Weakly supervised analysis method for sales evaluation big data on featured agricultural products

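    • Summary: Multi-dimensional analysis of evaluation big data for featured agricultural products suffers from a shortage of trustworthy labels and from the difficulty of mining consumers' true sentiment in each dimension. To address these problems, this study proposes a deep learning method based on weakly supervised training. First, a topic model analyzes large-scale reviews to extract product evaluation topics and keywords. Next, syntactic dependency and a sentiment lexicon are combined to generate pseudo-labels for the reviews in different dimensions. Finally, a multi-label, multi-class deep network is built and trained on the pseudo-labels by weakly supervised learning. On the Hongxin pomelo review dataset, the method achieves 89.2% accuracy and an F1 score of 80.3%, which is 7.1 percentage points higher in accuracy and 11.5 percentage points higher in F1 than the random forest algorithm. Compared with the Transformer model, accuracy is 5.6 percentage points higher, F1 is 2.0 percentage points higher, and the parameter count is 92% lower. The method efficiently extracts product evaluation dimensions and consumer concerns from massive numbers of reviews, providing data support for improving agricultural product quality and sales service.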
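The pseudo-labelling step could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword lists, the sentiment lexicon, and the function names are hypothetical, English tokens stand in for the Chinese reviews, and simple clause splitting is swapped in for the syntactic dependency analysis used in the study.

```python
import re
from typing import Dict, List, Optional

# Hypothetical dimension keywords; in the study these come from a topic model.
DIMENSION_KEYWORDS: Dict[str, List[str]] = {
    "taste": ["taste", "sweet", "sour", "juicy"],
    "logistics": ["delivery", "shipping", "package"],
}

# Hypothetical sentiment lexicon: word -> polarity (+1 or -1).
SENTIMENT_LEXICON: Dict[str, int] = {
    "good": 1, "great": 1, "fresh": 1, "fast": 1,
    "bad": -1, "slow": -1, "rotten": -1, "bitter": -1,
}

NEGATIONS = {"not", "no", "never"}


def split_clauses(review: str) -> List[List[str]]:
    """Split a review into clauses (a crude stand-in for dependency parsing)."""
    parts = re.split(r"[,.;!?]|\bbut\b", review.lower())
    return [part.split() for part in parts if part.strip()]


def clause_polarity(tokens: List[str]) -> int:
    """Sum lexicon polarities in a clause, flipping a word preceded by a negation."""
    total = 0
    for i, token in enumerate(tokens):
        polarity = SENTIMENT_LEXICON.get(token, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        total += polarity
    return total


def pseudo_label(review: str) -> Dict[str, Optional[str]]:
    """Return one pseudo-label per dimension, or None if the dimension is not mentioned."""
    scores: Dict[str, Optional[int]] = {dim: None for dim in DIMENSION_KEYWORDS}
    for tokens in split_clauses(review):
        polarity = clause_polarity(tokens)
        for dim, keywords in DIMENSION_KEYWORDS.items():
            if any(token in keywords for token in tokens):
                scores[dim] = (scores[dim] or 0) + polarity

    labels: Dict[str, Optional[str]] = {}
    for dim, score in scores.items():
        if score is None:
            labels[dim] = None
        elif score > 0:
            labels[dim] = "positive"
        elif score < 0:
            labels[dim] = "negative"
        else:
            labels[dim] = "neutral"
    return labels
```

Attaching sentiment words to a dimension by clause is only an approximation. A dependency parser would link each sentiment word to the term it actually modifies, which matters when one clause mentions several dimensions.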

       

      Abstract: Analysis of evaluation big data can help improve featured agricultural products and optimize their marketing strategies. Because few open-source, strongly labeled datasets exist in Chinese, strongly labeled data for this domain remain hard to find, and manual labeling is costly and time-consuming. This study proposes a weakly supervised deep learning method that evaluates big data on featured agricultural products along different dimensions. First, an incremental crawler collected consumers' evaluations of featured agricultural products from online sales platforms. Second, a topic model identified the implicit topics and topic keywords in the evaluation data. Third, pseudo-labels were generated for each evaluation dimension by combining syntactic dependency with lexicon-based sentiment judgment. Finally, a multi-label, multi-class deep learning model was constructed and trained on the pseudo-labeled dataset by weakly supervised learning. The trained model can be applied directly to evaluate agricultural products. Because of its multi-task structure, a single model predicts consumers' sentiment on all evaluation dimensions. In the experiment, a large amount of store and evaluation information was collected with incremental crawlers from websites selling featured agricultural products and stored in a database, forming an extensive multi-source heterogeneous dataset. Drawing on several websites made the dataset more representative than a single source and reduced the bias of individual user groups. The data are heterogeneous in that different platforms have different focuses and data structures. The heterogeneous data from the multiple sources were transformed to obtain an extensive dataset of featured agricultural products. "Hongxin pomelo" and "Purple garlic" were then used as keywords to retrieve comments from the database, yielding the experimental dataset used to verify the model's predictions and to run the comparative analysis. The proposed model achieved 89.2% accuracy and an F1-score of 80.3% on the Hongxin pomelo dataset, which is 7.1 percentage points higher in accuracy and 11.5 percentage points higher in F1-score than Random Forest. Compared with the Transformer model, accuracy increased by 5.6 percentage points and F1-score by 2.0 percentage points, while the number of parameters was reduced by 92%. The method efficiently extracts product evaluation dimensions and consumer concerns from massive numbers of reviews. The findings can provide data support for improving agricultural product quality and sales service.
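To illustrate how one model can output a sentiment class for every evaluation dimension, the sketch below attaches one softmax head per dimension to a shared feature vector. It is only a structural illustration under stated assumptions: the dimension names, the class set, and the `MultiHeadClassifier` class are hypothetical, the weights are random rather than trained, and the shared encoder that would produce the features is omitted.

```python
import math
import random
from typing import Dict, List

CLASSES = ["negative", "neutral", "positive"]  # assumed class set
DIMENSIONS = ["taste", "logistics", "price"]   # hypothetical evaluation dimensions


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiHeadClassifier:
    """One linear + softmax head per dimension on top of shared features."""

    def __init__(self, feature_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # heads[dim] is a (num_classes x feature_dim) weight matrix (untrained).
        self.heads: Dict[str, List[List[float]]] = {
            dim: [[rng.uniform(-0.5, 0.5) for _ in range(feature_dim)]
                  for _ in CLASSES]
            for dim in DIMENSIONS
        }

    def predict_proba(self, features: List[float]) -> Dict[str, List[float]]:
        """Return a class distribution for each dimension."""
        result = {}
        for dim, weights in self.heads.items():
            logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
            result[dim] = softmax(logits)
        return result

    def predict(self, features: List[float]) -> Dict[str, str]:
        """Return the most probable class for each dimension."""
        return {
            dim: CLASSES[max(range(len(CLASSES)), key=probs.__getitem__)]
            for dim, probs in self.predict_proba(features).items()
        }


# Example: a 4-dimensional feature vector standing in for the encoder output.
model = MultiHeadClassifier(feature_dim=4)
predictions = model.predict([0.2, -0.1, 0.7, 0.05])
```

In training, each head would contribute its own cross-entropy loss against the pseudo-label for its dimension, so a single forward pass covers every dimension.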

       
