Design and application of a grape production information collection system based on data quality control

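    • Summary: To overcome the long survey cycles and low data quality of traditional field surveys and to enable remote acquisition of grape production information, a grape production information collection system based on data quality control was developed in browser/server (B/S) mode, using a model-view-controller (MVC) framework and the PHP (Hypertext Preprocessor) language. A quality control workflow was also designed for the data acquisition process, covering constraint rules, abnormal data, and missing data. Application results show that the system runs stably and substantially improves both the accuracy of the collected data and the efficiency of collection, providing a fast data collection tool and reliable data quality control techniques for information analysis in the grape industry.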
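The constraint-rule step can be illustrated with a minimal Python sketch (the system itself is written in PHP). Everything named here is an assumption made for illustration: the field names `area_ha`, `yield_kg`, and `labor_cost`, the numeric bounds, and the `Rule` and `validate` helpers are hypothetical and are not taken from the system described above. The sketch only shows the general idea of checking each submitted item against a rule and separating rule violations from missing values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    field: str                      # hypothetical form field name
    check: Callable[[object], bool]  # returns True when the value is acceptable
    message: str                    # shown to the user when the check fails


def is_number_in(low: float, high: float) -> Callable[[object], bool]:
    """Build a check that accepts numeric values within [low, high]."""
    def check(value: object) -> bool:
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_numeric and low <= value <= high
    return check


# Hypothetical rules; real bounds would come from domain knowledge.
RULES: List[Rule] = [
    Rule("area_ha", is_number_in(0.01, 1000.0), "area must be between 0.01 and 1000 ha"),
    Rule("yield_kg", is_number_in(0.0, 1e7), "yield must be a non-negative number"),
    Rule("labor_cost", is_number_in(0.0, 1e8), "labor cost must be a non-negative number"),
]


def validate(record: Dict[str, object]) -> Tuple[List[str], List[str]]:
    """Return (rule violations, names of missing fields) for one submitted form."""
    errors: List[str] = []
    missing: List[str] = []
    for rule in RULES:
        value = record.get(rule.field)
        if value is None:
            missing.append(rule.field)   # left for a later fill-in step
        elif not rule.check(value):
            errors.append(f"{rule.field}: {rule.message}")
    return errors, missing


errors, missing = validate({"area_ha": 2.5, "yield_kg": -10, "labor_cost": None})
print(errors)   # rule violations to send back to the user
print(missing)  # fields to pass on to missing-data handling
```

Keeping violations and missing fields in separate lists mirrors the workflow in the summary, where input errors are rejected at entry time while missing values are handled by a later fill-in step.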

       

      Abstract: Surveying viticulture through paper-and-pencil questionnaires and field interviews has several drawbacks: long survey periods, high cost, and low data quality. Establishing a standard set of specifications for the data collection process in vineyard management and analysis helps enable online research on the grape industry and provide timely decision support. To support intelligent acquisition of grape production information, the MVC (model, view, controller) framework, the PHP (Hypertext Preprocessor) language, and a B/S (browser/server) structure were adopted to develop the production information collection system. During system design, according to the characteristics of the grape production business and of production input and output data, vineyard information was classified into basic information and production information, which are managed separately; this helps ensure the integrity and accuracy of data in every process, including acquisition, transformation, storage, and computation. The system comprises 3 functional modules: online data collection, data quality control, and system management for grape production. The main function of the online data collection module is the entry and reporting of vineyard basic information and production information. To ensure data quality, each data item is checked by the system against constraint rules after the user submits a form. The user input constraint rules aim to reduce input errors; outlier detection based on robust regression estimation is designed to reduce the adverse effects of abnormal data on analysis results; and different strategies are used to fill in missing data, so that a more complete data set can be obtained. These methods constitute a data pre-processing procedure, and empirical calculations show that this pre-processing improves the accuracy of the system's analysis results. On this basis, a data quality control method is put forward that combines constraint rule management, MM (multiple maximum likelihood type estimates) robust estimation, and the EM (expectation maximization) algorithm. The study also establishes a statistical pre-processing procedure for the costs and benefits of grape production. To test the performance of the system, data from 779 questionnaires collected in 2012 from 22 provinces, municipalities, and autonomous regions were used to detect outliers by MM robust estimation, and the results showed that the model fit well. In terms of data quality control, the accuracy and normality of the processed data improved considerably, which substantially shortened the data processing cycle. The proposed method identifies questionnaires that contain outliers. Taking Shanghai as an example, a missing value, namely labor cost, was filled in, and the filled data fit the model well. Application results show that the system meets the needs of different users and provides an efficient data collection tool and reliable data quality control technology for grape industry information analysis, improving the efficiency and effectiveness of surveys. To let farmers upload and manage data anytime and anywhere, optimizing the data quality control algorithms and developing mobile phone applications are important directions for future research.
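To make the idea of outlier detection by robust regression concrete, here is a minimal Python sketch. It is not the MM estimator used in the study: MM estimation starts from a high-breakdown initial estimate and uses a redescending loss, whereas this sketch swaps in the simpler Huber M-estimator, fitted by iteratively reweighted least squares with a MAD-based scale. The function names, the tuning constant `c=1.345`, the cutoff of 2.5, and the sample data (vineyard area against labor cost) are all assumptions chosen for illustration.

```python
import statistics


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mad_scale(residuals):
    """Robust scale: median absolute deviation, scaled for normal data."""
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    return 1.4826 * mad


def huber_fit(xs, ys, c=1.345, iterations=50):
    """Huber M-estimate of a straight line via iteratively reweighted least squares."""
    weights = [1.0] * len(xs)  # the first pass is ordinary least squares
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(xs, ys, weights)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        scale = mad_scale(residuals) or 1e-12  # guard against a zero scale
        # Huber weights: full weight for small residuals, down-weighted otherwise.
        weights = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in residuals]
    return intercept, slope, scale


def flag_outliers(xs, ys, cutoff=2.5):
    """Indices whose scaled residual from the robust fit exceeds the cutoff."""
    intercept, slope, scale = huber_fit(xs, ys)
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (intercept + slope * x)) / scale > cutoff]


# Hypothetical data: the fifth record has an implausibly high labor cost.
area = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labor_cost = [3.1, 4.9, 7.2, 8.8, 40.0, 13.1, 14.9, 17.0, 19.2, 20.9]
print(flag_outliers(area, labor_cost))
```

Because the Huber fit starts from ordinary least squares, it resists outliers in the response but not high-leverage points in the predictor, which is one reason an MM estimator is preferable for real survey data.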

       
