Design and application of a grape production information collection system based on data quality control
Abstract
Surveys of viticulture based on paper-and-pencil questionnaires and field interviews have several drawbacks: long survey cycles, high cost, and low data quality. Establishing a standard set of specifications for data collection in vineyard management and analysis makes online surveys of the grape industry possible and provides timely decision support. To automate the acquisition of grape production information, a production information collection system was developed using the MVC (model, view, controller) framework, the PHP (Hypertext Preprocessor) language, and a B/S (browser/server) architecture. During system design, based on the characteristics of the grape production business and of production input and output data, vineyard information was classified into basic information and production information, which are managed separately; this helps ensure the integrity and accuracy of data at every stage, including acquisition, transformation, storage, and computation. The system comprises 3 functional modules: online data collection, data quality control, and system management. The main function of the online data collection module is the entry and reporting of vineyard basic information and production information. To ensure data quality, the system checks each data item against constraint rules after the user submits a form. The user input constraint rules aim to reduce entry errors; outlier detection based on robust regression estimation is designed to reduce the adverse effects of abnormal data on analysis results; and different strategies are used to fill in missing data, so that a more complete data set can be obtained.
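As an illustration of how input constraint rules of this kind might be applied to a submitted form, the following is a minimal Python sketch. It is not the system's PHP implementation: the field names (`area_mu`, `yield_kg`, `labor_cost_yuan`), the `Rule` class, and the bounds are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    """Constraint for one numeric form field (hypothetical structure)."""
    required: bool = True
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Illustrative rules only; the real fields and bounds are not given in the source.
RULES: Dict[str, Rule] = {
    "area_mu": Rule(minimum=0.0),
    "yield_kg": Rule(minimum=0.0),
    "labor_cost_yuan": Rule(required=False, minimum=0.0),
}


def validate(form: dict) -> List[str]:
    """Return a list of rule violations; an empty list means the form passes."""
    errors: List[str] = []
    for field, rule in RULES.items():
        raw = form.get(field)
        if raw is None or raw == "":
            # Missing optional values are accepted here and left for later filling.
            if rule.required:
                errors.append(f"{field}: value is required")
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors.append(f"{field}: not a number")
            continue
        if rule.minimum is not None and value < rule.minimum:
            errors.append(f"{field}: must be >= {rule.minimum}")
        if rule.maximum is not None and value > rule.maximum:
            errors.append(f"{field}: must be <= {rule.maximum}")
    return errors
```

A form is accepted only when `validate` returns an empty list; otherwise the messages can be shown next to the offending fields so that the user corrects them before the data are stored.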
Together, these methods constitute a data pre-processing procedure, and empirical calculations show that this pre-processing improves the accuracy of the system's analysis results. On this basis, a data quality control method is proposed that combines constraint rule management, MM (multiple maximum likelihood type estimates) robust estimation, and the EM (expectation maximization) algorithm. The study also establishes a statistical pre-processing procedure for the cost and benefit data of grape production. To test the performance of the system, data from 779 questionnaires collected in 2012 from 22 provinces, municipalities, and autonomous regions were used to detect outliers by MM robust estimation, and the results showed that the model fitted well. In terms of data quality control, the accuracy and normality of the processed data improved greatly, which considerably shortened the data processing cycle. The proposed method identifies questionnaires that contain outliers. Taking Shanghai as an example, the missing value, i.e. labor cost, could be filled in, and the filled data fitted the model well. Application results show that the system meets the needs of different users and provides an efficient data collection tool and reliable data quality control technology for information analysis in the grape industry, thereby improving the efficiency and effectiveness of surveys. To allow farmers to upload and manage data anytime and anywhere, optimizing the data quality control algorithms and developing mobile phone applications are important directions for future research.
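The outlier screening step can be sketched as follows for a single predictor. This is a simplified stand-in, not a full MM-estimator: the high-breakdown initial S-estimate is replaced here by a Theil-Sen fit with a MAD residual scale, followed by iteratively reweighted least squares with Tukey bisquare weights at that fixed scale. The function names, the tuning constant 4.685, and the flagging threshold of 2.5 scale units are assumptions for illustration, not values taken from the study.

```python
from statistics import median


def theil_sen(x, y):
    """Robust starting fit: median pairwise slope, then median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


def mad_scale(residuals):
    """Robust scale: 1.4826 times the median absolute deviation."""
    centre = median(residuals)
    return 1.4826 * median(abs(r - centre) for r in residuals)


def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight; observations with |u| >= c get weight 0."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def robust_fit(x, y, iterations=20):
    """Return (intercept, slope, scale) from bisquare IRLS at a fixed scale."""
    a, b = theil_sen(x, y)
    scale = mad_scale([yi - (a + b * xi) for xi, yi in zip(x, y)])
    if scale == 0.0:  # MAD is zero: at least half the residuals coincide
        return a, b, scale
    for _ in range(iterations):
        w = [bisquare_weight((yi - (a + b * xi)) / scale) for xi, yi in zip(x, y)]
        sw = sum(w)
        if sw == 0.0:
            break
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        if sxx == 0.0:
            break
        # Weighted least-squares update of slope and intercept.
        b = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y)) / sxx
        a = ym - b * xm
    return a, b, scale


def flag_outliers(x, y, threshold=2.5):
    """Indices whose absolute residual exceeds threshold * robust scale."""
    a, b, scale = robust_fit(x, y)
    return [
        i for i, (xi, yi) in enumerate(zip(x, y))
        if abs(yi - (a + b * xi)) > threshold * scale
    ]
```

Calling `flag_outliers(x, y)` on paired questionnaire values returns the positions of records to review. One plausible follow-up, in line with the filling strategies described above, is to treat a confirmed outlier as missing and impute it; the EM step itself is not shown here.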