Abstract:
In agricultural SCADA (supervisory control and data acquisition) systems, the early-warning threshold of a server is usually set manually, which is neither precise nor timely. To address this problem, this paper proposes a data-mining method that takes the Apache server in an agricultural SCADA system as its research object. In such a system, front-end user access and the real-time display of device monitoring indicators, such as CPU (central processing unit) temperature and CPU fan speed, rely on the Apache server, which handles a large number of concurrent requests from front-end users. When the numbers of devices and front-end users are large, the resulting access pressure can overload the server and, in the worst case, cause it to crash. One remedy is to issue a warning signal before the situation deteriorates; the open questions are when and how to issue it. Our method proceeds as follows. First, Apache operating data and the early-warning messages for some exceptions are collected over a period of time. Once enough data are available for analysis, we perform feature selection on the collected data, mainly to eliminate interfering factors. The dimensions retained in the resulting feature subset are those with greater influence on Apache's running performance. To examine how different feature subsets affect the clustering analysis, we set two different feature weight thresholds, each of which filters out the features whose weight is below the threshold and thus yields a different feature subset.
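The weight-threshold filtering described above can be illustrated with a minimal sketch. The abstract does not name the feature-weighting algorithm, so the weight used here (the absolute Pearson correlation between each metric and a binary warning flag) is only an assumed stand-in, and the metric names, sample values, and the two thresholds 0.3 and 0.9 are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def feature_weights(samples, warning_flags):
    """Assumed stand-in weight: |correlation| between each metric and the warning flag."""
    names = samples[0].keys()
    return {
        name: abs(pearson([s[name] for s in samples], warning_flags))
        for name in names
    }


def select_features(weights, threshold):
    """Drop every feature whose weight is less than the threshold."""
    return sorted(name for name, w in weights.items() if w >= threshold)


# Hypothetical monitoring samples; 1 marks a sample that preceded a warning.
samples = [
    {"busy_workers": 12, "requests_per_sec": 40, "cpu_fan_rpm": 2100},
    {"busy_workers": 15, "requests_per_sec": 55, "cpu_fan_rpm": 2050},
    {"busy_workers": 90, "requests_per_sec": 310, "cpu_fan_rpm": 2120},
    {"busy_workers": 95, "requests_per_sec": 330, "cpu_fan_rpm": 2080},
]
warning_flags = [0, 0, 1, 1]

weights = feature_weights(samples, warning_flags)
loose_subset = select_features(weights, 0.3)   # lower threshold, larger subset
strict_subset = select_features(weights, 0.9)  # higher threshold, smaller subset
```

With these made-up numbers the stricter threshold discards the fan-speed metric, so the two thresholds yield two different feature subsets to pass on to clustering.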
Because the distribution shape of the data is unknown and each clustering algorithm makes assumptions about that shape, in the next step we cluster the feature-selection result with the K-means algorithm and the CURE (Clustering Using Representatives) algorithm separately, to obtain the common characteristics of the server when an exception is about to occur. Finally, we extract the early-warning threshold from the better of the two clustering results. In our method, the algorithm that yields the better result is the one better suited to the data at hand, but it may perform poorly on data with a different distribution shape; this is why we use two algorithms, so that the analysis does not depend on a single algorithm's shape preference. We apply the method experimentally, and the results show that feature selection can identify the performance bottleneck of the Apache server in the SCADA system and that the CURE algorithm gives the better clustering result. A verification test on the server's operating data shows that our method detects potential risks to the system earlier than the manual approach. This work offers a new approach to extracting early-warning thresholds for computer monitoring in agricultural SCADA systems.
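The clustering and threshold-extraction steps can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the paper's implementation: it shows only a plain K-means pass (CURE is omitted for brevity), and the extraction rule, which takes the lowest value of each feature within the cluster that holds the most pre-exception samples, is an assumption, since the abstract does not specify how the threshold is derived from the clusters. All sample values and names are hypothetical.

```python
def kmeans(points, k, iterations=50):
    """Plain Lloyd's K-means with deterministic seeds spread over the input order."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def extract_thresholds(points, labels, preceded_warning, feature_names):
    """Assumed rule: per-feature minimum inside the cluster with most pre-exception samples."""
    k = max(labels) + 1
    risky = max(
        range(k),
        key=lambda c: sum(1 for lab, flag in zip(labels, preceded_warning)
                          if lab == c and flag),
    )
    members = [p for p, lab in zip(points, labels) if lab == risky]
    return {name: min(col) for name, col in zip(feature_names, zip(*members))}


# Hypothetical samples restricted to two selected features.
feature_names = ["busy_workers", "requests_per_sec"]
points = [(12, 40), (15, 55), (18, 60), (20, 70),
          (85, 290), (90, 310), (95, 330), (88, 300)]
preceded_warning = [False, False, False, False, True, True, True, True]

labels, centroids = kmeans(points, k=2)
thresholds = extract_thresholds(points, labels, preceded_warning, feature_names)
```

In a real comparison the same samples would also be clustered with CURE, the features would normally be rescaled to comparable ranges first, and the threshold would be taken from whichever clustering separates the pre-exception samples better.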