Early-warning threshold extraction method of agricultural SCADA server based on clustering analysis

Yang Lili; Wu Chunhui; Zhang Dawei; Su Juan

doi:10.11975/j.issn.1002-6819.2017.z1.044


Abstract

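To address the inaccuracy of manually set early-warning thresholds for computer servers, this paper takes the Apache server in an agricultural supervisory control and data acquisition (SCADA) system as the research object and proposes a method that extracts early-warning thresholds for server monitoring indicators through clustering analysis. First, feature selection is performed on the server operating data and on the early-warning messages recorded before a given type of exception. Because the distribution shape of the data is unknown, two clustering algorithms, K-means and CURE (clustering using representatives), are then applied separately to the feature selection results to mine the common characteristics of the server's operating state before the exception occurs, and the clustering results are used to extract the early-warning threshold for that type of exception. Experiments show that feature selection can extract the monitoring indicators that affect server performance in this SCADA system. Comparing the clustering results, the distance between the CURE cluster centroids and the centroid of the normal messages ranges from 0.02 to 0.05, whereas for K-means it ranges from 0.15 to 0.2, so the early-warning threshold extracted by CURE lies closer to the server's critical state at the time a warning occurs. In practical verification, CURE issued warnings at least 24 h earlier than K-means. The server early-warning thresholds extracted by the proposed method detect potential system risks earlier than manually set thresholds and can be used to update early-warning thresholds dynamically.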
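The centroid-distance comparison reported above can be illustrated with a minimal sketch. It assumes Euclidean distance on normalized indicator vectors, which the abstract does not state explicitly; the function names and sample values are hypothetical and are not data from the paper.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def centroid_distance(cluster, normal):
    """Euclidean distance between the centroid of `cluster` and that of `normal`."""
    return math.dist(centroid(cluster), centroid(normal))

# Hypothetical normalized samples (two indicators each), not data from the paper.
normal_samples = [[0.20, 0.30], [0.30, 0.30], [0.25, 0.30]]
warning_cluster = [[0.28, 0.34], [0.28, 0.34]]

# A smaller distance means the cluster sits closer to the normal state.
d = centroid_distance(warning_cluster, normal_samples)
```

Under this reading, a clustering algorithm whose warning-cluster centroids lie nearer the normal centroid yields a threshold that triggers sooner as the server drifts away from normal operation.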

Abstract: In agricultural SCADA (supervisory control and data acquisition) systems, the early-warning threshold of a server is usually set manually, which is neither precise nor timely. To solve this problem, this paper proposes a data-mining method that takes the Apache server in an agricultural SCADA system as the research object. In such a system, front-end user access and the real-time display of device monitoring indicators, such as CPU (central processing unit) temperature and CPU fan speed, rely on the Apache server, which handles a large number of concurrent requests from front-end users. When the numbers of devices and front-end users are large, the access load on the server becomes heavy and, in the worst case, the server crashes. One remedy is to issue a warning signal before the situation deteriorates, which raises the question of when and how to issue it. The proposed method addresses this question as follows. First, Apache operating data and the early-warning messages of certain exceptions are collected over a period of time. Once enough data are available, feature selection is applied to the collected data, mainly to eliminate interfering factors. The resulting feature subset contains the dimensions that have the greater influence on Apache running performance. To verify how different feature subsets affect the clustering analysis, two feature weight thresholds are set to obtain two feature subsets; features whose weight is below the threshold are filtered out.
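The weight-threshold filtering step can be sketched as follows. The abstract does not name the feature-weighting algorithm, so the weights are taken as given; the indicator names, weights, and the two threshold values are hypothetical and chosen only for illustration.

```python
def select_features(weights, threshold):
    """Keep the features whose weight is at least `threshold`."""
    return [name for name, w in weights.items() if w >= threshold]

# Hypothetical indicator names and weights, for illustration only.
feature_weights = {
    "busy_workers": 0.42,
    "requests_per_sec": 0.31,
    "cpu_load": 0.12,
    "idle_workers": 0.04,
}

# Two thresholds give two feature subsets of different size.
loose_subset = select_features(feature_weights, 0.10)
strict_subset = select_features(feature_weights, 0.30)
```

A lower threshold retains more indicators and a higher one keeps only the most influential, which is what allows the effect of the feature subset on the clustering to be compared.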
Because the distribution shape of the data is unknown and clustering algorithms make assumptions about that shape, the next step clusters the feature selection results with the K-means algorithm and the CURE (clustering using representatives) algorithm separately, which reveals the common characteristics of the server when an exception is about to happen. Finally, the early-warning threshold is extracted from the better of the two clustering results. Here, a better clustering result indicates that the algorithm is better suited to the data at hand; on data with a different distribution shape, the same algorithm may perform poorly. This is why two algorithms are used for the clustering analysis: to avoid a preference for any particular data shape. In the experiment, feature selection identified the performance bottleneck of the Apache server in the SCADA system, and the CURE algorithm gave the better clustering result. A verification test on the server's operating data shows that the method detects potential system risks earlier than the manual approach. This research offers a new approach to extracting early-warning thresholds in computer monitoring for agricultural SCADA systems.
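As a rough illustration of the clustering step, the sketch below runs a plain K-means (Lloyd's algorithm) over hypothetical pre-exception samples. CURE is not shown. The abstract does not say how a threshold is read off a clustering result, so taking the centroid of the largest cluster as the threshold vector is an assumption made here, as are the seeding rule and all sample values.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; the first k points seed the centroids (a simplification).

    Expects iters >= 1.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments no longer change the centroids
        centroids = updated
    return centroids, clusters

# Hypothetical normalized pre-exception samples (two indicators), not paper data.
samples = [[0.80, 0.70], [0.40, 0.35], [0.82, 0.72], [0.78, 0.68], [0.42, 0.33]]
centroids, clusters = kmeans(samples, k=2)

# Assumption: the largest cluster represents the common pre-exception state,
# and its centroid is used as the early-warning threshold vector.
largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
threshold = centroids[largest]
```

K-means favors compact, roughly spherical clusters, whereas CURE represents each cluster by several points and can follow other shapes, which is consistent with the abstract's reason for running both algorithms on data of unknown shape.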
