Robust Clustering Analysis for Self-Monitoring Distributed Systems


Title: Robust Clustering Analysis for Self-Monitoring Distributed Systems


Principal researchers: Manish Parashar and Andres Quiroz (Rutgers University)

Current collaborators: Naveen Sharma and Nathan Gnanasambandam (XEROX)


Status: Ongoing


Summary:

The control and timely management of large-scale distributed systems, such as device networks, data centers, and compute clusters, are tasks that are rapidly exceeding human ability, given the complexity, dynamics, and large amounts of data involved. Thus, the automated and online management of these systems is essential to ensure their continued performance and robust operation. Fortunately, the in-network resources available in these systems can be harnessed to perform the self-monitoring and data analysis tasks that are crucial for effective management.

A self-monitoring system is able to observe and analyze system state and behavior, to discover anomalies or violations, and to notify autonomic or human administrators in a timely manner so that appropriate management actions can be effectively applied. Furthermore, implementing the analysis technique in a decentralized and in-network fashion (using network resources and minimal extraneous information) ensures computational tractability and acceptable response times.

However, because self-monitoring mechanisms are subject to the same failures that occur in the network that they are helping to manage, the robustness of these mechanisms is of great importance to overall system reliability. Therefore, it is very important to ensure the robustness of the proposed solution at different levels.

Working toward the goals outlined above, the main contribution of this work is the formulation and validation of a robust decentralized data analysis mechanism [1] that applies density-based clustering techniques [2-4] to identify anomalies and clusters of arbitrary size and shape in monitoring data. The data to be clustered is given in the form of periodic behavior and operational status update events from system components, defined in terms of known attributes. The event attributes are used to construct a multidimensional coordinate space, which is then used to measure the similarity of events. Components that behave in a similar fashion can then be identified by the clusters formed by their status events in this space, while devices with abnormal behavior will produce isolated events. The clustering algorithm requires minimal computation at processing nodes, which makes it suitable for online execution.
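To make the idea of clusters and isolated events concrete, the following is a minimal, centralized sketch of density-based clustering in the style of DBSCAN [3]. It is not the decentralized algorithm described in [1]; the two event attributes (a normalized load and queue length), the sample values, and the parameters `eps` and `min_pts` are assumptions chosen only for illustration.

```python
from math import dist

NOISE = -1  # label for isolated events (candidate anomalies)

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Hypothetical status events: (normalized load, normalized queue length).
events = [
    (0.10, 0.12), (0.12, 0.10), (0.11, 0.14), (0.13, 0.11),  # similar components
    (0.80, 0.82), (0.82, 0.80), (0.81, 0.85), (0.83, 0.81),  # similar components
    (0.45, 0.95),                                            # isolated event
]
labels = dbscan(events, eps=0.1, min_pts=3)
anomalies = [e for e, label in zip(events, labels) if label == NOISE]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(anomalies)  # [(0.45, 0.95)]
```

In this sketch, each event is a point in the attribute space, Euclidean distance stands in for the similarity measure, and any event without enough close neighbors is reported as an anomaly.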

The robustness of the decentralized mechanisms is dealt with at three levels. First, we assume that the connectivity of the network is maintained despite node failures through self-healing mechanisms provided at the overlay level. Next, at the data messaging level, we use replication to prevent the loss of the events required for the clustering analysis. To minimize the overhead incurred by replication, data is selectively replicated at nodes based on their probability of failure, which is obtained by maintaining a failure history and calculated using an appropriate failure model. Finally, the selectivity of replication can be further aided by information available at the analysis level. Because the primary focus of the clustering analysis is on anomaly detection, only points that are most likely to be anomalies should be replicated. This can be predicted given previous clusters and anomalies observed in the system.
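The selective replication decision can be sketched as follows. The summary does not name the failure model or the anomaly predictor, so this example assumes an exponential failure model fitted to a simple failure count, and it treats an event as a likely anomaly when it lies far from every previously observed cluster centroid. All names, thresholds, and sample values here are hypothetical.

```python
from dataclasses import dataclass
from math import dist, exp

@dataclass
class FailureHistory:
    failures: int          # failures observed so far at this node
    observed_hours: float  # total time the node has been observed

def failure_probability(history, horizon_hours):
    """P(at least one failure within the horizon), assuming an exponential model."""
    if history.observed_hours <= 0:
        return 1.0  # no history yet: conservatively assume the node may fail
    rate = history.failures / history.observed_hours
    return 1.0 - exp(-rate * horizon_hours)

def likely_anomaly(event, centroids, radius):
    """True if the event is farther than radius from every known cluster centroid.

    With no known centroids, every event is treated as a likely anomaly.
    """
    return all(dist(event, c) > radius for c in centroids)

def should_replicate(event, history, centroids,
                     radius=0.2, horizon_hours=1.0, p_threshold=0.05):
    """Replicate only likely anomalies held at nodes that are likely to fail."""
    return (likely_anomaly(event, centroids, radius)
            and failure_probability(history, horizon_hours) >= p_threshold)

# Hypothetical nodes and previously observed cluster centroids.
flaky = FailureHistory(failures=12, observed_hours=100.0)
stable = FailureHistory(failures=1, observed_hours=1000.0)
centroids = [(0.115, 0.1175), (0.815, 0.82)]

print(should_replicate((0.45, 0.95), flaky, centroids))   # True
print(should_replicate((0.45, 0.95), stable, centroids))  # False
print(should_replicate((0.12, 0.11), flaky, centroids))   # False
```

Combining the two conditions keeps the replication overhead low: an event is copied only when losing it would both be likely and matter to anomaly detection.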

This work is part of an ongoing effort to create tools for integrated data analysis for the autonomic management of performance, security/trust, and reliability at a system level. Current and future efforts include improving the cluster descriptions produced by the algorithm for effective profiling of system behavior and developing predictive models of distributed system state. We plan to combine these mechanisms with tools for defining system policies and conditioning their application on these profiles and state predictions, for autonomic resource management and provisioning, usage control and monitoring, and trust management and authentication.

References:

1. “Robust Clustering Analysis for the Management of Self-Monitoring Distributed Systems,” A. Quiroz, N. Gnanasambandam, M. Parashar, and N. Sharma. To appear in Journal of Cluster Computing, Springer 2008, DOI: 10.1007/s10586-008-0068-5.

2. “Algorithms for Clustering Data,” A.K. Jain and R.C. Dubes. Prentice Hall, 1988.

3. “A density-based algorithm for discovering clusters in large spatial databases with noise,” M. Ester, H.P. Kriegel, J. Sander, X. Xu. In: Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996.

4. “Automatic subspace clustering of high dimensional data for data mining applications,” R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan. In: Proceedings of 1998 ACM-SIGMOD Int. Conf. Management of Data, pp. 94-105. Seattle, Washington, 1998.