Unsupervised Intrusion Detection Using Clustering Approach
Muhammet Kabukçu
Sefa Kılıç
Ferhat Kutlu
Teoman Toraman
Outline
Introduction
Using Clustering for Intrusion Detection
Methodology
Overall Summary
Conclusion
References
Introduction
• Incidents are violations or imminent threats of violation of:
  * computer security policies,
  * acceptable use policies,
  * standard security practices.
• Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents.
• An intrusion detection system (IDS) is software that automates the intrusion detection process.
• IDSs focus primarily on identifying possible incidents and detecting when an attacker has successfully compromised a system by exploiting a vulnerability in the system.
Methodologies of IDS Technologies
• Signature-Based Detection
• Anomaly-Based Detection
• Stateful Protocol Analysis
Signature-Based Detection
A signature is a pattern that corresponds to a known threat (e.g. a telnet attempt with a username of "root", which is a violation of an organization's security policy).
Signature-based detection is the process of comparing signatures against observed events to identify possible incidents.
Advantage: Very effective at detecting known threats.
Disadvantage: Ineffective at detecting previously unknown threats.
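The comparison of signatures against observed events can be sketched as follows. This is a minimal illustration, assuming each event is a dictionary of fields and each signature is a set of field values that must all match. The field names (`service`, `username`), the signature name and the helper `match_signatures` are hypothetical and only mirror the telnet example above.

```python
# Hypothetical signatures: each maps field names to the values that must match.
SIGNATURES = {
    "telnet-root-login": {"service": "telnet", "username": "root"},
}

def match_signatures(event, signatures=SIGNATURES):
    """Return the names of all signatures whose fields all match the event."""
    return [
        name
        for name, pattern in signatures.items()
        if all(event.get(field) == value for field, value in pattern.items())
    ]

events = [
    {"service": "telnet", "username": "root"},
    {"service": "telnet", "username": "alice"},
]
# One list of matching signature names per observed event.
alerts = [match_signatures(e) for e in events]
```

An event that matches no stored signature produces no alert, which is why this approach cannot detect previously unknown threats.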
Anomaly-Based Detection
The process of comparing definitions of what activity is considered normal against observed events to identify significant deviations.
Capable of detecting previously unknown threats.
Uses host- or network-specific profiles.
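One simple way to picture a profile and a "significant deviation" is sketched below. It assumes the profile is just the mean and standard deviation of a single numeric measurement, and that a deviation is significant when it exceeds `k` standard deviations. Both choices, the function names and the sample values are assumptions made for illustration, not something the slides prescribe.

```python
import statistics

def build_profile(normal_values):
    """Summarize normal activity by its mean and standard deviation."""
    return statistics.mean(normal_values), statistics.pstdev(normal_values)

def is_anomalous(value, profile, k=3.0):
    """Flag a value that lies more than k standard deviations from the mean."""
    mean, std = profile
    if std == 0:
        return value != mean
    return abs(value - mean) > k * std

# Hypothetical profile: e-mails sent per hour by one host during normal operation.
profile = build_profile([10, 12, 9, 11, 10, 8, 12, 10])
```

With this profile, `is_anomalous(11, profile)` is false while `is_anomalous(200, profile)` is true.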
Detection by Stateful Protocol Analysis
The process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify deviations.
Relies on vendor-developed universal profiles that specify how particular protocols should and should not be used.
Using Clustering for Intrusion Detection
Methods other than signature-based detection use data mining and machine learning algorithms to train on labeled network data.
There are two major paradigms for training:
• Misuse Detection
• Anomaly Detection
Which one to use?
- Misuse Detection -
In misuse detection, machine learning algorithms are trained on labeled data.
Network data is classified using features extracted from labeled network traffic.
Detection models are retrained on new data that includes new types of attacks.
- Anomaly Detection -
In anomaly detection, models are built by training on normal data, and deviations from the normal model are then searched for.
Generating purely normal data is very difficult and costly in practice.
It is very hard to guarantee that there are no attacks during the time the traffic is collected from the network.
Using Clustering for Intrusion Detection
Use a mechanism that detects intrusions by training on unlabeled data.
Find intrusions buried within that data.
[Figure: pipeline with the stages "A Set of Unlabeled Data", "Unsupervised Anomaly Detection Algorithm", "Connection Comparison with Detected Clusters", "Detected Intrusion Clusters" and "Detected Malicious Attacks"]
Assumptions for the unsupervised anomaly detection algorithm:
1. The intrusions are rare with respect to normal network traffic.
2. The intrusions are different from normal network traffic.
As a result: The intrusions will appear as outliers in the data.
The unsupervised anomaly detection algorithm groups the unlabeled data instances into clusters using a simple distance-based metric.
Once the data is clustered, all instances that appear in small clusters are labeled as anomalies, because:
• Normal instances should form large clusters compared to the intrusions.
• Malicious intrusions and normal instances are qualitatively different, so they do not fall into the same cluster.
[Figure: a large "Normal cluster" and a small "Intrusion cluster"]
Methodology
1. Description of the dataset
2. Metric & Normalization
3. Clustering Algorithm
   a) Portnoy et al.
   b) Y-means Algorithm
4. Labeling Clusters
5. Intrusion Detection
Description of the dataset
• KDD Cup 1999 Data
• Main attack categories
  – DOS: Denial of service (e.g. SYN flood)
  – R2L: Unauthorized access from a remote machine (e.g. password guessing)
  – U2R: Unauthorized access to local superuser (root) privileges (e.g. various buffer overflow attacks)
  – Probing: Surveillance and other probing (e.g. port scanning)
• In total, there are 24 attack types in the training data and 14 additional ones in the test data.
Metric & Normalization
• Euclidean Metric (for distance computation)
• Feature Normalization (to eliminate the difference in the scale of features)
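The two steps can be sketched together as follows. The slides do not say which normalization is used, so this sketch assumes the common choice of rescaling each feature to zero mean and unit standard deviation. The feature names in the sample data are hypothetical.

```python
import math

def normalize(instances):
    """Rescale every feature to zero mean and unit standard deviation."""
    n = len(instances)
    dims = len(instances[0])
    means = [sum(x[j] for x in instances) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((x[j] - means[j]) ** 2 for x in instances) / n)
        for j in range(dims)
    ]
    # A constant feature has zero deviation; map it to 0.0 instead of dividing by zero.
    return [
        [(x[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(dims)]
        for x in instances
    ]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical features: (duration in seconds, bytes transferred).
raw = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
scaled = normalize(raw)
```

Without normalization the byte count would dominate every distance. After it, both features contribute on the same scale.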
Clustering Algorithm (Portnoy et al.)
[Figure: an instance Xi from the training set is compared with the current set of clusters, which is initially empty, at distances d1, d2 and d3]
- The nearest cluster, at distance d1, is selected.
- If d1 < W (a predefined threshold value), Xi is assigned to that cluster.
- Otherwise, a new cluster is created and Xi is assigned to it.
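The steps above can be sketched as a single pass over the training set. This is an illustrative sketch, not the authors' implementation: it assumes that each cluster is represented by a fixed center equal to its first instance, and the names `width_clustering` and `width` (standing for W) are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def width_clustering(instances, width):
    """Single-pass clustering with a fixed width threshold W.

    Each cluster is a dict with a fixed 'center' (its first instance, an
    assumption of this sketch) and the list of 'members' assigned to it.
    """
    clusters = []  # start from an empty set of clusters
    for x in instances:
        # Select the nearest existing cluster, if there is one.
        nearest = min(
            clusters, key=lambda c: euclidean(x, c["center"]), default=None
        )
        if nearest is not None and euclidean(x, nearest["center"]) < width:
            nearest["members"].append(x)
        else:
            # No cluster is close enough: create a new one around x.
            clusters.append({"center": x, "members": [x]})
    return clusters

data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [5.1, 5.0]]
clusters = width_clustering(data, width=1.0)
```

On this sample data the pass yields two clusters, one around the origin and one around (5, 5), without the number of clusters being given in advance.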
• Advantage: No need to know the initial number of clusters.
• Disadvantage: W must be known in advance, and a poorly chosen W may cause some instances to be labeled incorrectly.
• However…
Clustering Algorithm (Y-means Algorithm)
• 3 main parts:
  1. assigning instances to k clusters
  2. splitting clusters
  3. merging clusters
1. Assigning instances to k clusters
k: number of clusters
n: number of instances
1 < k < n
[Figure: the instances of the dataset are assigned to k clusters, and each cluster centroid is then redefined]
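The assign-and-redefine cycle in this step can be read as a plain k-means loop, which is how the sketch below interprets it. The initial centroids are supplied by the caller, and an empty cluster simply keeps its old centroid. Both are simplifying assumptions of this sketch, and all function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def assign(instances, centroids):
    """Return, for every instance, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: euclidean(x, centroids[c]))
        for x in instances
    ]

def redefine_centroids(instances, assignment, centroids):
    """Move each centroid to the mean of its members; keep it if it has none."""
    updated = []
    for c, old in enumerate(centroids):
        members = [x for x, a in zip(instances, assignment) if a == c]
        if members:
            updated.append([sum(col) / len(members) for col in zip(*members)])
        else:
            updated.append(list(old))
    return updated

def k_means(instances, centroids, max_iter=100):
    """Alternate assignment and centroid update until the assignment is stable."""
    assignment = assign(instances, centroids)
    for _ in range(max_iter):
        centroids = redefine_centroids(instances, assignment, centroids)
        new_assignment = assign(instances, centroids)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return centroids, assignment

data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids, assignment = k_means(data, centroids=[[0.0, 0.0], [10.0, 2.0]])
```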
2. Splitting clusters
[Figure: a cluster with its confident area of radius t, and an instance Xi at distance di]
t (normal threshold) = 2.32σ, where σ is the standard deviation
• If di > t, Xi is an outlier.
• New clusters are created first from the farthest outliers.
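The outlier test can be sketched as follows. The slide does not spell out how σ is computed, so this sketch assumes it is the root-mean-square distance of a cluster's members from its centroid, and that di is the distance of Xi from that centroid. The function name `find_outliers` and the sample data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def find_outliers(members, centroid, factor=2.32):
    """Return (distance, instance) pairs lying outside the confident area.

    sigma is taken here as the root-mean-square distance of the members
    from the centroid; this definition is an assumption of the sketch.
    """
    distances = [euclidean(x, centroid) for x in members]
    sigma = math.sqrt(sum(d * d for d in distances) / len(distances))
    threshold = factor * sigma
    outliers = [(d, x) for d, x in zip(distances, members) if d > threshold]
    # Farthest first, so the most remote outlier seeds the next new cluster.
    return sorted(outliers, key=lambda pair: pair[0], reverse=True)

# Nine instances near the origin and one far away.
members = [[0.1 * i, 0.0] for i in range(9)] + [[10.0, 0.0]]
centroid = [sum(col) / len(members) for col in zip(*members)]
outliers = find_outliers(members, centroid)
# The farthest outlier becomes the centroid of a new cluster.
new_centroid = outliers[0][1] if outliers else None
```

After a new centroid is added, the instances would be reassigned as in part 1, and the test repeated on the resulting clusters.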
3. Merging clusters
If an instance Xi is in the confident area of two clusters, these clusters are merged back together.
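A minimal sketch of the merge test is shown below. It assumes the confident area of a cluster is the ball of radius 2.32σ around its centroid, with σ taken as the root-mean-square distance of the members from the centroid. That definition, like the helper names, is an assumption of this sketch.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mean(points):
    """Centroid of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def confident_radius(members, centroid, factor=2.32):
    """Radius of the confident area: factor times the RMS distance (assumed)."""
    return factor * math.sqrt(
        sum(euclidean(x, centroid) ** 2 for x in members) / len(members)
    )

def should_merge(a, b):
    """True if some instance of either cluster lies in both confident areas."""
    ca, cb = mean(a), mean(b)
    ra, rb = confident_radius(a, ca), confident_radius(b, cb)
    return any(
        euclidean(x, ca) <= ra and euclidean(x, cb) <= rb for x in a + b
    )

close_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
close_b = [[2.5, 0.0], [3.5, 0.0], [4.5, 0.0]]
far_b = [[50.0, 0.0], [51.0, 0.0], [52.0, 0.0]]

# Overlapping clusters are merged into one; distant ones are left alone.
merged = close_a + close_b if should_merge(close_a, close_b) else None
```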
Labeling Clusters
• Our first assumption: # of normal instances >> # of intrusions
• Label instances in large clusters: normal
• Label instances in small clusters: intrusion
• Label clusters as normal, starting from the largest, until 99% of the data is labeled as normal; label the rest as intrusion.
[Figure: a large "Normal cluster" and a small "Intrusion cluster"]
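The labeling rule can be sketched from cluster sizes alone. The sketch assumes clusters are visited from largest to smallest and that labeling stops as soon as the normal clusters cover at least 99% of the instances, so the last normal cluster may push the total slightly above 99%. The function name and the sample sizes are hypothetical.

```python
def label_clusters(cluster_sizes, normal_fraction=0.99):
    """Label clusters 'normal' from largest to smallest until the normal
    clusters cover at least normal_fraction of all instances; the remaining
    clusters are labeled 'intrusion'. Returns one label per input cluster."""
    total = sum(cluster_sizes)
    order = sorted(
        range(len(cluster_sizes)), key=lambda i: cluster_sizes[i], reverse=True
    )
    labels = [None] * len(cluster_sizes)
    covered = 0
    for i in order:
        if covered < normal_fraction * total:
            labels[i] = "normal"
            covered += cluster_sizes[i]
        else:
            labels[i] = "intrusion"
    return labels

# Hypothetical cluster sizes, in the order the clusters were created.
sizes = [5, 700, 3, 292]
labels = label_clusters(sizes)
```

Here the two large clusters hold 992 of 1000 instances and are labeled normal, while the two small ones are labeled intrusion.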
Intrusion Detection
For a test instance x:
1. Measure the distance to each cluster.
2. Select the nearest cluster C.
3. If C is a normal cluster, label x as normal; otherwise, label x as intrusion.
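The detection step can be sketched as a nearest-cluster lookup. The sketch assumes the distance to a cluster is measured to its centroid, and the centroids and labels below are made-up values standing in for the output of the clustering and labeling steps.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def classify(x, centroids, labels):
    """Label x with the label of its nearest cluster centroid."""
    nearest = min(range(len(centroids)), key=lambda c: euclidean(x, centroids[c]))
    return labels[nearest]

# Hypothetical result of clustering and labeling the training data.
centroids = [[0.0, 0.0], [8.0, 8.0]]
labels = ["normal", "intrusion"]

result_a = classify([1.0, 1.0], centroids, labels)
result_b = classify([7.0, 9.0], centroids, labels)
```

A test instance must be normalized with the same statistics as the training data before its distances are measured.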
Overall Summary
• IDS & IDS Technologies
• Using Clustering for Intrusion Detection
• Methodology
  1. Description of the dataset
  2. Metric & Normalization
  3. Clustering Algorithm
  4. Labeling Clusters
  5. Intrusion Detection
Conclusion
• Unsupervised clustering is chosen.
• KDD Cup 1999 Data is used.
• The Y-means algorithm is used for creating the ID system.
References
[1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[2] Y. Guan and A. A. Ghorbani. Y-means: A clustering method for intrusion detection. In Proceedings of Canadian Conference on Electrical and Computer Engineering, pages 1083-1086, 2003.
[3] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001.
[4] K. Scarfone and P. Mell. Guide to intrusion detection and prevention systems (IDPS), 2007.
Questions?