Unsupervised Intrusion Detection

Nov 25, 2013

Unsupervised Intrusion Detection Using Clustering Approach

Muhammet Kabukçu

Sefa Kılıç

Ferhat Kutlu

Teoman Toraman

Outline

* Introduction
* Using Clustering for Intrusion Detection
* Methodology
* Overall Summary
* Conclusion
* References

Introduction

Incidents are violations or imminent threats of violation of:

* computer security policies,
* acceptable use policies,
* standard security practices.

Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents.

An intrusion detection system (IDS) is software that automates the intrusion detection process.

IDSs are primarily focused on identifying possible incidents and detecting when an attacker has successfully compromised a system by exploiting a vulnerability in the system.

Methodologies of IDS Technologies

* Signature-Based Detection
* Anomaly-Based Detection
* Stateful Protocol Analysis

Signature-Based Detection

A signature is a pattern that corresponds to a known threat (e.g. a telnet attempt with a username of "root", which is a violation of an organization's security policy).

Signature-based detection is the process of comparing signatures against observed events to identify possible incidents.

Advantage: Very effective at detecting known threats.

Disadvantage: Ineffective at detecting previously unknown threats.
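As a rough illustration of comparing signatures against observed events, the sketch below models the telnet/"root" example as a predicate over an event. The event fields (`service`, `username`), the `Signature` class and the `detect` function are hypothetical names chosen for this example, not part of any real IDS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Signature:
    name: str
    matches: Callable[[dict], bool]  # predicate over one observed event


# Hypothetical signature modelled on the telnet/"root" example.
SIGNATURES = [
    Signature(
        name="telnet-root-login",
        matches=lambda e: e.get("service") == "telnet" and e.get("username") == "root",
    ),
]


def detect(event: dict) -> list:
    """Return the names of all signatures that the observed event matches."""
    return [s.name for s in SIGNATURES if s.matches(event)]
```

An event that matches no signature yields an empty list, which is exactly why this approach cannot flag previously unknown threats.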

Anomaly-Based Detection

The process of comparing definitions of what activity is considered normal against observed events to identify significant deviations.

Capable of detecting previously unknown threats.

Uses host or network-specific profiles.
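A minimal sketch of the idea, assuming the profile of normal activity is just the mean and standard deviation of one numeric measurement (for example, connections per minute), and that a "significant deviation" means more than `max_sigmas` standard deviations from the mean. Both choices are assumptions made for illustration.

```python
from statistics import mean, pstdev


def build_profile(normal_values):
    """Summarise normal activity as (mean, population standard deviation)."""
    return mean(normal_values), pstdev(normal_values)


def is_anomalous(value, profile, max_sigmas=3.0):
    """Flag a value that deviates from the profile by more than max_sigmas."""
    mu, sigma = profile
    if sigma == 0:
        # A constant profile: anything different from the mean is a deviation.
        return value != mu
    return abs(value - mu) / sigma > max_sigmas
```

For instance, with a profile built from `[10, 12, 11, 9, 10, 8]`, a reading of 11 stays within the profile while a reading of 40 is flagged.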

Detection by Stateful Protocol Analysis

The process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify deviations.

Relies on vendor-developed universal profiles that specify how particular protocols should and should not be used.
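The per-state comparison can be pictured as a small state machine. The sketch below uses a made-up, FTP-like profile (`PROFILE`) that lists which command is acceptable in which state; the states, commands and the `find_deviations` function are hypothetical and far simpler than a vendor-developed profile.

```python
# Hypothetical profile: (current state, command) -> next state.
PROFILE = {
    ("unauthenticated", "USER"): "user_given",
    ("user_given", "PASS"): "authenticated",
    ("authenticated", "LIST"): "authenticated",
    ("authenticated", "RETR"): "authenticated",
    ("authenticated", "QUIT"): "closed",
}


def find_deviations(commands, start="unauthenticated"):
    """Return (index, state, command) for each command not allowed in its state."""
    state = start
    deviations = []
    for i, cmd in enumerate(commands):
        next_state = PROFILE.get((state, cmd))
        if next_state is None:
            deviations.append((i, state, cmd))  # the state is left unchanged
        else:
            state = next_state
    return deviations
```

A file retrieval issued before authentication, such as `["RETR", "USER", "PASS"]`, is reported as a deviation at index 0, whereas the same command after a successful login is accepted.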

Using Clustering for Intrusion Detection

Methods other than Signature-Based Detection use data mining and machine learning algorithms to train on labeled network data.

For training data, there are two major paradigms:

* Misuse Detection
* Anomaly Detection

Which one to use?

Misuse Detection

In misuse detection, machine learning algorithms are used with labeled data.

Network data is classified using the features extracted from labeled network traffic.

Detection models are retrained using new data that includes new types of attacks.


Anomaly Detection

In anomaly detection, models are built by training on normal data, and deviations from the normal model are then searched for.

Generating purely normal data is very difficult and costly in practice.

It is very hard to guarantee that there are no attacks during the time the traffic is collected from the network.

Using Clustering for Intrusion Detection

Use a mechanism that detects intrusions using unlabeled data as the training set.

Find intrusions buried within that data.

[Diagram labels: A Set of Unlabeled Data; Unsupervised Anomaly Detection Algorithm; Connection Comparison with Detected Clusters; Detected Intrusion Clusters; Detected malicious attacks]

Assumptions for the unsupervised anomaly detection algorithm:

1. The intrusions are rare with respect to normal network traffic.
2. The intrusions are different from normal network traffic.

As a result: the intrusions will appear as outliers in the data.


The unsupervised anomaly detection algorithm groups the unlabeled data instances into clusters using a simple distance-based metric.

Once the data is clustered, all of the instances that appear in small clusters are labeled as anomalies because:

* The normal instances should form large clusters compared to the intrusions.
* Malicious intrusions and normal instances are qualitatively different, so they do not fall into the same cluster.

[Figure labels: Normal cluster; Intrusion cluster]

Methodology

1. Description of the dataset
2. Metric & Normalization
3. Clustering Algorithm
   a) Portnoy et al.
   b) Y-means Algorithm
4. Labeling Clusters
5. Intrusion Detection

Description of the dataset

KDD Cup 1999 Data

Main attack categories:

* DOS: Denial of Service (e.g. SYN flood)
* R2L: Unauthorized access from a remote machine (e.g. guessing password)
* U2R: Unauthorized access to local superuser (root) privileges (e.g. various buffer overflow attacks)
* Probing: Surveillance and other probing (e.g. port scanning)

In total, 24 attack types in training data; 14 additional ones in test data.

Metric & Normalization

* Euclidean Metric (for distance computation)
* Feature Normalization (to eliminate the difference in the scale of features)
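The two steps can be sketched as follows. The slides do not say which normalization is used, so the z-score form below (subtract the feature mean, divide by the feature standard deviation) is an assumption; `normalize` and `euclidean` are illustrative names.

```python
import math
from statistics import mean, pstdev


def normalize(rows):
    """Rescale each feature (column) to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    # A constant column has standard deviation 0; use 1.0 to avoid dividing by zero.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Without normalization, a feature measured in large units (for example, bytes transferred) would dominate the distance over a feature with a small range (for example, a 0/1 flag).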

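Clustering Algorithm (Portnoy et al.)

[Diagram: a training set and an initially empty set of clusters; an instance X_i at distances d1, d2, d3 from the existing clusters]

For each instance X_i in the training set:

- The smallest distance (d1 in the diagram) is selected.
- If d1 < W (a predefined threshold value), then X_i is assigned to that cluster.
- Otherwise, a new cluster is created and X_i is assigned to it.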



Advantage: No need to know the initial number of clusters.

Disadvantage: Need to know W; a poorly chosen W may label instances wrongly in some cases.
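A minimal single-pass sketch of the threshold rule described for the Portnoy et al. algorithm. It assumes that a cluster's center stays fixed at the first instance that created it, which the slides do not specify; the function name and the dictionary layout are illustrative only.

```python
import math


def fixed_width_clustering(instances, w):
    """Assign each instance to the nearest cluster if closer than w, else open a new cluster."""
    clusters = []  # each cluster: {"center": point, "members": [points]}
    for x in instances:
        # Nearest existing cluster, or None while the set of clusters is empty.
        nearest = min(clusters, key=lambda c: math.dist(x, c["center"]), default=None)
        if nearest is not None and math.dist(x, nearest["center"]) < w:
            nearest["members"].append(x)
        else:
            clusters.append({"center": x, "members": [x]})
    return clusters
```

The number of clusters falls out of the data and `w`: a larger `w` yields fewer, wider clusters, which is why a poorly chosen threshold can pull intrusions into a normal cluster.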

Clustering Algorithm (Y-means Algorithm)

3 main parts:

1. assigning instances to k clusters
2. splitting clusters
3. merging clusters


1. Assigning instances to k clusters

k: no. of clusters
n: no. of instances
1 < k < n

[Diagram labels: Dataset; redefine cluster centroid]
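The assignment part behaves like a standard k-means loop: assign every instance to its nearest centroid, redefine each centroid as the mean of its cluster, and repeat until nothing changes. The sketch below assumes the caller supplies the k initial centroids; the function names are illustrative.

```python
import math


def assign(instances, centroids):
    """Group instances by the index of their nearest centroid."""
    groups = [[] for _ in centroids]
    for x in instances:
        j = min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
        groups[j].append(x)
    return groups


def redefine_centroids(groups, old_centroids):
    """Mean of each group; an empty group keeps its previous centroid."""
    new = []
    for group, old in zip(groups, old_centroids):
        if group:
            new.append(tuple(sum(col) / len(group) for col in zip(*group)))
        else:
            new.append(old)
    return new


def k_means(instances, centroids, max_iter=100):
    """Alternate assignment and centroid redefinition until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    groups = assign(instances, centroids)
    for _ in range(max_iter):
        updated = redefine_centroids(groups, centroids)
        if updated == centroids:
            break
        centroids = updated
        groups = assign(instances, centroids)
    return centroids, groups
```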


2. Splitting clusters

[Diagram: a cluster with its confident area of radius t; an instance X_i at distance d_i]

t (normal threshold) = 2.32 σ, where σ = standard deviation

If d_i > t, X_i is an outlier.

New clusters are created first with the farthest outliers.
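A sketch of the outlier test used for splitting. The slides give only t = 2.32 σ; taking σ as the root-mean-square distance of a cluster's members to its centroid is an assumption made here, and the function names are hypothetical.

```python
import math


def confident_radius(members, centroid, factor=2.32):
    """t = factor * sigma, with sigma taken as the RMS distance to the centroid (assumption)."""
    sigma = math.sqrt(sum(math.dist(x, centroid) ** 2 for x in members) / len(members))
    return factor * sigma


def outliers_farthest_first(members, centroid):
    """Members outside the confident area, ordered from farthest to nearest."""
    t = confident_radius(members, centroid)
    outside = [x for x in members if math.dist(x, centroid) > t]
    return sorted(outside, key=lambda x: math.dist(x, centroid), reverse=True)
```

The first element of the returned list would be the candidate for seeding a new cluster, matching the "farthest outliers first" rule.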


3. Merging clusters

If an instance X_i lies in the confident areas of two clusters, those two clusters are merged back into one.
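The merge test can be sketched as below, assuming each cluster is represented as a dictionary holding its centroid, its confident-area radius (the threshold t) and its members. This representation and the function names are illustrative, not taken from the Y-means paper.

```python
import math


def in_confident_area(x, centroid, radius):
    """True if x lies within the given radius of the centroid."""
    return math.dist(x, centroid) <= radius


def should_merge(a, b):
    """True if any member of either cluster lies in the confident areas of both."""
    return any(
        in_confident_area(x, a["centroid"], a["radius"])
        and in_confident_area(x, b["centroid"], b["radius"])
        for x in a["members"] + b["members"]
    )


def merge(a, b):
    """Combine two clusters; the radius is left to be recomputed by the caller."""
    members = a["members"] + b["members"]
    centroid = tuple(sum(col) / len(members) for col in zip(*members))
    return {"centroid": centroid, "members": members}
```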

Labeling Clusters

Our first assumption: # of normal instances >> # of intrusions

* Label instances in large clusters: normal
* Label instances in small clusters: intrusion
* Label clusters as normal, starting with the largest, until 99% of the data is labeled as normal; label the rest as intrusion.

[Figure labels: Normal cluster; Intrusion cluster]
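One way to read the 99% rule is sketched below: walk through the clusters from largest to smallest and keep labeling them normal while less than 99% of the data has been covered. Whether the cluster that crosses the 99% mark counts as normal is not stated in the slides; this sketch assumes it does. `label_clusters` is an illustrative name.

```python
def label_clusters(cluster_sizes, normal_fraction=0.99):
    """Return one label per cluster, in the same order as cluster_sizes."""
    total = sum(cluster_sizes)
    # Cluster indices ordered from the largest cluster to the smallest.
    order = sorted(range(len(cluster_sizes)), key=lambda i: cluster_sizes[i], reverse=True)
    labels = [None] * len(cluster_sizes)
    covered = 0  # number of instances labeled normal so far
    for i in order:
        if covered < normal_fraction * total:
            labels[i] = "normal"
            covered += cluster_sizes[i]
        else:
            labels[i] = "intrusion"
    return labels
```

With cluster sizes `[5, 900, 95, 3]`, the two large clusters are labeled normal and the two small ones are labeled intrusion.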

Intrusion Detection

For a test instance x:

1. Measure the distance to each cluster.
2. Select the nearest cluster C.
3. If C is a normal cluster, label x as normal.
4. Otherwise, label x as intrusion.
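The detection steps reduce to a nearest-centroid lookup. The sketch assumes each labeled cluster is represented by a (centroid, label) pair and that distance to a cluster means Euclidean distance to its centroid; `classify` is an illustrative name.

```python
import math


def classify(x, clusters):
    """clusters: list of (centroid, label) pairs; returns the label of the nearest centroid."""
    _, label = min(clusters, key=lambda c: math.dist(x, c[0]))
    return label
```

For example, with clusters `[((0, 0), "normal"), ((10, 10), "intrusion")]`, the instance `(1, 1)` is labeled normal and `(9, 8)` is labeled intrusion.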

Overall Summary

* IDS & IDS Technologies
* Using Clustering for Intrusion Detection
* Methodology
  1. Description of the dataset
  2. Metric & Normalization
  3. Clustering Algorithm
  4. Labeling Clusters
  5. Intrusion Detection

Conclusion

* Unsupervised clustering is chosen.
* KDD Cup 1999 Data is used as the dataset.
* The Y-means Algorithm is used for creating the ID system.

References

[1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[2] Y. Guan and A. A. Ghorbani. Y-means: A clustering method for intrusion detection. In Proceedings of Canadian Conference on Electrical and Computer Engineering, pages 1083-1086, 2003.

[3] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001.

[4] K. Scarfone and P. Mell. Guide to intrusion detection and prevention systems (IDPS), 2007.





Questions?
