Unsupervised Intrusion Detection

Nov 25, 2013

Unsupervised Intrusion Detection Using Clustering Approach

Muhammet Kabukçu

Sefa Kılıç

Ferhat Kutlu

Teoman Toraman


Outline



Introduction

Using Clustering for Intrusion Detection

Methodology


Overall Summary


Conclusion


References


Introduction


Incidents are violations or imminent threats of violation of:

* computer security policies,
* acceptable use policies,
* standard security practices.

Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents.


An intrusion detection system (IDS) is software that automates the intrusion detection process.

IDSs primarily focus on identifying possible incidents and on detecting when an attacker has successfully compromised a system by exploiting a vulnerability in the system.

Methodologies of IDS Technologies

* Signature-Based Detection
* Anomaly-Based Detection
* Stateful Protocol Analysis

Signature-Based Detection

A signature is a pattern that corresponds to a known threat (e.g. a telnet attempt with a username of "root", which is a violation of an organization's security policy).

Signature-based detection is the process of comparing signatures against observed events to identify possible incidents.

Advantage: Very effective at detecting known threats.

Disadvantage: Ineffective at detecting previously unknown threats.
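
The comparison of signatures against observed events can be sketched as follows. This is only an illustration: it assumes events are plain dictionaries, and the field names (`service`, `username`), the signature name, and the helpers `matches` and `detect` are hypothetical, chosen to mirror the telnet/root example.

```python
# Hypothetical signature list: a telnet attempt with the username "root".
SIGNATURES = [
    {"name": "telnet-root-login",
     "fields": {"service": "telnet", "username": "root"}},
]


def matches(event, signature):
    """Return True if every field required by the signature is present in the event."""
    return all(event.get(key) == value
               for key, value in signature["fields"].items())


def detect(event, signatures=SIGNATURES):
    """Return the names of all signatures that the observed event matches."""
    return [sig["name"] for sig in signatures if matches(event, sig)]
```

An event that matches no signature produces an empty list, which is why this style of detection cannot flag previously unknown threats.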

Anomaly-Based Detection

The process of comparing definitions of what activity is considered normal against observed events to identify significant deviations.

Capable of detecting previously unknown threats.

Uses host or network-specific profiles.

Detection by Stateful Protocol Analysis

The process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify deviations.

Relies on vendor-developed universal profiles that specify how particular protocols should and should not be used.

Using Clustering for Intrusion Detection

Methods other than Signature-Based Detection use data mining and machine learning algorithms to train on labeled network data.

For training data, there are two major paradigms:

* Misuse Detection
* Anomaly Detection

Which one to use?

Using Clustering for Intrusion Detection - Misuse Detection

In misuse detection, machine learning algorithms are used with labeled data.

Network data is classified using the features extracted from labeled network traffic.

Detection models are retrained using new data that includes new types of attacks.


Using Clustering for Intrusion Detection - Anomaly Detection

In anomaly detection, models are built by training on normal data, and deviations from the normal model are then searched for.

Generating purely normal data is very difficult and costly in practice.

It is very hard to guarantee that there are no attacks during the time the traffic is collected from the network.

Using Clustering for Intrusion Detection


Use a mechanism to detect intrusions by training a model on unlabeled data.

Find intrusions buried within that data.

[Diagram labels: A Set of Unlabeled Data; Unsupervised Anomaly Detection Algorithm; Detected Intrusion Clusters; Connection Comparison with Detected Clusters; Detected malicious attacks]

Assumptions for the unsupervised anomaly detection algorithm:

1. The intrusions are rare with respect to normal network traffic.
2. The intrusions are different from normal network traffic.

As a result: The intrusions will appear as outliers in the data.


The unsupervised anomaly detection algorithm groups the unlabeled data instances into clusters using a simple distance-based metric.

Once the data is clustered, all of the instances that appear in small clusters are labeled as anomalies, because:

* Normal instances should form large clusters compared to the intrusions.
* Malicious intrusions and normal instances are qualitatively different, so they do not fall into the same cluster.

[Figure labels: Normal cluster, Intrusion cluster]

Methodology

1. Description of the dataset
2. Metric & Normalization
3. Clustering Algorithm
   a) Portnoy et al.
   b) Y-means Algorithm
4. Labeling Clusters
5. Intrusion Detection

Description of the dataset

KDD Cup 1999 Data

Main attack categories:

* DOS: Denial of Service (e.g. syn flood)
* R2L: Unauthorized access from a remote machine (e.g. guessing password)
* U2R: Unauthorized access to local superuser (root) privileges (e.g. various buffer overflow attacks)
* Probing: Surveillance and other probing (e.g. port scanning)

In total, 24 attack types in training data; 14 additional ones in test data.

Metric & Normalization

Euclidean Metric (for distance computation)

Feature Normalization (to eliminate the difference in the scale of features)
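
The two steps can be illustrated with a short sketch. The slides do not say which normalization is used, so this assumes standardization (subtract the feature mean, divide by the feature standard deviation); the function names `normalize` and `euclidean` are illustrative only.

```python
import math
import statistics


def normalize(instances):
    """Rescale each feature to zero mean and unit standard deviation.

    Assumption: z-score standardization; the slides only say "normalization".
    """
    columns = list(zip(*instances))
    means = [statistics.mean(col) for col in columns]
    # Population standard deviation; a constant feature gets a divisor of 1.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in instances
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Without the normalization step, a feature measured in large units (for example a byte count) would dominate the distance over a feature measured in small units.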

Clustering Algorithm (Portnoy et al.)

[Diagram: an instance X_i from the training set, an initially empty set of clusters, and the distances d1, d2, d3 from X_i to the existing clusters]

- d1, the distance to the nearest cluster, is selected.
- If d1 < W (a predefined threshold value), then X_i is assigned to that cluster.
- Otherwise, a new cluster is created and X_i is assigned to it.



Advantage: No need to know the initial number of clusters.

Disadvantage: However, W must be known in advance, and its value may cause instances to be labeled wrongly in some cases.
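
The assignment rule above can be sketched as a single pass over the training set. This is a simplified illustration, not the authors' implementation: it assumes each cluster is represented by the first instance placed in it, and the names `fixed_width_clustering` and `width` (standing in for W) are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clustering(instances, width):
    """Assign each instance to its nearest cluster, or start a new cluster.

    Assumption: a cluster is represented by the first instance placed in it.
    """
    centers = []   # one representative point per cluster
    members = []   # members[i] holds the instances assigned to cluster i
    for x in instances:
        if centers:
            distances = [euclidean(x, c) for c in centers]
            nearest = min(range(len(centers)), key=distances.__getitem__)
            if distances[nearest] < width:
                members[nearest].append(x)
                continue
        # No cluster exists yet, or the nearest one is at least `width` away.
        centers.append(x)
        members.append([x])
    return centers, members
```

The number of clusters falls out of the data, but the result depends directly on the chosen `width`, which is the disadvantage noted above.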

Clustering Algorithm (Y-means Algorithm)

3 main parts:

1. assigning instances to k clusters
2. splitting clusters
3. merging clusters


1. Assigning instances to k clusters

k: number of clusters
n: number of instances
1 < k < n

[Diagram: the dataset is partitioned into k clusters; the cluster centroids are then redefined]
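
A minimal sketch of this assignment step, written as a plain k-means loop. The slides do not specify how the initial centroids are picked; this sketch assumes the first k instances are used, and the function names are illustrative.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(values) / len(points) for values in zip(*points)]


def assign_to_clusters(instances, k, max_iterations=100):
    """Assign instances to the nearest centroid, then redefine the centroids."""
    # Assumption: the first k instances serve as the initial centroids.
    centroids = [list(p) for p in instances[:k]]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda j: euclidean(x, centroids[j]))
            for x in instances
        ]
        if new_assignment == assignment:
            break  # no instance changed cluster, so the centroids are stable
        assignment = new_assignment
        for j in range(k):
            cluster = [x for x, a in zip(instances, assignment) if a == j]
            if cluster:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(cluster)
    return centroids, assignment
```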


2. Splitting clusters

[Diagram: a cluster with its confident area (threshold t) and an instance X_i at distance d_i]

t (normal threshold) = 2.32 σ, where σ is the standard deviation.

If d_i > t, X_i is an outlier.

New clusters are created first with the farthest outliers.
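
The outlier test can be sketched as below. The slides do not define exactly how σ is computed, so this sketch assumes it is the root-mean-square distance of a cluster's instances to its centroid, and that d_i is measured from the centroid; `find_outliers` and `factor` are hypothetical names.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_outliers(cluster, centroid, factor=2.32):
    """Return the instances outside the confident area, farthest first."""
    distances = [euclidean(x, centroid) for x in cluster]
    # Assumption: sigma is the root-mean-square distance to the centroid.
    sigma = math.sqrt(sum(d * d for d in distances) / len(distances))
    threshold = factor * sigma
    outliers = [(d, x) for d, x in zip(distances, cluster) if d > threshold]
    outliers.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in outliers]
```

Because the list is ordered farthest first, its first element is the natural candidate for the center of the first new cluster.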


3. Merging clusters

If X_i is in the confident area of two clusters, merge these clusters back together.
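
A small sketch of the merge test, assuming each cluster is described by a centroid and a confident-area threshold t, and that the merged centroid is the mean of all members. The helper names `should_merge` and `merge` are illustrative, not taken from the Y-means paper.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def should_merge(instances, centroid_a, t_a, centroid_b, t_b):
    """True if some instance lies in the confident areas of both clusters."""
    return any(
        euclidean(x, centroid_a) <= t_a and euclidean(x, centroid_b) <= t_b
        for x in instances
    )


def merge(cluster_a, cluster_b):
    """Merge two clusters and return (merged members, new centroid)."""
    merged = cluster_a + cluster_b
    centroid = [sum(values) / len(merged) for values in zip(*merged)]
    return merged, centroid
```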

Labeling Clusters

Our first assumption: # of normal instances >> # of intrusions

Label instances in large clusters: normal

Label instances in small clusters: intrusion

Starting from the largest cluster, label clusters as normal until 99% of the data is labeled as normal; label the rest as intrusion.

[Figure labels: Normal cluster, Intrusion cluster]
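
The labeling rule can be sketched as follows. It assumes clusters are visited from largest to smallest and that the cluster which crosses the 99% mark is still labeled normal; the slides do not settle that boundary case, and `label_clusters` is a hypothetical name.

```python
def label_clusters(cluster_sizes, normal_fraction=0.99):
    """Label clusters from largest to smallest until the normal quota is met.

    cluster_sizes maps a cluster id to its number of instances.
    """
    total = sum(cluster_sizes.values())
    labels = {}
    covered = 0
    ordered = sorted(cluster_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for cluster_id, size in ordered:
        if covered < normal_fraction * total:
            # Assumption: the cluster that crosses the quota is still normal.
            labels[cluster_id] = "normal"
            covered += size
        else:
            labels[cluster_id] = "intrusion"
    return labels
```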

Intrusion Detection

For a test instance x:

* Measure the distance to each cluster.
* Select the nearest cluster C.
* If C is a normal cluster, label x as normal.
* Otherwise, label x as intrusion.
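
These steps amount to a nearest-cluster lookup. The sketch below assumes the distance to a cluster is the Euclidean distance to its centroid and that clusters are given as (centroid, label) pairs; both are illustrative choices.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(x, clusters):
    """Label a test instance with the label of its nearest cluster.

    clusters is a list of (centroid, label) pairs.
    """
    _, label = min(clusters, key=lambda cluster: euclidean(x, cluster[0]))
    return label
```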

Overall Summary


IDS & IDS Technologies


Using Clustering for Intrusion Detection


Methodology

1. Description of the dataset
2. Metric & Normalization
3. Clustering Algorithm
4. Labeling Clusters
5. Intrusion Detection

Conclusion


Unsupervised clustering is chosen.

KDD Cup 1999 Data

The Y-means algorithm is used to build the intrusion detection system.

References

[1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[2] Y. Guan and A. A. Ghorbani. Y-means: A clustering method for intrusion detection. In Proceedings of Canadian Conference on Electrical and Computer Engineering, pages 1083-1086, 2003.

[3] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001.

[4] K. Scarfone and P. Mell. Guide to intrusion detection and prevention systems (IDPS), 2007.





Questions?
