INTRUSION DETECTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINE



Srinivas Mukkamala, Guadalupe Janoski, Andrew Sung

Dept. of CS, New Mexico Institute of Mining and Technology


IEEE WCCI IJCNN 2002

World Congress on Computational Intelligence

International Joint Conference on Neural Networks

Outline

- Approaches to intrusion detection using neural networks and support vector machines
- DARPA dataset
- Neural Networks
- Support Vector Machines
- Experiments
- Conclusion and Comments

Approaches

- Key ideas:
  - discover useful patterns or features that describe user behavior on a system
  - use the set of relevant features to build classifiers that can recognize anomalies and known intrusions
- Neural networks and support vector machines are trained with normal user activity and attack patterns
- Significant deviations from normal behavior are flagged as attacks

DARPA Data for Intrusion Detection

- DARPA (Defense Advanced Research Projects Agency)
  - An agency of the US Department of Defense responsible for the development of new technology for use by the military
- Benchmark from a KDD (Knowledge Discovery and Data Mining) competition designed by DARPA
- Attacks fall into four main categories:
  - DoS: denial of service
  - R2L: unauthorized access from a remote machine
  - U2R: unauthorized access to local superuser (root) privileges
  - Probing: surveillance and other probing

Features

The full feature list is described at http://kdd.ics.uci.edu/databases/kddcup99/task.html

Neuron

- Dendrite: gathers signals
- Soma: combines signals and decides whether to trigger
- Axon: outputs the signal

Neural Networks

A single neuron takes inputs X1, X2 through weights w1, w2 and a threshold θ (INPUT, WEIGHT, ACTIVATION) and produces an OUTPUT. Its decision boundary is a line in the plane:

w1·X1 + w2·X2 - θ = 0

[Figure: points A, B, C, D in the (X1, X2) plane separated by this line]
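The decision rule above can be sketched in a few lines; the weights, threshold, and test points below are illustrative values, not taken from the slides:

```python
import numpy as np

def neuron(x, w, theta):
    """Single neuron: weighted sum of the inputs minus the threshold, sign activation."""
    return 1 if np.dot(w, x) - theta >= 0 else -1

# Decision boundary w1*x1 + w2*x2 - theta = 0, e.g. w = (1, 1), theta = 1:
w, theta = np.array([1.0, 1.0]), 1.0
print(neuron(np.array([2.0, 2.0]), w, theta))    # point above the line -> +1
print(neuron(np.array([-1.0, -1.0]), w, theta))  # point below the line -> -1
```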

Divide and Conquer

[Figure: three first-layer neurons N1, N2, N3, each a linear separator over (x1, x2) with output ±1, feed second-layer Σ units with ±1 weights that combine them to classify the four regions A, B, C, D; tables list the ±1 outputs of N1, N2, N3 for each region.]
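The divide-and-conquer idea can be sketched as two first-layer linear units whose ±1 outputs a second-layer unit combines. The weights below are hypothetical (the slide's exact N1/N2/N3 values are not recoverable from the text); the second layer implements an AND of two half-planes:

```python
import numpy as np

def unit(x, w, theta):
    """First-layer linear separator with +/-1 output."""
    return 1 if np.dot(w, x) - theta >= 0 else -1

def classify(x):
    o1 = unit(x, np.array([1.0, 0.0]), 0.0)  # N1: right/left of the x2-axis
    o2 = unit(x, np.array([0.0, 1.0]), 0.0)  # N2: above/below the x1-axis
    # Second-layer unit ANDs the two half-planes: +1 only when both outputs are +1.
    return 1 if o1 + o2 - 1 >= 0 else -1

print(classify(np.array([1.0, 1.0])))   # inside both half-planes -> +1
print(classify(np.array([-1.0, 1.0])))  # outside one half-plane  -> -1
```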

Feed Forward Neural Network (FFNN)

[Figure: a 4-layer network (Layer 1 ... Layer 4). In each layer l, neuron N_j receives the previous layer's outputs x_i^(l-1) through weights w_ij^(l); a Σ unit forms the cumulated signal S_j^(l), which is activated into the output x_j^(l).]

Hyperbolic activation function:

tanh(S) = (e^S - e^(-S)) / (e^S + e^(-S))

Decide the architecture; determine the weights automatically.
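A minimal forward pass matching the notation above, assuming tanh activation at every layer and a bias carried as x0 = 1; the layer sizes and random weights are illustrative, not a trained network:

```python
import numpy as np

def forward(x, weights):
    """Forward pass: at each layer l, S_j^(l) = sum_i w_ij^(l) x_i^(l-1)
    (with x0 = 1 carrying the bias weight), then x_j^(l) = tanh(S_j^(l))."""
    for W in weights:
        x = np.concatenate(([1.0], x))  # prepend x0 = 1
        x = np.tanh(W.T @ x)            # cumulated signal, then activation
    return x

# Tiny 2-3-1 net with small random weights (illustrative, untrained):
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.1, (3, 3)), rng.normal(0, 0.1, (4, 1))]
out = forward(np.array([0.5, -0.2]), weights)
print(out.shape)  # (1,)
```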

[Figure: a multi-layer network of Σ units; g(x) is the classifier composed of the weights w]

Given the training data and an error function E(w), how do we minimize E(w)?

Stochastic Gradient Descent (SGD):

  w is a random small value at the beginning
  for T iterations:
    w_new ← w_old - η ∇w(En)

(η is the learning rate)
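The update loop can be sketched on a hypothetical one-parameter error E(w) = (w - 3)^2, chosen so the gradient is known in closed form; with a single example per step, the update is exactly w_new ← w_old - η ∇w(En):

```python
import numpy as np

# Hypothetical error surface E(w) = (w - 3)^2, so grad E = 2(w - 3).
def grad_E(w):
    return 2.0 * (w - 3.0)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1)       # w is a random small value at the beginning
eta = 0.1                    # learning rate
for _ in range(100):         # T iterations
    w = w - eta * grad_E(w)  # w_new <- w_old - eta * grad E
print(round(w, 4))           # 3.0 (the minimizer of E)
```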

Back Propagation Algorithm (General)

[Figure: layers 1, 2, ..., L-1, L; in layer l, neuron N_j forms the signal S_j^(l) from the inputs x_i^(l-1) through weights w_ij^(l) and outputs x_j^(l)]

forward:  for l = 1, 2, ..., L: compute S_j^(l) and x_j^(l)
backward: for l = L, L-1, ..., 1: compute δ_i^(l)
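A compact sketch of the forward/backward passes for a two-layer tanh network with squared error, checked against a numeric gradient. The network shape, weights, and inputs are illustrative assumptions, not the paper's 41-40-40-1 setup:

```python
import numpy as np

def forward(x, Ws):
    """Forward: store each layer's input (with bias 1) and return the output."""
    xs = []
    for W in Ws:
        x = np.concatenate(([1.0], x))
        xs.append(x)
        x = np.tanh(W.T @ x)
    return xs, x

def backprop(x, y, Ws):
    """Backward: delta at the output from the error derivative, then
    delta^(l-1) = (W^(l) delta^(l))[1:] * tanh'(S^(l-1)), dropping the bias row."""
    xs, out = forward(x, Ws)
    grads = [None] * len(Ws)
    delta = 2 * (out - y) * (1 - out**2)      # dE/dS at the output layer
    for l in range(len(Ws) - 1, -1, -1):
        grads[l] = np.outer(xs[l], delta)     # dE/dw_ij = x_i^(l-1) * delta_j
        if l > 0:
            h = np.tanh(Ws[l-1].T @ xs[l-1])  # layer l-1 outputs
            delta = (Ws[l] @ delta)[1:] * (1 - h**2)
    return grads

rng = np.random.default_rng(1)
Ws = [rng.normal(0, 0.5, (3, 2)), rng.normal(0, 0.5, (3, 1))]
g = backprop(np.array([0.3, -0.7]), np.array([0.5]), Ws)

# Numeric check of one weight's gradient via a small finite difference:
eps = 1e-6
Wp = [W.copy() for W in Ws]; Wp[0][1, 0] += eps
E = lambda Ws_: np.sum((forward(np.array([0.3, -0.7]), Ws_)[1] - 0.5)**2)
num = (E(Wp) - E(Ws)) / eps
print(abs(num - g[0][1, 0]) < 1e-4)  # True: analytic and numeric gradients agree
```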

Feed Forward NNet

- Consists of layers 1, 2, ..., L
- w_ij^(l) connects neuron i in layer (l-1) to neuron j in layer l
- Cumulated signal S_j^(l); activated output x_j^(l), often via tanh
- Minimize E(w) and determine the weights automatically with SGD (Stochastic Gradient Descent):

  w is a random small value at the beginning
  for T iterations:
    Forward:  compute S_j^(l) and x_j^(l)
    Backward: compute δ_i^(l)
    w_new ← w_old - η ∇w(En)
  Stop when the desired error rate is met

Support Vector Machine

- A supervised learning method
- Known as the maximum margin classifier
- Finds the max-margin separating hyperplane

SVM: hard margin

[Figure: in the (x1, x2) plane, the hyperplane <w, x> - θ = 0 with margin boundaries <w, x> - θ = -1 and <w, x> - θ = +1; the margin width is 2/||w||]

max_{w,θ} 2/||w||  subject to  y_n(<w, x_n> - θ) ≥ 1

⇔ argmin_{w,θ} (1/2)<w, w>  subject to  y_n(<w, x_n> - θ) ≥ 1
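The hard-margin constraints are easy to check numerically. A small sketch with toy data and a candidate hyperplane (all values hypothetical): every point must satisfy y_n(<w, x_n> - θ) ≥ 1, and the resulting margin width is 2/||w||:

```python
import numpy as np

# Toy separable data and a candidate hyperplane <w, x> - theta = 0:
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, theta = np.array([1.0, 0.0]), 0.0

# Hard-margin feasibility: y_n(<w, x_n> - theta) >= 1 for every n.
margins = y * (X @ w - theta)
print(margins)                # [2. 3. 2. 3.] -- all >= 1, so (w, theta) is feasible
print(2 / np.linalg.norm(w))  # margin width 2/||w|| = 2.0
```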

Quadratic programming

Generic QP form:

argmin_v (1/2) Σ_i Σ_j a_ij v_i v_j + Σ_i b_i v_i  subject to  Σ_i r_ki v_i ≥ q_k

V* ← quadprog(A, b, R, q)

SVM hard-margin problem:

argmin_{w,θ} (1/2)<w, w>  subject to  y_n(<w, x_n> - θ) ≥ 1

Let V = [θ, w_1, w_2, ..., w_D]. The objective becomes (1/2) Σ_{d=1..D} w_d^2 and each constraint becomes (-y_n)θ + Σ_{d=1..D} y_n (x_n)_d w_d ≥ 1.

- Adapt the problem for quadratic programming
- Find A, b, R, q and put them into the quad. solver
Adaptation

V = [θ, w_1, w_2, ..., w_D] = [v_0, v_1, v_2, ..., v_D]

The objective (1/2) Σ_{d=1..D} w_d^2 and constraints (-y_n)θ + Σ_{d=1..D} y_n (x_n)_d w_d ≥ 1 map onto the QP form with:

- a_00 = 0, a_0j = 0, a_i0 = 0; for i ≠ 0, j ≠ 0: a_ij = 1 if i = j, 0 if i ≠ j   (A: (1+D)×(1+D))
- b_0 = 0; b_i = 0 for i ≠ 0                                                      (b: (1+D)×1)
- r_n0 = -y_n; r_nd = y_n (x_n)_d for d > 0                                       (R: (2N)×(1+D))
- q_n = 1                                                                         (q: (2N)×1)
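The mapping above can be sketched directly in NumPy. The slide sizes R with 2N rows; the sketch below fills one row per margin constraint (N rows suffice for the N constraints shown), and the toy data is hypothetical:

```python
import numpy as np

def svm_qp_matrices(X, y):
    """Map the hard-margin primal onto argmin (1/2) v'Av + b'v s.t. Rv >= q,
    with v = [theta, w_1, ..., w_D]."""
    N, D = X.shape
    A = np.zeros((1 + D, 1 + D))
    A[1:, 1:] = np.eye(D)         # a_ij = 1 iff i = j for i, j >= 1; a_00 = 0
    b = np.zeros(1 + D)           # b_i = 0
    R = np.zeros((N, 1 + D))
    R[:, 0] = -y                  # r_n0 = -y_n
    R[:, 1:] = y[:, None] * X     # r_nd = y_n (x_n)_d
    q = np.ones(N)                # q_n = 1
    return A, b, R, q

X = np.array([[1.0, 2.0], [-1.0, -2.0]])
y = np.array([1.0, -1.0])
A, b, R, q = svm_qp_matrices(X, y)
print(A.shape, R.shape)  # (3, 3) (2, 3)
```

Each row of R dotted with v gives -y_n θ + Σ_d y_n (x_n)_d w_d, i.e. exactly y_n(<w, x_n> - θ), so R v ≥ q reproduces the margin constraints.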

SVM: soft margin

- Allow possible training errors:

argmin_{w,θ,ξ} (1/2)<w, w> + c Σ_n ξ_n  subject to  y_n(<w, x_n> - θ) ≥ 1 - ξ_n, ξ_n ≥ 0

(the ξ_n measure the errors; c is the tradeoff)

- Tradeoff c:
  - Large c: thinner margin, cares about errors
  - Small c: thicker margin, tolerates errors
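At the optimum the slack of each point is ξ_n = max(0, 1 - y_n(<w, x_n> - θ)), which a short sketch makes concrete (hyperplane and data are hypothetical values):

```python
import numpy as np

def slacks(X, y, w, theta):
    """Soft-margin slack: how far each point falls short of the required margin,
    xi_n = max(0, 1 - y_n(<w, x_n> - theta))."""
    return np.maximum(0.0, 1.0 - y * (X @ w - theta))

X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
xi = slacks(X, y, np.array([1.0, 0.0]), 0.0)
print(xi)  # [0.  0.5 0. ] -- only the point inside the margin pays a penalty
```

The objective then charges c · Σ_n ξ_n on top of (1/2)<w, w>, which is where the tradeoff c enters.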

Adaptation

The same QP form applies with V = [θ, w_1, w_2, ..., w_D, ξ_1, ξ_2, ..., ξ_N]:

- A: (1+D+N)×(1+D+N)
- b: (1+D+N)×1
- R: (2N)×(1+D+N)
- q: (2N)×1

Primal form and Dual form

- Primal form (variables: 1+D+N, constraints: 2N):

argmin_{w,θ,ξ} (1/2)<w, w> + c Σ_n ξ_n  subject to  y_n(<w, x_n> - θ) ≥ 1 - ξ_n, ξ_n ≥ 0

- Dual form (variables: N, constraints: 2N+1):

argmin_α (1/2) Σ_n Σ_m α_n y_n α_m y_m <x_n, x_m> - Σ_n α_n  subject to  Σ_n y_n α_n = 0, 0 ≤ α_n ≤ C

Dual form SVM

- Find the optimal α*
- Use α* to solve for w* and θ
- α_n = 0: correctly classified or on the margin
- 0 < α_n < C: on the margin (free support vector)
- α_n = C: margin violation or on the margin

[Figure: points with α_n = 0 away from the margin, free SVs on the margin boundaries, and α_n = C points inside or beyond them]
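Recovering (w*, θ) from α* can be sketched on a two-point toy problem where the dual optimum is known in closed form (x = (±1, 0), y = ±1 gives α = [0.5, 0.5], both free support vectors):

```python
import numpy as np

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])  # dual optimum for this toy problem

w = (alpha * y) @ X           # w* = sum_n alpha_n y_n x_n
s = 0                         # index of any free SV (0 < alpha_s < C)
theta = X[s] @ w - y[s]       # a free SV lies on the margin: y_s(<w, x_s> - theta) = 1
print(w, theta)               # [1. 0.] 0.0
```

Note the dual constraint Σ_n y_n α_n = 0.5 - 0.5 = 0 holds, and both points sit exactly on their margin boundaries under the recovered (w, θ).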

Nonlinear SVM

- Nonlinear mapping X → Φ(X), e.g.
  {(x)_1, (x)_2} ∈ R^2 → {1, (x)_1, (x)_2, (x)_1^2, (x)_2^2, (x)_1(x)_2} ∈ R^6
- Need the kernel trick: in the dual, only the inner product changes:

argmin_α (1/2) Σ_n Σ_m α_n y_n α_m y_m <Φ(x_n), Φ(x_m)> - Σ_n α_n  subject to  Σ_n y_n α_n = 0, 0 ≤ α_n ≤ C

e.g. <Φ(x_n), Φ(x_m)> = (1 + <x_n, x_m>)^2
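The kernel identity is easy to verify numerically. One caveat: for <Φ(x), Φ(z)> to equal (1 + <x, z>)^2 exactly, the cross terms need √2 scaling (the slide lists the unscaled monomials); with that scaling the sketch below confirms the trick:

```python
import numpy as np

def phi(x):
    """Degree-2 feature map with sqrt(2)-scaled cross terms so that
    <phi(x), phi(z)> = (1 + <x, z>)^2 holds exactly."""
    s = np.sqrt(2.0)
    return np.array([1.0, s*x[0], s*x[1], x[0]**2, x[1]**2, s*x[0]*x[1]])

def kernel(x, z):
    """Degree-2 polynomial kernel: computes the R^6 inner product in R^2."""
    return (1.0 + x @ z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.isclose(phi(x) @ phi(z), kernel(x, z)))  # True: no explicit mapping needed
```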

Experiments

- Pre-processing: automated parsers process the raw TCP/IP dump data into machine-readable form
- Training: 7312 training records (different types of attacks and normal data), each with 41 features
- Testing: 6980 testing records evaluate the classifier

Support Vector Machines:
- Details: RBF kernel, C = 1000, 204 support vectors (29 free)
- Accuracy: 99.5%
- Time spent: 17.77 sec

Neural Networks:
- Details: 3-layer 41-40-40-1 FFNNets, scaled conjugate gradient descent, desired error rate = 0.001
- Accuracy: 99.25%
- Time spent: 18 min
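The RBF kernel used in the SVM experiments, and the kernel-expansion decision function it plugs into, can be sketched as follows. The gamma value is an assumption (the slide reports C = 1000 but not gamma), and the support vectors below are placeholders, not the paper's 204:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d = x - z
    return np.exp(-gamma * (d @ d))

def decide(x, SV, alpha, ysv, theta, gamma=1.0):
    """Decision of a trained kernel SVM:
    f(x) = sign(sum_s alpha_s y_s k(x_s, x) - theta)."""
    return np.sign(sum(a * ys * rbf(s, x, gamma)
                       for s, a, ys in zip(SV, alpha, ysv)) - theta)

print(rbf(np.array([0.0]), np.array([0.0])))  # 1.0: kernel of a point with itself
```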

Conclusion and Comments

- Speed
  - SVM training time is significantly shorter
  - The max-margin formulation avoids the "curse of dimensionality"
- Accuracy
  - Both achieve high accuracy
- SVMs can only make binary classifications, while IDS requires multiple-class identification
- How are the features determined?