Craig's slides [pdf] - University of Illinois at Urbana-Champaign


Applied Anomaly-Based IDS

Craig Buchanan

University of Illinois at Urbana-Champaign

CS 598 MCC

4/30/13

Outline


K-Nearest Neighbor


Neural Networks


Support Vector Machines


Lightweight Network Intrusion Detection (LNID)

K-Nearest Neighbor

“Use of K-Nearest Neighbor classifier for intrusion detection” [Liao, Computers and Security]

K-nearest neighbor on text

1. Categorize training documents into vector space model, A
   Word-by-document matrix A
   Rows = words
   Columns = documents
   Represents weight of each word in set of documents
2. Build vector for test document, X
3. Classify X into A using K-nearest neighbor

Text categorization


Create vector space model A
$a_{ij}$ = weight of word i in document j

Useful variables
$N$ = number of documents in the collection
$M$ = number of distinct words in the collection
$f_{ij}$ = frequency of word i in document j
$n_i$ = total number of times word i occurs in the collection

Text categorization


Frequency weighting
$$a_{ij} = f_{ij}$$

Term frequency-inverse document frequency (tf*idf)
$$a_{ij} = \frac{f_{ij}}{\sqrt{\sum_{l=1}^{M} f_{lj}^{2}}} \times \log\frac{N}{n_i}$$
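A minimal sketch of the tf*idf weighting, assuming the frequencies are held in a plain list-of-lists `freq` with `freq[i][j]` = frequency of word i in document j. The function name `tfidf_matrix` and the handling of empty documents (weight 0) are illustrative assumptions, not part of the cited paper.

```python
import math

def tfidf_matrix(freq):
    """Return A with A[i][j] = f_ij / sqrt(sum_l f_lj^2) * log(N / n_i)."""
    M = len(freq)        # number of distinct words
    N = len(freq[0])     # number of documents
    # n_i: total number of times word i occurs in the collection
    n = [sum(row) for row in freq]
    # Euclidean norm of each document's frequency column
    norms = [math.sqrt(sum(freq[l][j] ** 2 for l in range(M))) for j in range(N)]
    A = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            # Assumption: empty documents and unused words get weight 0
            if norms[j] > 0 and n[i] > 0:
                A[i][j] = freq[i][j] / norms[j] * math.log(N / n[i])
    return A

# Two words (rows) across four program executions (columns)
A = tfidf_matrix([[1, 0, 0, 0],
                  [1, 1, 0, 0]])
```

Because n_i counts total occurrences rather than documents, the log term becomes negative for a word that occurs more than N times in the collection.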


Text categorization


System call = “word”
Program execution = “document”

close, execve, open, mmap, open, mmap, munmap, mmap, mmap, close, …, exit

Document Classification


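Similarity measured by cosine similarity (dot product over shared words, normalized by the Euclidean norms)
$$\mathrm{sim}(X, D_j) = \frac{\sum_{t_i \in (X \cap D_j)} x_i \times d_{ij}}{\lVert X \rVert_2 \times \lVert D_j \rVert_2}$$

$X$ = test document
$D_j$ = jth training document
$t_i$ = word shared by $X$ and $D_j$
$x_i$ = weight of word $t_i$ in $X$
$d_{ij}$ = weight of word $t_i$ in $D_j$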
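The comparison between a test document X and a training document D_j is a dot product over shared words, divided by the product of the two Euclidean norms (cosine similarity). A minimal sketch, assuming each document is a dict mapping a system call name to its weight; the function name and the zero-vector convention are illustrative choices.

```python
import math

def cosine_similarity(x, d):
    """x, d: dicts mapping word (system call) -> weight."""
    shared = set(x) & set(d)                      # words t_i in both X and D_j
    dot = sum(x[t] * d[t] for t in shared)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_x == 0 or norm_d == 0:
        return 0.0                                # assumption: empty document matches nothing
    return dot / (norm_x * norm_d)

sim_partial = cosine_similarity({"open": 1.0}, {"open": 1.0, "mmap": 1.0})
sim_disjoint = cosine_similarity({"open": 1.0}, {"close": 1.0})
```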


Anomaly detection


If X has an unknown system call then abnormal
If X is the same as any Dj then normal
K-nearest neighbor
  Calculate sim_avg for k-nearest neighbors
  If sim_avg > threshold then normal
  Else abnormal
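The three rules above can be sketched as one function. This is an illustration under assumptions: documents are dicts of system call weights, "the same as any Dj" is approximated by a similarity of 1 within a small tolerance, and the names `classify`, `k` and `threshold` are hypothetical.

```python
import math

def _cosine(x, d):
    dot = sum(w * d[t] for t, w in x.items() if t in d)
    nx = math.sqrt(sum(w * w for w in x.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nx * nd) if nx and nd else 0.0

def classify(x, training, k, threshold):
    """Label test document x as 'normal' or 'abnormal'."""
    known = set().union(*training)               # every system call seen in training
    if any(call not in known for call in x):
        return "abnormal"                        # rule 1: unknown system call
    sims = sorted((_cosine(x, d) for d in training), reverse=True)
    if sims and sims[0] >= 1.0 - 1e-9:
        return "normal"                          # rule 2: matches a training document
    top = sims[:k]                               # k nearest neighbors
    if not top:
        return "abnormal"
    sim_avg = sum(top) / len(top)
    return "normal" if sim_avg > threshold else "abnormal"

training = [{"open": 1.0, "close": 1.0}, {"open": 1.0, "mmap": 1.0}]
```

A larger threshold flags more executions as abnormal, so it trades detection rate against false alarms.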

Results


Neural Networks


“Intrusion Detection with Neural Networks” [Ryan, AAAI Technical Report 1997]
Learn user profiles (“prints”) to detect intrusion

NNID System

1. Collect training data
   Audit logs from each user
2. Train the neural network
3. Obtain new command distribution vector
4. Compare to training data
   Anomaly if:
     Associated with a different user
     Not clearly associated with any user



Collect training data

Type of data
  as, awk, bc, bibtex, calendar, cat, chmod, comsat, cp, cpp, cut, cvs, date, df, diff, du, dvips, egrep, elm, emacs, …, w, wc, whereis, xbiff++, xcalc, xdvi, xhost, xterm
Type of platform
  Audit trail logging
  Small number of users
  Not a large target

Train Neural Network


Map frequency of command to nonlinear scale
  0.0 to 1.0 in 0.1 increments
  0.0 = never used
  0.1 = used once or twice
  1.0 = used more than 500 times
Concatenate values to 100-dimensional command distribution vector
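A sketch of building a command distribution vector from a user's command log. The slide fixes only three points of the scale (0.0, 0.1 and 1.0); the intermediate bin edges in `EDGES` below are hypothetical values chosen to illustrate a nonlinear scale, and the function names are illustrative too.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of levels 0.1 .. 1.0. Only 1 (once or twice -> 0.1) and
# 501 (more than 500 -> 1.0) come from the slide; the rest are assumed.
EDGES = [1, 3, 6, 12, 25, 50, 100, 200, 350, 501]

def scale(count):
    """Map a raw command count to 0.0 .. 1.0 in 0.1 increments."""
    return bisect_right(EDGES, count) / 10

def command_vector(log, commands):
    """log: list of executed command names; commands: the tracked command list."""
    counts = Counter(log)
    return [scale(counts[c]) for c in commands]

vec = command_vector(["cat", "cat", "awk"], ["awk", "cat", "cp"])
```

With the full list of 100 tracked commands, `command_vector` yields the 100-dimensional input for the network.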

Neural Network


3-layer backpropagation architecture
Input (100 units), Hidden (30 units), Output (10 units)
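A sketch of the forward pass through a 100-30-10 network, with the anomaly check from the NNID steps applied to the output. Sigmoid units, random initial weights, one output unit per user, and the `min_activation` cutoff are assumptions made for illustration; training by backpropagation is omitted.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    # One weight row per unit; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward_layer(layer, inputs):
    outputs = []
    for row in layer:
        z = row[-1] + sum(w * x for w, x in zip(row, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-z)))   # sigmoid activation (assumed)
    return outputs

def forward(net, vector):
    for layer in net:
        vector = forward_layer(layer, vector)
    return vector

def identify(outputs, min_activation=0.5):
    """Index of the most active output unit, or None if no unit is clearly active."""
    best = max(range(len(outputs)), key=lambda u: outputs[u])
    return best if outputs[best] >= min_activation else None

rng = random.Random(0)
net = [make_layer(100, 30, rng), make_layer(30, 10, rng)]   # untrained weights
outputs = forward(net, [0.0] * 100)
```

A session would be flagged as anomalous when `identify` returns `None` or a user other than the one who is logged in.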

Results


Rejected 63% of random user vectors


Anomaly detection rate 96%



Correctly identified user 93%


False alarm rate 7%

Support Vector Machines


“Intrusion Detection Using Neural Networks and Support Vector Machines” [Mukkamala, IEEE 2002]


SVM IDS

1. Preprocess randomly selected raw TCP/IP traffic
2. Train SVM
   41 input features
   1 = normal
   -1 = attack
3. Classify new traffic as normal or anomaly

SVM IDS Features

Feature name     Description                                       Type
---------------  ------------------------------------------------  ----------
Duration         Length of the connection                          Continuous
Protocol type    TCP, UDP, etc.                                    Discrete
Service          HTTP, TELNET, etc.                                Discrete
Src_bytes        Number of data bytes from source to destination   Continuous
Dst_bytes        Number of data bytes to source from destination   Continuous
Flag             Normal or error status                            Discrete
Land             If connection is from/to the same host/port       Discrete
Wrong_fragment   Number of “wrong” fragments                       Continuous







Results

[Figure: SVM prediction vs. actual, axis range -1.5 to 1.5]

Recent Anomaly-based IDS

“An efficient network intrusion detection” [Chen, Computer Communications 2010]
Lightweight Network Intrusion Detection (LNID) system

LNID Approach


Detect R2L (remote-to-local) and U2R (user-to-root) attacks



Assume attack is in first few packets


Calculate anomaly score of packets

LNID System Architecture

Anomaly Score


Based on Mahoney’s network IDS [21-24]

M.V. Mahoney, P.K. Chan, PHAD: packet header anomaly detection for identifying hostile network traffic, Florida Institute of Technology Technical Report CS-2001-04, 2001.

M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal network traffic for detecting novel attacks, in: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002a, pp. 376-385.

M.V. Mahoney, P.K. Chan, Learning models of network traffic for detecting novel attacks, Florida Institute of Technology Technical Report CS-2002-08, 2002b.

M.V. Mahoney, Network traffic anomaly detection based on packet bytes, in: Proceedings of the 2003 ACM Symposium on Applied Computing, 2003, pp. 346-350.

Anomaly Score (Mahoney)


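$$\mathit{Score}_{i} = \frac{t_i n_i}{r_i}$$
$$\mathit{Score}_{packet} = \sum_{i} \frac{t_i n_i}{r_i}$$
where
$t_i$ = time elapsed since last time attribute was anomalous
$n_i$ = number of training or observed instances
$r_i$ = number of novel values of attribute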
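A sketch of the t·n/r score for a single attribute, with the packet score as the sum over attributes. It assumes, as in PHAD, that an attribute contributes only when its value was never seen in training; the class `AttributeModel` and its method names are hypothetical.

```python
class AttributeModel:
    """Per-attribute state: n observations, r distinct values, time of last anomaly."""

    def __init__(self):
        self.seen = set()          # distinct values observed in training (r = len(seen))
        self.n = 0                 # number of training instances
        self.last_anomaly = 0.0    # time a novel value last appeared

    def train(self, value, now):
        self.n += 1
        if value not in self.seen:
            self.seen.add(value)
            self.last_anomaly = now

    def score(self, value, now):
        if value in self.seen or not self.seen:
            return 0.0             # known value, or untrained model: no contribution
        t = now - self.last_anomaly
        self.last_anomaly = now
        return t * self.n / len(self.seen)

def packet_score(models, values, now):
    return sum(m.score(v, now) for m, v in zip(models, values))

model = AttributeModel()
for t, v in enumerate([1, 1, 2, 1]):
    model.train(v, float(t))
```

After this training, n = 4, r = 2, and the last novel value appeared at time 2, so a new value at time 12 scores (12 - 2) × 4 / 2 = 20.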


Anomaly Score (revised)


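$$\mathit{Score}_{i} = 1 - \frac{r_i}{n_i}$$
$$\mathit{Score}_{packet} = \sum_{i} \left(1 - \frac{r_i}{n_i}\right)$$
where
$n_i$ = number of training or observed instances
$r_i$ = number of novel values of attribute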
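A sketch of the revised score, which drops the time term and keeps each attribute's contribution between 0 and 1. The function names and the `(r, n)` pair representation are illustrative assumptions; the list passed to `packet_score` is taken to hold the attributes that contribute to the packet's score.

```python
def attribute_score(r, n):
    """1 - r/n: close to 1 when few distinct values were seen over many observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - r / n

def packet_score(attributes):
    """attributes: iterable of (r, n) pairs, one per contributing attribute."""
    return sum(attribute_score(r, n) for r, n in attributes)

example = packet_score([(2, 4), (1, 4)])   # 0.5 + 0.75
```

An attribute that rarely shows new values (small r relative to n) scores near 1 when it does, while a highly variable attribute scores near 0.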


Anomaly Scoring Comparison

Attributes


Attribute = packet byte
  256 possible values
48 attributes (packet bytes)
  20 bytes of IP header
  20 bytes of TCP header
  8 bytes of payload
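A sketch of turning a raw packet into the 48 byte-valued attributes. It assumes the buffer starts at the IP header and that both headers are 20 bytes with no options, matching the slide; zero-padding short packets is an added assumption, and the function name is hypothetical.

```python
IP_HEADER = 20
TCP_HEADER = 20
PAYLOAD = 8
N_ATTRIBUTES = IP_HEADER + TCP_HEADER + PAYLOAD   # 48

def packet_attributes(packet: bytes):
    """Return 48 attribute values, each in 0..255."""
    data = packet[:N_ATTRIBUTES]
    # Assumption: packets shorter than 48 bytes are padded with zeros
    data = data + bytes(N_ATTRIBUTES - len(data))
    return list(data)

attrs = packet_attributes(bytes(range(60)))
```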

Results


Detection rate

             Total (%)   U2R (%)   R2L (%)   # FA/Day
LNID         73          70        77        2
NETAD        68          55        78        10
Lee et al.   78, 18, 10 (one cell is missing in the source, so the column assignment is unclear)

Workload
  LNID: 0.3% of traffic
  NETAD: 3.16% of traffic
  Lee et al.: 100% of traffic

Results


Hard-to-detect attacks

Attack name      Description                                        LNID          PHAD        DARPA
loadmodule       U2R, SunOS, set IFS to call trojan suid program    1/3           0/3         1/3
ncftp            R2L, FTP exploit                                   4/5           0/5         0/5
sechole          U2R, NT bug exploit                                3/3           1/3         1/3
perl             U2R, Linux exploit                                 2/3           0/3         0/4
sqlattack        U2R, escape from SQL database shell                3/3           0/3         0/3
xterm            U2R, Linux buffer overflow in suid root program    3/3           0/3         1/3
Detection rate                                                      16/20 (80%)   1/20 (5%)   3/21 (14%)

Questions or Comments