
JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN COMPUTER SCIENCE AND APPLICATIONS | ISSN: 0975-6728 | NOV 12 TO OCT 13 | VOLUME 02, ISSUE 02

CLASSIFICATION OF RELEVANT AND REDUNDANT INTRUSION DETECTION DATA USING MACHINE LEARNING APPROACHES


1 Ms. J. R. PATEL
1 Asst. Professor, Department of Computer Science, Veer Narmad South Gujarat University, Surat
jayshri.r@gmail.com

ABSTRACT:


The development of data-mining applications such as classification and clustering has shown the need for machine learning algorithms to be applied to intrusion detection data. In this paper we present the different classification techniques for classifying intrusion detection data. The aim of this paper is to investigate the performance of different classification methods for relevant and redundant intrusion detection data. The Correlation based feature selection method is used to select relevant and redundant features from intrusion detection data. The classification algorithms tested are Decision Tree, Naïve Bayes, OneR, Partial Decision Tree and the Nearest Neighbors algorithm.


Keywords: Intrusion detection, Correlation Based Feature Selection, Naïve Bayes, Decision Tree, Nearest Neighbor, OneR, Partial Decision Tree


I: INTRODUCTION

The Internet, along with its core benefits, also provides numerous opportunities to violate the stability and security of the systems connected to it. Although static defense mechanisms such as firewalls, software updates etc. can provide a reasonable level of security, more dynamic mechanisms such as IDS are also suggested for better security. Intrusion Detection is defined as a set of activities that attempt to distinguish intrusive and normal activities.

Intrusion detection is classified as host based or network based. A host based IDS will monitor resources such as system logs, file systems and disk resources, whereas a network based IDS monitors the data passing through the network.

Network intrusion detection starts from raw network traffic, which must be summarized into higher-level objects such as connection records or audit records. An audit record captures various features of a network connection, such as duration, protocol type, and source and destination bytes of a TCP connection. Not all the features of an intrusion detection dataset are useful for classification. Effective feature selection for intrusion detection identifies some of the important features for detecting anomalous network connections. Feature selection reduces the memory requirement and increases the speed of execution, thereby increasing overall performance. We have used the Correlation based feature selection method, which selects 10+1 (class label) features among the 42 features for classification. We then apply the Decision Tree, Naïve Bayes, OneR, Partial Decision Tree (PART) and K-Nearest Neighbors classification algorithms.





II: RELATED WORK


In [1], Stein et al. use a Decision Tree classifier for intrusion detection with GA-based feature selection to improve the classification abilities of the decision tree classifier. They use a genetic algorithm to select a subset of input features for decision tree classifiers in order to increase the detection rate and decrease the false alarm rate in network intrusion detection. In [2], Mukkamala et al. use decision trees and Support Vector Machines (SVM) to model IDS. They compare the performance of SVM and decision trees and find that the decision tree gives better overall performance than the SVM. In [3], Lee et al. provide a data mining framework for constructing intrusion detection models. They compute activity patterns from system audit data and extract predictive features from the patterns. They then apply machine learning algorithms to the audit records, processed according to the feature definitions, to generate intrusion detection rules. They extend the basic association rules and frequent episodes algorithms for use in analyzing audit data.


III: CORRELATION BASED FEATURE SELECTION FOR INTRUSION DETECTION DATA


As discussed by Hall in [4], Correlation Based Feature Selection (CFS) evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. It gives high scores to subsets that include features that are highly correlated with the class attribute but have low correlation with each other. A correlation coefficient is used to estimate the correlation between a subset of attributes and the class, as
well as the inter-correlations between the features. The relevance of a group of features grows with the correlation between features and classes, and decreases with growing inter-correlation. CFS is used to determine the best feature subset. The equation for CFS is given in Equation 1:

r_{zc} = \frac{k \, r_{zi}}{\sqrt{k + k(k-1) \, r_{ii}}}        (1)

where r_zc is the correlation between the summed feature subset and the class variable, k is the number of subset features, r_zi is the average of the correlations between the subset features and the class variable, and r_ii is the average inter-correlation between the subset features. The CFS method takes the subset evaluation approach, which handles feature redundancy together with feature relevance.
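
As a concrete reading of Equation 1, the merit of a candidate subset can be computed directly from the average feature-class and feature-feature correlations. The short sketch below is our own illustration (the correlation values are invented), not part of Hall's implementation.

```python
from math import sqrt

def cfs_merit(k: int, avg_feature_class_corr: float, avg_feature_feature_corr: float) -> float:
    """Equation 1: merit of a k-feature subset.

    k                        -- number of features in the subset
    avg_feature_class_corr   -- average correlation between each feature and the class (r_zi)
    avg_feature_feature_corr -- average inter-correlation between subset features (r_ii)
    """
    return (k * avg_feature_class_corr) / sqrt(k + k * (k - 1) * avg_feature_feature_corr)

# Invented values: a subset whose features are strongly correlated with the class
# but only weakly with each other scores higher than a redundant subset.
print(cfs_merit(10, 0.6, 0.2))   # relevant, not very redundant -> higher merit
print(cfs_merit(10, 0.6, 0.8))   # relevant but highly redundant -> lower merit
```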


IV: MACHINE LEARNING APPROACHES FOR THE CLASSIFICATION OF INTRUSION DETECTION

Intrusion detection can be considered as a classification problem where each connection record is identified as normal or intrusive based on some existing data. Classification for intrusion detection is an important challenge because it is very difficult to detect new attacks, as attackers are continuously changing their attack patterns. Several machine learning approaches are used for the classification of intrusion detection data. These classification algorithms are discussed in the following subsections.

1) DECISION TREE ALGORITHM

A decision tree is a tree structure comprising a set of conditions organized hierarchically. It consists of internal and external nodes connected by branches. An internal node is a decision-making unit that evaluates a decision function to determine which child node to visit next. An external node, or leaf node, has no child nodes and is associated with a class label. A decision tree can easily be converted to a set of classification rules. Many decision tree construction algorithms involve a two-step process: first a decision tree is constructed, and then the tree is pruned. The pruned decision tree that is used for classification purposes is called the classification tree [5].
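
As a minimal illustration of this build-then-prune idea (not the J48 implementation used in the experiments), the sketch below grows a small pruned tree with scikit-learn on invented connection records and prints the resulting rules; the data values and the ccp_alpha pruning parameter are assumptions for demonstration only.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented connection records: [duration, src_bytes]; labels: 0 = normal, 1 = attack.
X = [[0, 200], [1, 180], [2, 220], [50, 9000], [60, 8800], [55, 9100]]
y = [0, 0, 0, 1, 1, 1]

# ccp_alpha > 0 enables cost-complexity pruning, i.e. the pruning step described
# above (J48 itself uses a different pruning scheme).
tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01).fit(X, y)

# The fitted tree converts directly into a set of readable classification rules.
print(export_text(tree, feature_names=["duration", "src_bytes"]))
print(tree.predict([[58, 8900]]))   # -> [1], classified as an attack
```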

2) NAÏVE BAYES ALGORITHM

Bayesian classifiers are statistical classifiers based on Bayes' theorem. Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes [5]. Using Naïve Bayes for intrusion detection, we can calculate the probability that an attack is occurring based on some data by first calculating the probability that some previous data was part of that type of attack and then multiplying by the probability of that type of attack occurring.
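
To make that calculation concrete, the toy sketch below (our own, with invented records) scores a connection against each class by multiplying the class prior by per-attribute likelihoods with Laplace smoothing, then normalizes to obtain posteriors.

```python
# Hypothetical training records: (protocol, service) -> label (invented for illustration).
train = [
    (("tcp", "http"),   "normal"),
    (("tcp", "http"),   "normal"),
    (("udp", "dns"),    "normal"),
    (("tcp", "telnet"), "attack"),
    (("tcp", "telnet"), "attack"),
    (("udp", "dns"),    "attack"),
]

def naive_bayes_score(x, label):
    """Unnormalized P(label) * prod_i P(x_i | label), with Laplace smoothing."""
    rows = [feats for feats, lab in train if lab == label]
    score = len(rows) / len(train)                          # prior P(label)
    for i, value in enumerate(x):
        n_values = len({feats[i] for feats, _ in train})    # distinct values of attribute i
        count = sum(1 for feats in rows if feats[i] == value)
        score *= (count + 1) / (len(rows) + n_values)       # likelihood P(x_i | label)
    return score

x = ("tcp", "telnet")
scores = {lab: naive_bayes_score(x, lab) for lab in ("normal", "attack")}
total = sum(scores.values())
print({lab: round(s / total, 3) for lab, s in scores.items()})  # "attack" gets the higher posterior
```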

3) OneR ALGORITHM


OneR generates a one-level decision tree expressed in the form of a set of rules that all test one particular attribute. OneR is a simple, cheap method that often comes up with quite good rules for characterizing the structure in data; such simple rules frequently achieve surprisingly high accuracy. The pseudocode for OneR is as follows [6]:

For each attribute,
    For each value of that attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value.
    Calculate the error rate of the rules.
Choose the rules with the smallest error rate.

Fig. 1 Pseudocode for 1R.
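
A direct Python rendering of the pseudocode in Fig. 1, using a small invented data set, might look as follows; it builds one rule set per attribute and keeps the attribute whose rules make the fewest errors.

```python
from collections import Counter

def one_r(records, labels):
    """records: list of equal-length attribute tuples; labels: class per record.
    Returns (best_attribute_index, {attribute_value: predicted_class}, error_rate)."""
    best = None
    for attr in range(len(records[0])):
        # For each value of this attribute, find the most frequent class.
        by_value = {}
        for row, label in zip(records, labels):
            by_value.setdefault(row[attr], Counter())[label] += 1
        rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        # Error rate of this one-attribute rule set.
        errors = sum(1 for row, label in zip(records, labels) if rules[row[attr]] != label)
        error_rate = errors / len(records)
        if best is None or error_rate < best[2]:
            best = (attr, rules, error_rate)
    return best

# Invented toy records: (protocol, flag) -> label
records = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("tcp", "S0"), ("udp", "SF")]
labels  = ["normal", "attack", "normal", "attack", "normal"]
print(one_r(records, labels))   # the flag attribute alone separates the classes perfectly
```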


4) PARTIAL DECISION TREE ALGORITHM

PART combines the divide-and-conquer strategy for decision tree learning with the separate-and-conquer one for rule learning. It adopts the separate-and-conquer strategy in that it builds a rule, removes the instances it covers, and continues creating rules recursively for the remaining instances until none are left. To generate such a tree, the construction and pruning operations are integrated in order to find a "stable" subtree that can be simplified no further. Once this subtree has been found, tree building ceases and a single rule is read off. The tree-building algorithm [6] is summarized in Fig. 2:

Expand-subset(S):
    Choose a test T and use it to split the set of examples into subsets
    Sort subsets into increasing order of average entropy
    while (there is a subset X that has not yet been expanded
           AND all subsets expanded so far are leaves)
        expand-subset(X)
    if (all the subsets expanded are leaves
        AND estimated error for subtree >= estimated error for node)
        undo expansion into subsets and make node a leaf

Fig. 2 Algorithm for expanding examples into a partial tree.
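
The partial-tree construction itself is involved, but the separate-and-conquer loop that PART adopts is easy to sketch: learn one rule, remove the instances it covers, and repeat. The sketch below is a deliberate simplification, not Witten and Frank's algorithm: each "rule" is just the best single attribute-value test, standing in for the rule read off a pruned partial tree.

```python
from collections import Counter

def best_single_rule(records, labels):
    """Stand-in rule learner: the single (attribute, value) -> class test that covers
    the most correctly classified records. In PART proper, the rule is instead read
    off a pruned partial decision tree."""
    best = None
    for attr in range(len(records[0])):
        for value in {row[attr] for row in records}:
            covered = [lab for row, lab in zip(records, labels) if row[attr] == value]
            cls, correct = Counter(covered).most_common(1)[0]
            if best is None or correct > best[3]:
                best = (attr, value, cls, correct)
    return best[:3]

def separate_and_conquer(records, labels):
    """Build a rule, remove the instances it covers, repeat until none are left."""
    rules = []
    records, labels = list(records), list(labels)
    while records:
        attr, value, cls = best_single_rule(records, labels)
        rules.append((attr, value, cls))
        keep = [i for i, row in enumerate(records) if row[attr] != value]
        records = [records[i] for i in keep]
        labels = [labels[i] for i in keep]
    return rules

# Invented toy records: (protocol, flag) -> label
records = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("tcp", "S0")]
labels  = ["normal", "attack", "normal", "attack"]
print(separate_and_conquer(records, labels))
```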


5) K-NEAREST NEIGHBOR ALGORITHM

Nearest neighbor classifiers are based on learning by analogy, that is, by comparing a given test tuple with training tuples (which are described by n attributes) that are similar to it. Each tuple represents a point in an n-dimensional space. When given an unknown tuple, a K-Nearest Neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k "nearest neighbors" of the unknown tuple. The unknown tuple is assigned the most common class among its k nearest neighbors. "Closeness" is defined in terms of a distance metric, such as the Euclidean distance [5].
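
A minimal k-nearest-neighbor classifier matching this description (Euclidean distance in n-dimensional space plus a majority vote over the k closest training tuples) can be sketched as follows; the feature values are invented and assumed to be already normalized.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(train_X, train_y, query, k=3):
    """Assign the query tuple the most common class among its k nearest neighbors."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Invented numeric features, e.g. (duration, src_bytes) after normalization.
train_X = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (1.0, 0.7)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]

print(knn_predict(train_X, train_y, (0.85, 0.75), k=3))   # -> "attack"
```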

V: EXPERIMENTS AND RESULTS

The data for the experiments come from the KDD Cup 1999 data set, prepared from the DARPA intrusion detection evaluation program by MIT Lincoln Laboratory. As given in [7], the data set contains 4 main attack categories, namely
Denial of Service (DoS), Remote to User (R2L), User to Root (U2R) and Probing. It includes a total of 24 different attack types among the 4 main categories. The original data set has 41 attributes for each connection record plus one class label. Examples of features are protocol type, duration of each connection, etc.

To perform the experiments we have used the open source package WEKA, taken from [8]. We first preprocessed the original dataset by keeping only the 4 main attack categories instead of the 24 different attack types. Then the CFS feature selection method with BestFirst search is applied to the preprocessed dataset, producing a preprocessed dataset with the 10 selected features. Using this preprocessed data set, the Decision Tree (J48), Naïve Bayes, OneR, PART and K-Nearest Neighbor classification algorithms each construct a model using 10-fold cross validation (a rough library-level sketch of this pipeline is given after Fig. 5). The results of the classification are given in figures 3, 4 and 5. From the figures below it is clear that the PART and J48 classifiers give the highest accuracy, of 99.96% and 99.95% respectively. The lowest accuracy is provided by Naïve Bayes, at 93.93%. OneR and KNN provide 99.05% and 99.87% accuracy respectively. The Kappa statistic is also very close to one for most classifiers. For intrusion detection classification, providing the relevant and redundant features to the various classifiers thus gives considerably efficient results.


CLASSIFIER      CORRECTLY CLASSIFIED   INCORRECTLY CLASSIFIED   KAPPA
                INSTANCES (%)          INSTANCES (%)            STATISTIC
J48             99.95                  0.0420                   0.969
NAÏVE BAYES     93.93                  6.0684                   0.8805
OneR            99.05                  0.9472                   0.9805
PART            99.96                  0.0378                   0.9992
KNN             99.87                  0.1277                   0.9974

Fig. 3 Results of various classifiers

Fig. 4 Comparison of various classifiers

Fig. 5 Comparison of various classifiers
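
The experiments themselves were run in the WEKA workbench [8]. As a rough analogue for readers working in Python rather than WEKA, the sketch below reproduces the shape of the pipeline with scikit-learn: CFS with BestFirst is not available there, so a univariate filter keeping 10 features stands in for it, and OneR and PART have no direct scikit-learn counterparts, so only the decision tree, Naïve Bayes and k-NN classifiers are shown. The file name, column layout and parameter choices are assumptions, not the paper's actual setup.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Assumed local CSV of the 10% KDD Cup 99 data [7], already relabelled into the
# 4 main attack categories plus "normal"; the last column is the class label.
data = pd.read_csv("kddcup99_10pct_4cat.csv")
X, y = data.iloc[:, :-1].copy(), data.iloc[:, -1]

# Encode the symbolic attributes (protocol_type, service, flag) as integer codes
# so every classifier can consume them.
for col in X.select_dtypes(include="object").columns:
    X[col] = X[col].astype("category").cat.codes

classifiers = {
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes":   GaussianNB(),
    "3-NN":          KNeighborsClassifier(n_neighbors=3),
}

for name, clf in classifiers.items():
    pipeline = make_pipeline(
        SelectKBest(mutual_info_classif, k=10),  # stand-in for CFS + BestFirst
        clf,
    )
    scores = cross_val_score(pipeline, X, y, cv=10)  # 10-fold cross validation
    print(f"{name}: mean 10-fold accuracy = {scores.mean():.4f}")
```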


VI: CONCLUSION

The above experiments and result analysis evaluate and investigate five selected classification algorithms on relevant and redundant intrusion detection data. The best algorithm on the pre-processed intrusion detection data is PART, with an accuracy of 99.96%. These results suggest that, among the machine learning algorithms tested, PART and the Decision Tree classifier have the potential to significantly improve classification results for intrusion detection.


REFERENCES

[1] Gary Stein, Bing Chen, Annie S. Wu and Kien A. Hua, "Decision tree classifier for network intrusion detection with GA-based feature selection", ACM-SE 43: Proceedings of the 43rd Annual Southeast Regional Conference, Vol. 2, 136-141, March 2005.

[2] Mukkamala S., Sung A. H. and Abraham A., "Intrusion Detection Using Ensemble of Soft Computing Paradigms", Third International Conference on Intelligent Systems Design and Applications, Springer Verlag Germany, 239-248, 2003.

[3] Wenke Lee and Salvatore J. Stolfo, "A Framework for Constructing Features and Models for Intrusion Detection Systems", ACM Transactions on Information and System Security (TISSEC), Vol. 3, 227-261, November 2000.

[4] M. A. Hall, "Correlation-based feature selection for discrete and numeric class machine learning", Proceedings of the Seventeenth International Conference on Machine Learning, 359-366, 2000.

[5] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", 2nd Edition, Morgan Kaufmann, 2006.

[6] Ian H. Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques", 2nd Edition, Morgan Kaufmann, 2005.

[7] KDD Cup 99, http://kdd.ics.uci.edu/database/kddcup99/kddcup.data_10_percent.gz

[8] WEKA, http://www.cs.waikato.ac.nz/~ml/weka