Swarm Intelligence Based Data Mining
Algorithms for Classification
Ahmed Khademzadeh
Abstract Data mining is the process of finding hidden knowledge in large amounts
of data. Marketing, fraud detection, telecommunication, and data cleaning are a few
of its important applications. There are several methods for data mining:
classification, regression, clustering, summarization, dependency modeling, and
change and deviation detection. Classification is the task of assigning data items to
a set of predefined classes. A classifier is created by processing a training set of
labeled data items and constructing a set of rules. After rule construction, the
classifier is able to classify (label) new data items based on the rules it has learned
from the training data. Swarm intelligence methods imitate a swarm of insects and
try to solve problems in a way similar to the way a swarm of insects solves its
real problems (foraging, nest selection, etc.). Swarm-based methods have also been
used in data mining. In this article we investigate some classification rule
construction methods that use Ant Colony Optimization as their base method. We
will see that swarm-based methods can perform as well as non-swarm methods.
Key words: Data Mining, Classification, Rule Induction, Particle Swarm
Optimization, Ant Colony Optimization, Swarm Intelligence.
1 Introduction
Data mining, or knowledge mining from data, is the process of finding hidden
knowledge and patterns in large amounts of data. Among its diverse applications
we can name marketing [12], fraud detection [3], telecommunication, and data
cleaning. There are several methods for data mining, each of which we define
briefly here [4] [5] [13]:
Ahmed Khademzadeh
Florida Institute of Technology, 150 West Univ. Blvd, Melbourne, Florida, email:
akhademzadeh2011@my.fit.edu
Classification, in which data items are assigned to one of several predefined
classes.
Regression, by which a data item is mapped to a real numerical value. These
values can later be processed for different purposes.
Clustering, which is used when we do not have predefined categories and want to
put data items that are more similar to each other in one cluster. The ultimate goal
is to split the data items into several clusters, each of which contains the most
similar data items.
Summarization, in which an abstract and abridged description of the data items is
generated.
Dependency Modeling, in which structural or quantitative dependency models
among data items are sought.
Change and Deviation Detection, which is used to discover substantial changes in
data items compared with previously seen data.
The algorithms investigated in this article use swarm intelligence methods for
classification. In the following subsections we briefly introduce the two main
components of this article: classification and swarm intelligence.
1.1 Classification
As mentioned before, a classifier assigns data items to predefined classes.
Rule-based classifiers use rules to classify the data items. Each rule has two
parts: an antecedent and a consequent. A sample rule is shown below:
If (x > 1.2) and (y = blue) then class = a    (1)
In this rule the if part is called the antecedent and the then part is called the
consequent. The antecedent usually consists of one or several parts (called terms)
that are joined to each other by the conjunction operator. During classification, a
data item that satisfies the antecedent of a rule is classified according to the
rule's consequent.
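The antecedent and consequent structure described above can be sketched in a few lines of Python. This is only an illustration, not code from any of the cited works: the names `Term`, `Rule`, `classify`, and `rule_1`, the dictionary representation of a data item, and the `default` label for uncovered items are all assumptions made here.

```python
import operator
from dataclasses import dataclass

# Map operator symbols to Python comparison functions.
OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
             "<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Term:
    """One <attribute, operator, value> condition of an antecedent."""
    attribute: str
    op: str
    value: object

    def matches(self, item: dict) -> bool:
        return OPERATORS[self.op](item[self.attribute], self.value)


@dataclass(frozen=True)
class Rule:
    antecedent: tuple  # tuple of Term objects, joined by conjunction
    consequent: str    # class label predicted by the rule

    def covers(self, item: dict) -> bool:
        # A conjunction holds only if every term holds.
        return all(term.matches(item) for term in self.antecedent)


def classify(rules, item, default=None):
    """Return the consequent of the first rule whose antecedent the item satisfies."""
    for rule in rules:
        if rule.covers(item):
            return rule.consequent
    return default  # hypothetical fallback for items no rule covers


# Rule (1) from the text: If (x > 1.2) and (y = blue) then class = a
rule_1 = Rule((Term("x", ">", 1.2), Term("y", "=", "blue")), "a")
```

Because the rules are tried in order, the same `classify` function also works for a sequential (ordered) rule set.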
The most important phase of classifier construction is rule construction. In the
rule construction phase, labeled data items of the different classes in the training
data are examined to find patterns. The attribute-value pairs that are repeated in
each class are considered patterns, and rules are constructed based on these
patterns. Handling categorical (nominal) attributes (those that take an enumerable
number of values) is easy, but handling continuous attributes is sometimes
challenging, and naive methods do not handle them.
During or after rule construction, rules may be pruned in order to increase their
quality. During pruning some terms may be dropped from the antecedent. After a
rule set is constructed, the rules may be tested for quality. Rules with more
coverage (satisfied by more data items in the data set) are considered to be of
higher quality. In this way some rules may be dropped from the rule set, and the
remaining rules are ordered by quality, so that a sequential rule set is constructed.
Algorithm 1 shows an algorithm for rule construction that prunes the rules during
rule construction. This algorithm splits the data into two parts. One part is used for
rule construction, and the other part is used for pruning [4] [5] [13].
Algorithm 1. Algorithm for forming rules by incremental reduced-error pruning [13]
Split E into Grow and Prune in the ratio 2:1
For each class C for which Grow and Prune both contain an instance
    Use the basic covering algorithm to create the best perfect rule for class C
    Calculate the worth w(R) for the rule on Prune, and for the rule with the
        final condition omitted w(R-)
    While w(R-) > w(R), remove the final condition from the rule and repeat the
        previous step
From the rules generated, select the one with the largest w(R)
Print the rule
Remove the instances covered by the rule from E
Continue
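The inner pruning loop of Algorithm 1 (dropping the final condition while that improves the worth on Prune) can be sketched as follows. The text does not define the worth measure w, so the sketch assumes a simple accuracy-style measure; the function names, the representation of conditions as Python callables, and the guard that keeps at least one condition are likewise assumptions made for illustration.

```python
def worth(conditions, target, prune_set):
    """Assumed worth measure: fraction of Prune instances handled correctly,
    i.e. covered instances of the target class plus uncovered instances of
    other classes."""
    correct = 0
    for features, label in prune_set:
        covered = all(cond(features) for cond in conditions)
        if covered == (label == target):
            correct += 1
    return correct / len(prune_set)


def prune_final_conditions(conditions, target, prune_set):
    """Drop the final condition while the shortened rule is worth more on Prune."""
    rule = list(conditions)
    # Assumption: keep at least one condition so the rule never becomes empty.
    while len(rule) > 1 and worth(rule[:-1], target, prune_set) > worth(rule, target, prune_set):
        rule.pop()
    return rule


# Hypothetical pruning set of (features, label) pairs.
prune_set = [
    ({"x": 2, "y": "blue"}, "a"),
    ({"x": 3, "y": "red"}, "a"),
    ({"x": 0, "y": "red"}, "b"),
    ({"x": 1, "y": "blue"}, "b"),
]
grown_rule = [lambda f: f["x"] > 1.2, lambda f: f["y"] == "blue"]
pruned_rule = prune_final_conditions(grown_rule, "a", prune_set)
```

On this toy pruning set the second condition only excludes an instance of class a, so removing it raises the assumed worth and the pruned rule keeps the first condition alone.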
1.2 Swarm Intelligence
Bees, ants, and wasps are insects that sometimes accomplish big tasks although they
are not highly intelligent. For example, ants construct minimum spanning trees and
Steiner trees when they perform inter-nest activities [2] [6]. Ants also find the
shortest path between their nest and a foraging site [2]. Figure 1 illustrates this
behavior. In Figure 1 (a) ants are traveling between the nest and the food source in
a direct line. For the purpose of the experiment their path is obstructed by an
object. The ants then try to bypass the object by going right and left to find a new
path. As can be seen in Figure 1 (b), at first half of the ants go right and half go
left. Since the right way is shorter than the left one, the rate of going and coming
back on the right way is higher, and so the amount of pheromone (a chemical
substance that ants deposit on the ground when they travel a path) on the right way
grows larger over time. In this way the ants communicate indirectly, telling each
other through the amount of pheromone that the right way is better. As shown in
Figure 1 (c), the right path is eventually selected and the left one is ignored by
the ants.
As another example, bees are able to select the best nest among several candidates
when they decide to migrate to a new nest [2] [11]. There are many more examples
of this kind, which show that swarms of insects are able to perform self-organized
tasks although each insect has little memory and intelligence. The notions of
autonomy, low memory, low intelligence, distributedness, and yet the emergence
of complicated outcomes have encouraged scientists to imitate swarm behaviors
in the design and implementation of real complicated systems [2] [6] [11].
Much research has been done on swarm-based data mining. For example, Shi
Zhongzhi and Wu Bin [15] have proposed a clustering algorithm based on swarm
intelligence. Abraham et al. [1] have explored the role of swarm intelligence in
clustering and proposed a technique for clustering data items into an optimized
number of groups. Xiao et al. [14] have proposed a swarm-based anomaly intrusion
detection algorithm.
Fig. 1 Illustrating the behavior of real ant movements [1]
In this article we explore some of the research that has been done on swarm-based
classification. Since the efficiency of a set of classification rules is important and
many different rule sets are possible for a single data set, this problem can be
considered an optimization problem. Ant Colony Optimization (ACO) and Particle
Swarm Optimization (PSO) are two swarm-inspired optimization techniques that
have been applied to the construction of classification rules.
In the following sections we investigate different swarm-inspired algorithms that
have been proposed for classification rule construction. In Section 2 we discuss Ant
Colony Optimization and illustrate it using the Traveling Salesman Problem (TSP)
as an example. One of the ACO-based classification rule construction algorithms
(AntMiner) is discussed in Section 3. In Sections 4 and 5, AntMiner2 and
AntMiner3, which are improvements over AntMiner, are discussed. Section 6
concludes the paper.
2 Ant Colony Optimization (ACO) for Classification
ACO is a simple yet powerful algorithm that can be used for optimization problems.
We explain ACO below using a simple graph-related example.
Suppose we want to solve the Traveling Salesman Problem (TSP) [7] with the
help of ants. We put some ants in the graph and ask them to find the best TSP tour.
Each ant tries to find a Hamiltonian cycle of the graph. The ants start from a random
node, and in each step they select one of the neighbors of the current node that they
have not yet visited. At first, when no pheromone has yet been deposited on the
edges, they tend to select the neighbor that is closest to them. After finishing each
round, each ant deposits some pheromone on the edges of its TSP tour, and the best
tour among all is selected and the edges of that tour receive more pheromone. In
this way shorter edges, and edges that have been part of at least one ant's TSP tour,
receive some pheromone. The ants then find new TSP tours. This time each ant
selects a neighbor probabilistically, based on both the distance and the amount of
pheromone deposited on the edge. An edge with a higher amount of pheromone
has a higher chance of being selected by the ants [2] [10]. In this way, although none
of the ants has a global view of the graph and each selects the next edge locally and
without any sophisticated algorithm, the emergent outcome is usually a result that
is very close to the best solution.
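The round structure described in this section can be sketched in Python as follows. This is a minimal illustration rather than a reference ACO implementation: the exponents `alpha` and `beta`, the evaporation step, the deposit of `1 / length`, and all function names are assumptions introduced here, and the graph is assumed to be complete with positive distances.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting node."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def build_tour(dist, pheromone, rng, alpha=1.0, beta=2.0):
    """One ant builds a tour, choosing each next node probabilistically
    from pheromone (weight alpha) and closeness (weight beta)."""
    n = len(dist)
    current = rng.randrange(n)
    tour = [current]
    unvisited = set(range(n)) - {current}
    while unvisited:
        candidates = sorted(unvisited)
        weights = [(pheromone[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
                   for j in candidates]
        current = rng.choices(candidates, weights=weights)[0]
        tour.append(current)
        unvisited.remove(current)
    return tour


def aco_tsp(dist, n_ants=10, n_rounds=50, evaporation=0.5, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]  # equal pheromone at the start
    best_tour, best_len = None, math.inf
    for _ in range(n_rounds):
        tours = [build_tour(dist, pheromone, rng) for _ in range(n_ants)]
        # Assumed evaporation step: old pheromone fades on every edge.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        # Every ant deposits pheromone on its tour; the best tour found so far
        # is tracked and then reinforced once more.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        for tour in tours + [best_tour]:
            deposit = 1.0 / tour_length(tour, dist)
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += deposit
                pheromone[b][a] += deposit
    return best_tour, best_len


# Usage on four cities placed at the corners of a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]
best_tour, best_len = aco_tsp(dist)
```

In the first round all pheromone values are equal, so the choice is driven by distance alone, which matches the nearest-neighbor tendency described above.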
3 AntMiner
In this section we study the AntMiner algorithm [10], an ACO-based classification
algorithm proposed by Rafael S. Parpinelli et al.
As we saw in the previous example, in Ant Colony Optimization each ant
constructs a solution to the target problem. In the context of classification rules the
problem is finding a new good rule. Here each rule consists of one or several terms,
each of the form <attribute, operator, value>, for example <y, =, blue>.
If the attributes are categorical (nominal), the operator is always =, but if we
have continuous attributes, we need the ≤, ≥, <, and > operators too. Some
methods discretize the continuous values in order to treat them as categorical
attributes. Construction of a rule set can be finished when we reach a point at which
all, or almost all, data items are covered by the rule set. Algorithm 2 shows a
high-level description of the AntMiner algorithm. In this algorithm rule generation
continues while the number of data items remaining in the training set is higher than
a threshold. Each ant starts with an empty rule and adds terms to it. The terms are
selected by logic similar to what we observed in the TSP example above. The amount
of pheromone and a heuristic function (similar to the nearest-neighbor preference in
the TSP example, and explained later for this context) are the bases for selecting a
term. Term selection is done according to the following probability:
\[
P_{ij} = \frac{\eta_{ij} \cdot \tau_{ij}(t)}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \eta_{ij} \cdot \tau_{ij}(t) \right)} \tag{2}
\]
In the above equation
$a$ is the number of attributes,
$x_i$ is one if attribute $A_i$ has not yet been chosen by the current ant, and zero if
it has already been selected,
$b_i$ is the total number of different values of attribute $A_i$ in all data items,
$\tau_{ij}(t)$ is the amount of pheromone of $term_{ij}$; the higher this value, the
higher the probability of selecting this term,
$\eta_{ij}$ is defined by a heuristic function that is problem dependent.
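Equation 2 can be computed directly once the heuristic and pheromone values are available. The sketch below is illustrative only: the nested-dictionary layout (attribute, then value) and the names `term_probabilities` and `used_attributes` are assumptions made here, with `used_attributes` playing the role of the indicator $x_i$.

```python
def term_probabilities(eta, tau, used_attributes=()):
    """Return {(attribute, value): P_ij} following equation 2.

    eta and tau map attribute -> value -> float. Attributes in
    used_attributes correspond to x_i = 0 and are left out.
    """
    available = [(a, v) for a in eta if a not in used_attributes for v in eta[a]]
    total = sum(eta[a][v] * tau[a][v] for a, v in available)
    return {(a, v): eta[a][v] * tau[a][v] / total for a, v in available}


# Hypothetical heuristic and pheromone values for two nominal attributes.
eta = {"color": {"blue": 0.6, "red": 0.4}, "size": {"big": 0.5, "small": 0.5}}
tau = {"color": {"blue": 0.25, "red": 0.25}, "size": {"big": 0.25, "small": 0.25}}

p_all = term_probabilities(eta, tau)
p_after_size = term_probabilities(eta, tau, used_attributes={"size"})
```

Once an attribute has been used, its terms drop out of both the candidates and the normalizing sum, so the remaining probabilities still sum to one.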
Algorithm 2. A High-Level Description of AntMiner [10]
DiscoveredRuleList = [ ];
    /* rule list is initialized with an empty list */
WHILE (TrainingSet > Max_uncovered_cases)
    t = 1; /* ant index */
    j = 1; /* convergence test index */
    Initialize all trails with the same amount of pheromone;
    REPEAT
        Ant_t starts with an empty rule and incrementally constructs a
            classification rule R_t by adding one term at a time to the
            current rule;
        Prune rule R_t;
        Update the pheromone of all trails by increasing pheromone in the
            trail followed by Ant_t (proportional to the quality of R_t) and
            decreasing pheromone in the other trails (simulating pheromone
            evaporation);
        IF (R_t is equal to R_{t-1}) /* update convergence test */
            THEN j = j + 1;
            ELSE j = 1;
        END IF
        t = t + 1;
    UNTIL (t >= No_of_ants) OR (j >= No_rules_converg)
    Choose the best rule R_best among all rules R_t constructed by all the ants;
    Add rule R_best to DiscoveredRuleList;
    TrainingSet = TrainingSet - {set of cases correctly covered by R_best};
END WHILE
As we saw in the TSP example, $\eta_{ij}$ was defined based on the distance between
a node and its neighbors, and thus the nearer neighbor had a higher chance of being
selected. For the classification problem this heuristic function (called the
information-theoretic heuristic function) is defined as follows:
\[
\eta_{ij} = \frac{\log_2 k - H(W \mid A_i = V_{ij})}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \log_2 k - H(W \mid A_i = V_{ij}) \right)} \tag{3}
\]
The value of $H$ in the above equation is defined by the following equation, which
is intuitively related to entropy:
\[
H(W \mid A_i = V_{ij}) = - \sum_{w=1}^{k} \left( P(w \mid A_i = V_{ij}) \cdot \log_2 P(w \mid A_i = V_{ij}) \right) \tag{4}
\]
In the above equations $k$ is the number of classes, $W$ is the class attribute, and
$P(w \mid A_i = V_{ij})$ is the conditional probability of observing class $w$ given
that $A_i = V_{ij}$.
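Equations 3 and 4 can be evaluated from a labeled training set as in the following sketch. It is only an illustration under stated assumptions: cases are represented as `(features, label)` pairs, conditional probabilities are estimated by relative frequencies, every listed value is assumed to occur at least once in the data, and at least one term is assumed to have less than maximal entropy so that the normalizing sum is not zero. The function names are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(cases, attribute, value):
    """H(W | A_i = V_ij) of equation 4, with probabilities estimated
    as relative class frequencies among the matching cases."""
    labels = [label for features, label in cases if features[attribute] == value]
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_heuristic(cases, values_by_attribute, used_attributes=()):
    """Normalized heuristic eta_ij of equation 3 for every available term."""
    k = len({label for _, label in cases})  # number of classes
    raw = {(a, v): math.log2(k) - conditional_entropy(cases, a, v)
           for a, values in values_by_attribute.items() if a not in used_attributes
           for v in values}
    total = sum(raw.values())
    return {term: score / total for term, score in raw.items()}


# Hypothetical training set with one nominal attribute and two classes.
cases = [
    ({"y": "blue"}, "a"),
    ({"y": "blue"}, "a"),
    ({"y": "red"}, "a"),
    ({"y": "red"}, "b"),
]
eta = information_heuristic(cases, {"y": ["blue", "red"]})
```

In this toy data set the value blue always goes with class a, so its entropy is zero and it receives the whole heuristic weight, while red splits evenly between the two classes and receives none.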
Table 1 Prediction Accuracy of AntMiner and CN2 After the Tenfold Cross-Validation Procedure
[10]
Dataset                    AntMiner's Predictive    CN2's Predictive
                           Accuracy (%)             Accuracy (%)
Ljubljana breast cancer    75.28 ± 2.24             67.69 ± 3.59
Wisconsin breast cancer    96.04 ± 0.93             94.88 ± 0.88
Tic-tac-toe                73.04 ± 2.53             97.38 ± 0.52
Dermatology                94.29 ± 1.20             90.38 ± 1.66
Hepatitis                  90.00 ± 3.11             90.00 ± 2.50
Cleveland heart disease    59.67 ± 2.50             57.48 ± 1.78
After constructing a rule, each ant prunes it by removing some terms from the
antecedent, which makes the rule more predictive and prevents it from overfitting.
The process of rule pruning is similar to the method used in Algorithm 1.
The only difference is that here any term of the rule may be removed, while
in Algorithm 1 terms are removed only from the end of the rule. The quality of a
rule before and after pruning is evaluated using the following metrics:
True Positive: Number of data items covered by the rule that have the class
predicted by the rule.
True Negative: Number of data items not covered by the rule that have a class
different from the class predicted by the rule.
False Positive: Number of data items covered by the rule that have a class
different from the class in the rule consequent.
False Negative: Number of data items not covered by the rule that have the
class in the rule consequent.
The above metrics are combined into the following composite metric:
\[
Q = \frac{\mathit{TruePos}}{\mathit{TruePos} + \mathit{FalseNeg}} \cdot \frac{\mathit{TrueNeg}}{\mathit{FalsePos} + \mathit{TrueNeg}} \tag{5}
\]
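Equation 5 is the product of the rule's sensitivity and specificity. A minimal sketch follows; the function name and the choice to return 0.0 when a denominator is zero are assumptions made here, not part of the original description.

```python
def rule_quality(true_pos, false_pos, true_neg, false_neg):
    """Q of equation 5: sensitivity multiplied by specificity."""
    positives = true_pos + false_neg
    negatives = false_pos + true_neg
    if positives == 0 or negatives == 0:
        return 0.0  # assumed convention for the degenerate case
    sensitivity = true_pos / positives
    specificity = true_neg / negatives
    return sensitivity * specificity
```

For example, a rule with 40 true positives, 10 false negatives, 40 true negatives, and 10 false positives has sensitivity 0.8 and specificity 0.8, so its quality is 0.64. A rule that covers every data item has no true negatives and therefore quality 0.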
Before the algorithm starts, all edges are initialized with the following amount
of pheromone:
\[
\tau_{ij}(t = 0) = \frac{1}{\sum_{i=1}^{a} b_i} \tag{6}
\]
Pheromone update is then done by all ants: each ant updates all the edges that
correspond to the terms in its rule according to the following equation:
\[
\tau_{ij}(t + 1) = \tau_{ij}(t) + \tau_{ij}(t) \cdot Q, \quad \forall\, term_{ij} \in \mathit{Rule} \tag{7}
\]
Next, the amounts of pheromone on all edges are normalized, which reduces the
amount of pheromone on the edges that are not used in the rule.
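The initialization, reinforcement, and normalization steps can be sketched together as follows. The dictionary keyed by `(attribute, value)` pairs and the function names are assumptions made for illustration; normalization is implemented here as division by the sum of all pheromone values.

```python
def initial_pheromone(values_by_attribute):
    """Equation 6: every term starts with 1 / (total number of attribute values)."""
    n_terms = sum(len(values) for values in values_by_attribute.values())
    return {(a, v): 1.0 / n_terms
            for a, values in values_by_attribute.items() for v in values}


def update_pheromone(tau, rule_terms, quality):
    """Equation 7 for the terms in the rule, followed by normalization."""
    updated = dict(tau)
    for term in rule_terms:
        updated[term] = tau[term] + tau[term] * quality
    total = sum(updated.values())
    # Dividing by the total lowers the share of every term not in the rule.
    return {term: value / total for term, value in updated.items()}


# Hypothetical data set with two attributes of two values each.
tau_0 = initial_pheromone({"color": ["blue", "red"], "size": ["big", "small"]})
tau_1 = update_pheromone(tau_0, [("color", "blue")], quality=1.0)
```

With four terms each starting at 0.25, reinforcing one term with Q = 1 doubles it to 0.5; after normalization it holds 0.4 and each unused term drops to 0.2, which is how normalization stands in for evaporation.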
As the last step in each iteration, the best rule among all rules created by all ants
is selected and added to the rule set.
The AntMiner algorithm uses the following parameter values, which were
selected based on the authors' experiments:
Number of ants: 3000,
Minimum number of cases per rule: 10,
Maximum number of uncovered cases in the training set: 10,
Number of rules used to test convergence of the ants: 10.
Several experiments have been done on AntMiner, and it has been shown [10] that
using pheromone has a significant effect on the accuracy of the classifier. It has
also been shown that rule pruning is important. Table 1 compares the accuracy of
AntMiner and CN2 (a non-swarm classification algorithm): AntMiner has the higher
mean accuracy on four of the six data sets, the two algorithms tie on Hepatitis, and
CN2 is clearly more accurate on Tic-tac-toe.
Table 2 Accuracy Rate and Simplicity of AntMiner and AntMiner2 [9]
Valuation item     AntMiner      AntMiner2
Accuracy Rate      91%           91%
Number of Rules    10.5 ± 1.4    10.5 ± 1.4
4 AntMiner2 (A density-based AntMiner)
Bo Liu et al. [9] have proposed some improvements to AntMiner. They changed the
heuristic value, which in the original AntMiner is calculated with an entropy
function. They simplified the heuristic function and argued that it need not be very
precise, because the pheromone trails on the edges compensate for an imprecise
heuristic value. They proposed the following density-based heuristic function, in
which $\mathit{majorityClass}(T_{ij})$ is the size of the majority class in partition
$T_{ij}$, that is, the number of cases in $T_{ij}$ (the cases where attribute $A_i$
has value $V_{ij}$) that belong to its most frequent class:
\[
\eta_{ij} = \frac{\mathit{majorityClass}(T_{ij})}{|T_{ij}|} \tag{8}
\]
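A sketch of the density-based heuristic of equation 8 is given below. It assumes that cases are `(features, label)` pairs and that the numerator counts the cases of the most frequent class in the partition; the function name and the value 0.0 returned for an empty partition are assumptions made here.

```python
from collections import Counter


def density_heuristic(cases, attribute, value):
    """Equation 8: share of the majority class among cases with A_i = V_ij."""
    labels = [label for features, label in cases if features[attribute] == value]
    if not labels:
        return 0.0  # assumed convention when no case has this value
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


# Hypothetical training set.
cases = [
    ({"y": "blue"}, "a"),
    ({"y": "blue"}, "a"),
    ({"y": "red"}, "a"),
    ({"y": "red"}, "b"),
    ({"y": "red"}, "b"),
]
```

Here `density_heuristic(cases, "y", "blue")` is 1.0 and `density_heuristic(cases, "y", "red")` is 2/3. Compared with the entropy-based heuristic, only one count and one division per term are needed.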
They conducted an experiment in which they compared AntMiner and AntMiner2
on one database (Wisconsin Breast Cancer), and found that the accuracy and the
number of rules generated by the two algorithms are identical. The result of their
experiment is shown in Table 2.
Based on this result they concluded that the heuristic function does not need to
be very precise and that ACO is tolerant of a simple heuristic function.
Table 3 Test Set Accuracy Rate (%) [8]
              Breast Cancer              Tic Tac Toe
Run Number    AntMiner1    AntMiner3    AntMiner1    AntMiner3
1             92.05        94.32        71.28        82.97
2             93.15        93.15        73.40        72.34
3             91.67        91.67        67.37        78.94
4             95.59        97.06        71.58        80.00
5             88.41        92.75        68.42        72.63
6             94.20        95.65        75.79        80.00
7             90.77        93.84        74.74        81.05
8             96.55        96.55        65.26        74.74
9             91.04        92.54        73.68        75.79
10            92.86        95.71        68.42        67.37
5 AntMiner3
AntMiner and AntMiner2 were improved by Bo Liu et al. [8]. In AntMiner3 they
changed the pheromone update mechanism and also provided a new state transition
rule, and showed that these changes increase the accuracy of the algorithm.
Pheromone update: The pheromone update equation of the original AntMiner is
replaced by:
\[
\tau_{ij}(t) = (1 - \rho) \cdot \tau_{ij}(t - 1) + \left( 1 - \frac{1}{1 + Q} \right) \cdot \tau_{ij}(t - 1) \tag{9}
\]
In the above equation $\rho$ is the rate of pheromone evaporation, and $Q$ is the
quality of the constructed rule, calculated by equation 5. As can be inferred
from the second operand of the + operation in the above equation, if the quality of
a rule is high (close to 1), half of the previous amount of pheromone is added in
this step, and if the quality is low (close to 0), almost no pheromone is added.
The authors of AntMiner3 used the value 0.1 for the evaporation rate in their
experiments.
After the amount of pheromone is updated by the above equation, the pheromone
values are normalized, which also accounts for pheromone evaporation.
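The effect of the rule quality in equation 9 is easy to check numerically. The following sketch evaluates the update for a single term; the function name is hypothetical, and the default evaporation rate of 0.1 is the value reported above.

```python
def antminer3_update(tau_prev, quality, rho=0.1):
    """Equation 9: evaporation plus a quality-dependent reinforcement."""
    evaporated = (1.0 - rho) * tau_prev
    reinforcement = (1.0 - 1.0 / (1.0 + quality)) * tau_prev
    return evaporated + reinforcement
```

With a previous pheromone value of 1.0, a perfect rule (Q = 1) gives 0.9 + 0.5 = 1.4, while a rule of quality 0 gives 0.9, so only evaporation takes effect in the latter case.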
State Transition: Depositing pheromone on edges can be considered both a
distributed long-term memory and a communication mechanism. It biases the ants
toward edges that have more pheromone, and thus toward exploitation. The more
pheromone there is on the edges, the lower the chance of exploration. Liu et al.
propose another state transition rule in order to enhance the role of exploration.
This state transition rule is as follows:
if $q_1 \le \varphi$ then
    loop
        if $q_2 \le \sum_{j \in J_i} P_{ij}$ then
            Choose $term_{ij}$
        end if
    end loop
else
    Choose $term_{ij}$ with max $P_{ij}$
end if
In the above pseudocode:
$q_1$ and $q_2$ are random numbers,
$\varphi$ is a parameter in the range between 0 and 1,
$J_i$ is the set of all values of the $i$-th attribute, and
$P_{ij}$ is the probability calculated by equation 2.
In this state transition method $\varphi$ is an adjustable parameter by which we
can tune the amount of exploration vs. exploitation. If the random number $q_1$ is
greater than $\varphi$, the system exploits the previously found knowledge, while
when it is less than $\varphi$ the system explores. They used the value 0.4 for
$\varphi$ in their experiments.
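One way to read this state transition rule is as a roulette-wheel selection in the exploration branch and a greedy choice in the exploitation branch. The sketch below follows that reading, which is an interpretation rather than the authors' code: the function name, the dictionary of probabilities, and the optional `q1` and `q2` arguments (useful for reproducible runs) are assumptions made here.

```python
import random


def choose_term(probabilities, phi=0.4, q1=None, q2=None):
    """Select a term from {term: P_ij} using the exploration parameter phi.

    q1 and q2 are drawn uniformly from [0, 1) unless supplied explicitly.
    """
    q1 = random.random() if q1 is None else q1
    q2 = random.random() if q2 is None else q2
    if q1 <= phi:
        # Exploration: walk through the terms until the cumulative
        # probability reaches q2 (roulette-wheel selection).
        cumulative = 0.0
        for term, p in probabilities.items():
            cumulative += p
            if q2 <= cumulative:
                return term
        return term  # guard against rounding when the sum falls just short of q2
    # Exploitation: take the term with the highest probability.
    return max(probabilities, key=probabilities.get)


# Hypothetical probabilities for three candidate terms.
probs = {"t1": 0.2, "t2": 0.5, "t3": 0.3}
```

With `phi = 0.4`, roughly 40% of the selections go through the probabilistic branch, where even a low-probability term such as t1 can be chosen, and the remaining selections greedily pick t2.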
Liu et al. used the Wisconsin Breast Cancer and Tic Tac Toe Endgame data sets for
their experiments. Tenfold cross-validation is used to compare the original AntMiner
and AntMiner3. The accuracy of both algorithms is calculated based on equation 5,
and the results of all ten runs are shown in Table 3. Table 4 shows the mean accuracy
and mean number of rules for the original AntMiner and AntMiner3. As can be seen
in Table 4, the mean accuracy of AntMiner3 is higher than that of the original
AntMiner (94.32% vs. 92.63% on Breast Cancer and 76.58% vs. 70.99% on Tic Tac
Toe).
Table 4 Mean accuracy rate and mean number of rule lists [8]
                     Breast Cancer              Tic Tac Toe
Valuation item       AntMiner1    AntMiner3    AntMiner1    AntMiner3
Accuracy Rate (%)    92.63        94.32        70.99        76.58
Number of Rules      10.1         13.2         16.5         18.58
6 Conclusion
In this article we studied three different swarm-based classification rule
construction algorithms. The original AntMiner, which is based on Ant Colony
Optimization, treats the different data item fields as paths in ACO. Every
attribute-value pair can be selected by an ant as a term and added to a rule. Several
rules are constructed by several ants, and this task is repeated until the rule set is
complete. During the rule construction phase ants deposit pheromone on the paths,
which helps them find better rules. AntMiner2 modifies the original AntMiner by
defining a new heuristic function. AntMiner2 showed that using a very precise
heuristic function in ACO-based classification rule construction algorithms might
not be of significant importance. AntMiner3 also suggests some improvements over
the original AntMiner. It uses a new pheromone update equation and provides a new
state transition mechanism. The experiments with AntMiner, AntMiner2, and
AntMiner3 suggest that swarm-based classification algorithms perform as well as
non-swarm algorithms.
References
1. Abraham, A., Das, S., Roy, S.: Swarm intelligence algorithms for data clustering. Soft
Computing for Knowledge Discovery and Data Mining, pp. 279–313 (2008)
2. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm intelligence: from natural to artificial
systems. 1. Oxford University Press, USA (1999)
3. Chan, P., Fan, W., Prodromidis, A., Stolfo, S.: Distributed data mining in credit card fraud
detection. Intelligent Systems and their Applications, IEEE 14(6), 67–74 (1999)
4. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in knowledge
discovery and data mining (1996)
5. Kantardzic, M.: Data mining: concepts, models, methods, and algorithms. Wiley-IEEE Press
(2011)
6. Latty, T., Ramsch, K., Ito, K., Nakagaki, T., Sumpter, D., Middendorf, M., Beekman, M.:
Structure and formation of ant transportation networks. Journal of The Royal Society Interface
8(62), 1298–1306 (2011)
7. Leiserson, C., Rivest, R., Stein, C.: Introduction to algorithms, vol. 6. MIT Press (2001)
8. Liu, B., Abbass, H., McKay, B.: Classification rule discovery with ant colony optimization. In:
Intelligent Agent Technology, 2003. IAT 2003. IEEE/WIC International Conference on, pp.
83–88 (2003). DOI 10.1109/IAT.2003.1241052
9. Liu, B., Abbass, H., McKay, B.: Density-based heuristic for rule discovery with ant-miner.
In: The 6th Australia-Japan joint workshop on intelligent and evolutionary system, vol. 184.
Citeseer (2002)
10. Parpinelli, R., Lopes, H., Freitas, A.: Data mining with an ant colony optimization algorithm.
Evolutionary Computation, IEEE Transactions on 6(4), 321–332 (2002)
11. Seeley, T., Buhrman, S.: Nest-site selection in honey bees: how well do swarms implement
the "best-of-n" decision rule? Behavioral Ecology and Sociobiology 49(5), 416–427 (2001)
12. Shaw, M., Subramaniam, C., Tan, G., Welge, M.: Knowledge management and data mining
for marketing. Decision Support Systems 31(1), 127–137 (2001)
13. Witten, I., Frank, E.: Data Mining: Practical machine learning tools and techniques. Morgan
Kaufmann (2005)
14. Xiao, L., Shao, Z., Liu, G.: K-means algorithm based on particle swarm optimization
algorithm for anomaly intrusion detection. In: Intelligent Control and Automation, 2006. WCICA
2006. The Sixth World Congress on, vol. 2, pp. 5854–5858. IEEE (2006)
15. Zhongzhi, S., et al.: A clustering algorithm based on swarm intelligence. In: Info-tech and
Info-net, 2001. Proceedings. ICII 2001-Beijing. 2001 International Conferences on, vol. 3,
pp. 58–66. IEEE (2001)