Swarm Intelligence Based Data Mining Algorithms for Classification
Ahmed Khademzadeh
Florida Institute of Technology, 150 West Univ. Blvd, Melbourne, Florida
e-mail: akhademzadeh2011@my.fit.edu
Abstract Data mining is the process of finding hidden knowledge in large amounts of data. Marketing, fraud detection, telecommunication, and data cleaning are a few of its important applications. There are several methods for data mining: classification, regression, clustering, summarization, dependency modeling, and change and deviation detection. Classification is the task of assigning data items to a set of predefined classes. A classifier is created by processing a training set of labeled data items and constructing a set of rules. After rule construction, the classifier is able to classify (label) new data items based on the rules it has learned from the training data. Swarm intelligence methods imitate a swarm of insects and try to solve problems in a way similar to the way a swarm of insects solves its real problems (foraging, nest selection, etc.). Swarm-based methods have also been used in data mining. In this article we investigate classification rule construction methods that use Ant Colony Optimization as their base method. We will see that swarm-based methods can perform as well as non-swarm methods.
Key words: Data Mining, Classification, Rule Induction, Particle Swarm Optimization, Ant Colony Optimization, Swarm Intelligence.
1 Introduction
Data mining, or knowledge mining from data, is the process of finding hidden knowledge and patterns in large amounts of data. Among its diverse applications are marketing [12], fraud detection [3], telecommunication, and data cleaning. There are several methods for data mining, each of which we define briefly here [4] [5] [13]:
• Classification, in which data items are classified into one of several predefined classes.
• Regression, by which a data item is mapped to a real numerical value. These values can later be processed for different purposes.
• Clustering, used when we do not have predefined categories and want to put data items that are most similar to each other into the same cluster. The ultimate goal is to split the data items into several clusters, each containing the most similar data items.
• Summarization, in which an abstract and abridged description of the data items is generated.
• Dependency modeling, in which structural or quantitative dependency models among data items are sought.
• Change and deviation detection, used to discover substantial changes in data items compared to previously seen data.
The algorithms investigated in this article use swarm intelligence methods for classification. In the following subsections we briefly introduce the two main components of this article: classification and swarm intelligence.
1.1 Classification
As mentioned before, a classifier tries to assign data items to different predefined classes. Rule-based classifiers use rules to classify the data items. Each rule has two parts, an antecedent and a consequent. A sample rule is shown below:

If (x > 1.2) and (y = blue) then class = a    (1)

In this rule the if part is called the antecedent and the then part is called the consequent. The antecedent usually consists of one or several parts (called terms) that are joined by conjunction operators. During classification, if a data item satisfies the antecedent of a rule, it is classified according to the rule's consequent.
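To make this concrete, the following minimal Python sketch shows one possible way to represent such a rule and apply it to a data item. The Rule class, the OPS table, and the sample record are illustrative assumptions, not part of the algorithms discussed later.

import operator

# Map operator symbols to comparison functions.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}

class Rule:
    """A classification rule: a conjunction of terms plus a predicted class."""
    def __init__(self, terms, consequent):
        self.terms = terms            # list of (attribute, op, value) triples
        self.consequent = consequent  # class label predicted by the rule

    def covers(self, item):
        """True if the data item satisfies every term in the antecedent."""
        return all(OPS[op](item[attr], value) for attr, op, value in self.terms)

# The sample rule from the text: IF (x > 1.2) AND (y = blue) THEN class = a
rule = Rule(terms=[("x", ">", 1.2), ("y", "=", "blue")], consequent="a")

item = {"x": 1.5, "y": "blue"}        # a hypothetical data item
if rule.covers(item):
    print("classified as:", rule.consequent)   # -> classified as: a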
The most important phase of classifier construction is rule construction. In the rule construction phase, the labeled data items of the different classes in the training data are examined to find patterns. Attribute-value pairs that recur within a class are considered patterns, and rules are constructed from these patterns. Handling categorical (nominal) attributes (those that take an enumerable set of values) is easy, but handling continuous attributes is sometimes challenging, and naïve methods do not handle them.

During or after rule construction, rules may be pruned in order to increase their quality. During pruning, some terms may be dropped from the antecedent. After constructing a rule set, the rules may be tested for quality. Rules with more coverage (satisfying more data items in the data set) are considered to be of higher quality. In this way some rules may be dropped from the rule set, and the remaining rules are ordered by quality, yielding a sequential rule set. Algorithm 1 shows an algorithm for rule construction that prunes the rules during rule construction. This algorithm splits the data into two parts: one part is used for rule construction, and the other part is used for pruning [4] [5] [13].
Algorithm 1. Algorithm for forming rules by incremental reduced-error pruning [13]

Split E into Grow and Prune in the ratio 2:1
For each class C for which Grow and Prune both contain an instance
    Use the basic covering algorithm to create the best perfect rule for class C
    Calculate the worth w(R) for the rule on Prune, and for the rule with the
        final condition omitted, w(R-)
    While w(R-) > w(R), remove the final condition from the rule and repeat the
        previous step
    From the rules generated, select the one with the largest w(R)
    Print the rule
    Remove the instances covered by the rule from E
    Continue
1.2 Swarm Intelligence
Bees, ants, and wasps are insects that sometimes accomplish remarkably large tasks even though they are not highly intelligent. For example, ants construct minimum spanning trees and Steiner trees during their inter-nest activities [2] [6]. Ants also find the shortest path between their nest and a foraging site [2]. Figure 1 illustrates this behavior. In Figure 1 (a) ants travel between the nest and the food source in a direct line. For the purpose of the experiment their path is then obstructed by an object. The ants try to bypass the object by going right or left to find a new path. As can be seen in Figure 1 (b), at first half of the ants go right and half go left, but since the right way is shorter than the left one, the rate of going and coming back on the right way is higher, so the amount of pheromone (a chemical substance that ants deposit on the ground as they travel along a path) on the right way grows over time. The ants thus communicate indirectly with each other, telling one another (through the amount of pheromone) that the right way is better, and as shown in Figure 1 (c), the right path is selected and the left one is abandoned.
As another example, bees are able to select the best nest site among several candidates when they decide to migrate to a new nest [2] [11]. There are many more examples of this kind showing that swarms of insects are able to perform self-organized tasks even though each individual has very little memory and intelligence. These notions of self-autonomy, low memory, low intelligence, and distributedness, together with the emergence of complicated outcomes, have encouraged scientists to imitate swarm behaviors in the design and implementation of real, complicated systems [2] [6] [11].
Much research has been done on swarm-based data mining. For example, Shi Zhongzhi and Wu Bin [15] have proposed a clustering algorithm based on swarm intelligence. Abraham et al. [1] have explored the role of swarm intelligence in clustering and proposed a technique for clustering data items into an optimized number of groups. L. Xiao et al. [14] have proposed a swarm-based anomaly intrusion detection algorithm.
Fig. 1 Illustrating the behavior of real ant movements [1]
In this article we explore some of the research that has been done on swarm-based classification. Since the efficiency of a set of classification rules is important, and there are many possible sets of rules for a single data set, this problem can be considered an optimization problem. Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) are two swarm-inspired optimization techniques that have been applied to the construction of classification rules.

In the following sections we investigate different swarm-inspired algorithms proposed for classification rule construction. In section 2 we discuss Ant Colony Optimization and illustrate it using the Traveling Salesman Problem (TSP) as an example. One of the ACO-based classification rule construction algorithms (Ant-Miner) is discussed in section 3. Sections 4 and 5 discuss Ant-Miner2 and Ant-Miner3, which are improvements over Ant-Miner. Section 6 concludes the paper.
2 Ant Colony Optimization (ACO) for Classification
Ant Colony Optimization is a simple yet powerful algorithm that can be used for optimization problems. We explain ACO using a simple graph-related example in the following.
Suppose we want to solve the Traveling Salesman Problem (TSP) [7] with the help of ants. We place some ants on the graph and ask them to find the best TSP tour. Each ant tries to find a Hamiltonian cycle of the graph. The ants start from a random node, and in each step they select one of the neighbors of the current node that they have not yet visited. At first, when no pheromone has yet been deposited on the edges, they tend to select the neighbor that is closest to them. After each round, every ant deposits some pheromone on the edges of its TSP tour, the best tour among all is selected, and the edges of that tour receive more pheromone. In this way, shorter edges and edges that have been part of at least one ant's TSP tour receive some pheromone. The ants then set out again to find new TSP tours. This time each ant selects a neighbor probabilistically based on both the distance and the amount of pheromone deposited on the edge; an edge with a higher amount of pheromone has a higher chance of being selected [2] [10]. In this way, although none of the ants has a global view of the graph, and each selects its next edge locally without any sophisticated algorithm, the emergent outcome is a tour that is very close to the best solution.
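As an illustration of this probabilistic edge choice, here is a minimal Python sketch of how one ant might pick its next city, assuming a distance matrix, a pheromone matrix, and the commonly used weighting exponents alpha and beta; it is a generic ACO-style selection rule, not the exact formula of any specific variant.

import random

def choose_next_city(current, unvisited, dist, pher, alpha=1.0, beta=2.0):
    """Pick the next city probabilistically: edges with more pheromone and
    shorter length get proportionally higher selection probability."""
    weights = [(pher[current][c] ** alpha) * ((1.0 / dist[current][c]) ** beta)
               for c in unvisited]
    total = sum(weights)
    r = random.uniform(0, total)
    acc = 0.0
    for city, w in zip(unvisited, weights):
        acc += w
        if r <= acc:
            return city
    return unvisited[-1]  # numerical safety net

# Tiny 4-city example with symmetric distances and uniform initial pheromone.
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
pher = [[1.0] * 4 for _ in range(4)]
print(choose_next_city(current=0, unvisited=[1, 2, 3], dist=dist, pher=pher))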
3 Ant-Miner
In this section we study the Ant-Miner algorithm [10], an ACO-based classification algorithm proposed by Rafael S. Parpinelli et al.

As we saw in the previous example, in Ant Colony Optimization each ant constructs a solution to the target problem. In the context of classification rules, the problem is finding a good new rule. Here each rule consists of one or several terms, each of the form <attribute, operator, value>, for example <y, =, blue>.

If our attributes are categorical (nominal), the operator will always be =, but if we have continuous attributes, we also need the ≤, ≥, <, and > operators. Some methods discretize the continuous values in order to treat them as categorical attributes. Construction of a rule set can be finished when we reach a point at which all, or almost all, data items are covered by the rule set. Algorithm 2 shows a high-level description of the Ant-Miner algorithm. In this algorithm, rule generation continues while the number of remaining data items in the training set is higher than a threshold. Each ant starts with an empty rule and adds terms to it. Terms are selected using logic similar to what we observed in the TSP example above: the amount of pheromone and a heuristic function (analogous to the nearest-neighbor preference in the TSP example, and explained later for this context) are the bases for selecting a term. Term selection is done according to the following probability:
\[ P_{ij} = \frac{\eta_{ij}\,\tau_{ij}(t)}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \eta_{ij}\,\tau_{ij}(t)} \tag{2} \]
In the above equation:
• a is the number of attributes,
• x_i is one if attribute A_i has not yet been chosen by the current ant, and zero if it has already been selected,
• b_i is the total number of different values of attribute A_i over all data items,
• τ_ij(t) is the amount of pheromone on term_ij; the higher this value, the higher the probability that this term is selected,
• η_ij is defined by a heuristic function, which is problem dependent.
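To illustrate Equation 2, the sketch below computes P_ij for all currently available terms from given pheromone and heuristic values and then draws one term roulette-wheel style; the dictionaries and helper names are assumptions made for the example, not code from the Ant-Miner implementation.

import random

def term_probabilities(eta, tau, usable_attrs):
    """Compute P_ij = (eta_ij * tau_ij) / normalizing sum (Equation 2).
    eta and tau map (attribute, value) -> float; usable_attrs is the set of
    attributes not yet used by the current ant (i.e. x_i = 1)."""
    weights = {(a, v): eta[(a, v)] * tau[(a, v)]
               for (a, v) in eta if a in usable_attrs}
    total = sum(weights.values())
    return {term: w / total for term, w in weights.items()}

def pick_term(probs):
    """Roulette-wheel selection of one term according to its probability."""
    r, acc = random.random(), 0.0
    for term, p in probs.items():
        acc += p
        if r <= acc:
            return term
    return term  # fallback for floating-point rounding

# Hypothetical heuristic and pheromone values for two attributes.
eta = {("color", "red"): 0.4, ("color", "blue"): 0.6, ("size", "big"): 0.5}
tau = {("color", "red"): 1.0, ("color", "blue"): 1.0, ("size", "big"): 2.0}
probs = term_probabilities(eta, tau, usable_attrs={"color", "size"})
print(probs, pick_term(probs))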
Algorithm 2. A High-Level Description of Ant-Miner [10]

DiscoveredRuleList = [];   /* rule list is initialized with an empty list */
WHILE (TrainingSet > Max_uncovered_cases)
    t = 1;   /* ant index */
    j = 1;   /* convergence test index */
    Initialize all trails with the same amount of pheromone;
    REPEAT
        Ant t starts with an empty rule and incrementally constructs a
            classification rule Rt by adding one term at a time to the
            current rule;
        Prune rule Rt;
        Update the pheromone of all trails by increasing pheromone in the
            trail followed by Ant t (proportional to the quality of Rt) and
            decreasing pheromone in the other trails (simulating pheromone
            evaporation);
        IF (Rt is equal to Rt-1)   /* update convergence test */
            THEN j = j + 1;
            ELSE j = 1;
        END IF
        t = t + 1;
    UNTIL (t >= No_of_ants) OR (j >= No_rules_converg)
    Choose the best rule Rbest among all rules Rt constructed by all the ants;
    Add rule Rbest to DiscoveredRuleList;
    TrainingSet = TrainingSet - {set of cases correctly covered by Rbest};
END WHILE
As we saw in the TSP example, η_ij was defined based on the distance from a node to its neighbors, so a nearer neighbor had a higher chance of being selected. For the classification problem, this heuristic function (called the information-theoretic heuristic function) is defined as follows:
\[ \eta_{ij} = \frac{\log_2 k - H(W \mid A_i = V_{ij})}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \log_2 k - H(W \mid A_i = V_{ij}) \right)} \tag{3} \]
The value of H in the above equation is defined by the following equation, which is intuitively related to entropy:
\[ H(W \mid A_i = V_{ij}) = -\sum_{w=1}^{k} P(w \mid A_i = V_{ij}) \cdot \log_2 P(w \mid A_i = V_{ij}) \tag{4} \]
In the above equations, k is the number of classes, W is the class attribute, and P(w | A_i = V_ij) is the conditional probability of class w given that A_i = V_ij.
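The following sketch shows how the entropy of Equation 4 and the numerator of Equation 3 (log2 k minus that entropy) could be computed from a small list of training records; the record layout is an assumption, and the normalizing denominator of Equation 3 is omitted in this sketch for brevity.

from math import log2
from collections import Counter

def entropy_given(records, attr, value, class_attr="class"):
    """H(W | A_i = V_ij) from Equation 4: class entropy over the records
    where attribute attr equals value."""
    subset = [r[class_attr] for r in records if r[attr] == value]
    counts = Counter(subset)
    n = len(subset)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def heuristic_values(records, attr_values, class_attr="class"):
    """Numerators of Equation 3: log2(k) - H(W | A_i = V_ij) for each term
    (the denominator of Equation 3 is left out of this sketch)."""
    k = len({r[class_attr] for r in records})
    return {(a, v): log2(k) - entropy_given(records, a, v, class_attr)
            for (a, v) in attr_values}

# Toy training data with two classes.
records = [{"color": "red", "class": "a"}, {"color": "red", "class": "a"},
           {"color": "blue", "class": "a"}, {"color": "blue", "class": "b"}]
print(heuristic_values(records, [("color", "red"), ("color", "blue")]))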
Table 1 Prediction Accuracy of Ant-Miner and CN2 After the Tenfold Cross-Validation Procedure [10]

Dataset                    Ant-Miner's Predictive    CN2's Predictive
                           Accuracy (%)              Accuracy (%)
Ljubljana breast cancer    75.28 ± 2.24              67.69 ± 3.59
Wisconsin breast cancer    96.04 ± 0.93              94.88 ± 0.88
Tic-tac-toe                73.04 ± 2.53              97.38 ± 0.52
Dermatology                94.29 ± 1.20              90.38 ± 1.66
Hepatitis                  90.00 ± 3.11              90.00 ± 2.50
Cleveland heart disease    59.67 ± 2.50              57.48 ± 1.78
After constructing a rule, each ant prunes it, removing some terms from the antecedent to make the rule more predictive and to prevent over-fitting. The rule pruning process is similar to the method used in Algorithm 1; the only difference is that here any term in the rule may be removed, whereas in Algorithm 1 terms are removed only from the end of the rule. The quality of a rule before and after pruning is evaluated using the following metrics:
• True Positive: the number of data items correctly classified by the rule into the class predicted by the rule.
• True Negative: the number of data items correctly not classified by the rule into the class predicted by the rule.
• False Positive: the number of data items wrongly classified by the rule into the class in the rule consequent.
• False Negative: the number of data items wrongly not classified by the rule into the class in the rule consequent.
These metrics are combined into the following composite quality measure:

\[ Q = \frac{TruePos}{TruePos + FalseNeg} \cdot \frac{TrueNeg}{FalsePos + TrueNeg} \tag{5} \]
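A minimal sketch of Equation 5: given a rule's coverage predicate and predicted class, count the four quantities above and combine them into Q. The covers predicate and the toy data are illustrative assumptions.

def rule_quality(items, covers, predicted_class):
    """Compute Q = sensitivity * specificity (Equation 5) for a rule.
    items: list of (data_item, true_class); covers: predicate telling whether
    the rule's antecedent matches a data item; predicted_class: the rule's
    consequent."""
    tp = fp = tn = fn = 0
    for item, true_class in items:
        if covers(item):
            if true_class == predicted_class:
                tp += 1           # covered and correctly labeled
            else:
                fp += 1           # covered but wrongly labeled
        else:
            if true_class == predicted_class:
                fn += 1           # should have been covered but was not
            else:
                tn += 1           # correctly left uncovered
    return (tp / (tp + fn)) * (tn / (fp + tn))

# Toy example: rule "y = blue -> class a" evaluated on four labeled items.
data = [({"y": "blue"}, "a"), ({"y": "blue"}, "b"),
        ({"y": "red"}, "a"), ({"y": "red"}, "b")]
print(rule_quality(data, covers=lambda d: d["y"] == "blue", predicted_class="a"))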
Before the algorithm starts, all edges are initialized with the following amount of pheromone:

\[ \tau_{ij}(t=0) = \frac{1}{\sum_{i=1}^{a} b_i} \tag{6} \]
Pheromone update is then performed by all ants: each ant updates the edges corresponding to the terms in its rule according to the following equation:

\[ \tau_{ij}(t+1) = \tau_{ij}(t) + \tau_{ij}(t)\cdot Q, \qquad \forall\, term_{ij} \in Rule \tag{7} \]
Next, the amount of pheromone on all edges is normalized, which reduces the amount of pheromone on edges not used in the rule.
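The sketch below puts Equations 6 and 7 together: uniform initialization, reinforcement of the terms used in a rule in proportion to its quality Q, and the normalization step described above. The data structures and names are assumptions for illustration, not Ant-Miner's actual code.

def init_pheromone(attr_values):
    """Equation 6: every term starts with 1 / (total number of terms)."""
    total = len(attr_values)            # equals the sum over attributes of b_i
    return {term: 1.0 / total for term in attr_values}

def update_pheromone(tau, rule_terms, quality):
    """Equation 7: terms used in the rule gain pheromone proportional to the
    rule quality Q; all values are then normalized, which implicitly lowers
    the share of unused terms (simulating evaporation)."""
    for term in rule_terms:
        tau[term] += tau[term] * quality
    total = sum(tau.values())
    for term in tau:
        tau[term] /= total
    return tau

# Hypothetical terms, a constructed rule using two of them, and its quality Q.
terms = [("color", "red"), ("color", "blue"), ("size", "big"), ("size", "small")]
tau = init_pheromone(terms)
tau = update_pheromone(tau, rule_terms=[("color", "blue"), ("size", "big")],
                       quality=0.25)
print(tau)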
As the last step of each iteration, the best rule among all rules created by all the ants is selected and added to the rule set.
The Ant-Miner algorithm uses the following parameter values, selected based on the authors' experiments:
• Number of ants: 3000,
• Minimum number of cases per rule: 10,
• Maximum number of uncovered cases in the training set: 10,
• Number of rules used to test convergence of the ants: 10.
Several experiments have been conducted with Ant-Miner, and it has been shown [10] that using pheromone has a significant effect on the accuracy of the classifier. It has also been shown that rule pruning is important. Table 1 compares the accuracy of Ant-Miner and CN2 (a non-swarm classification algorithm).
Table 2 Accuracy Rate and Simplicity of Ant-Miner and Ant-Miner2 [9]

Valuation item     Ant-Miner     Ant-Miner2
Accuracy Rate      91%           91%
Number of Rules    10.5 ± 1.4    10.5 ± 1.4
4 Ant-Miner2 (A density-based Ant-Miner)
Bo Liu et al. [9] have proposed some improvements to Ant-Miner. They changed the heuristic value, which in the original Ant-Miner is calculated with an entropy function. They simplified the heuristic function, arguing that it need not be very precise because the pheromone trails on the edges will compensate for an imprecise heuristic value. They proposed the following density-based heuristic function, in which majorityClass(T_ij) denotes the number of cases in partition T_ij that belong to the majority class of that partition:
\[ \eta_{ij} = \frac{\mathrm{majorityClass}(T_{ij})}{|T_{ij}|} \tag{8} \]
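A short sketch of this density-based heuristic: for each candidate term, take the fraction of the matching training records that belong to the partition's majority class. The record layout and function name are assumptions made for the example.

from collections import Counter

def density_heuristic(records, attr, value, class_attr="class"):
    """Equation 8: (size of the majority class in partition T_ij) / |T_ij|,
    where T_ij is the set of records whose attribute attr equals value."""
    partition = [r[class_attr] for r in records if r[attr] == value]
    if not partition:
        return 0.0
    majority_count = Counter(partition).most_common(1)[0][1]
    return majority_count / len(partition)

# Toy data: "color = red" is pure, "color = blue" is a 50/50 split.
records = [{"color": "red", "class": "a"}, {"color": "red", "class": "a"},
           {"color": "blue", "class": "a"}, {"color": "blue", "class": "b"}]
print(density_heuristic(records, "color", "red"))   # -> 1.0
print(density_heuristic(records, "color", "blue"))  # -> 0.5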
They conducted an experiment comparing Ant-Miner and Ant-Miner2 on one dataset (Wisconsin Breast Cancer) and concluded that the accuracy and the number of rules generated by the two algorithms are identical. The results of their experiment are shown in Table 2. Based on these results, they concluded that the heuristic function does not need to be very precise and that ACO is tolerant of a simple heuristic function.
Table 3 Test Set Accuracy Rate [8]

              Breast Cancer              Tic Tac Toe
Run Number    AntMiner1    AntMiner3     AntMiner1    AntMiner3
1             92.05        94.32         71.28        82.97
2             93.15        93.15         73.40        72.34
3             91.67        91.67         67.37        78.94
4             95.59        97.06         71.58        80.00
5             88.41        92.75         68.42        72.63
6             94.20        95.65         75.79        80.00
7             90.77        93.84         74.74        81.05
8             96.55        96.55         65.26        74.74
9             91.04        92.54         73.68        75.79
10            92.86        95.71         68.42        67.37
5 Ant-Miner3
Ant-Miner and Ant-Miner2 were further improved by Bo Liu et al. [8]. In Ant-Miner3 they changed the pheromone update mechanism and also provided a new state transition rule, and showed that this increases the accuracy of the algorithm.

Pheromone update: The pheromone update equation of the original Ant-Miner is replaced by:
\[ \tau_{ij}(t) = (1-\rho)\,\tau_{ij}(t-1) + \left(1 - \frac{1}{1+Q}\right)\tau_{ij}(t-1) \tag{9} \]
In the above equation, ρ is the pheromone evaporation rate and Q is the quality of the constructed rule, calculated according to Equation 5. As can be inferred from the second operand of the + operator, if the quality of a rule is high (close to 1), half of the previous amount of pheromone is added in this step, and if the quality is low (close to 0), almost no pheromone is added on account of the rule quality. The authors of Ant-Miner3 used the value 0.1 for the evaporation rate in their experiments.
After the pheromone is updated according to the above equation, the pheromone values are normalized; this normalization also accounts for pheromone evaporation.
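A minimal sketch of the Equation 9 update followed by the normalization just mentioned, assuming a simple dictionary of pheromone values; the evaporation rate 0.1 matches the value the authors report, while everything else is an illustrative assumption.

def antminer3_update(tau, rule_terms, quality, evaporation=0.1):
    """Equation 9: tau_ij(t) = (1 - rho) * tau_ij(t-1)
                             + (1 - 1/(1 + Q)) * tau_ij(t-1),
    applied to the terms of the constructed rule, followed by normalization
    of all pheromone values."""
    for term in rule_terms:
        old = tau[term]
        tau[term] = (1 - evaporation) * old + (1 - 1 / (1 + quality)) * old
    total = sum(tau.values())
    return {term: value / total for term, value in tau.items()}

# Hypothetical pheromone table and a rule of quality Q = 0.8.
tau = {("color", "red"): 0.25, ("color", "blue"): 0.25,
       ("size", "big"): 0.25, ("size", "small"): 0.25}
tau = antminer3_update(tau, rule_terms=[("color", "blue")], quality=0.8)
print(tau)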
State transition: Depositing pheromone on edges can be considered both a distributed long-term memory and a communication mechanism. It biases ants toward edges with larger amounts of pheromone, and thus toward exploitation. The more pheromone there is on the edges, the lower the possibility of exploration. Liu et al. propose another state transition rule in order to enhance the role of exploration. This state transition rule is as follows:
if q1 ≤ φ then
    loop
        if q2 ≤ Σ_{j ∈ J_i} P_ij then
            choose term_ij
        end if
    end loop
else
    choose term_ij with maximum P_ij
end if
In the above pseudocode:
• q1 and q2 are random numbers,
• φ is a parameter in the range [0, 1],
• J_i is the set of values of the i-th attribute, and
• P_ij is the probability calculated according to Equation 2.
In this state transition method, φ is an adjustable parameter by which we can tune the amount of exploration versus exploitation. If the random number q1 is greater than φ, the system exploits the previously found knowledge; when it is less than φ, the system explores. They used the value 0.4 for φ in their experiments.
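The sketch below mirrors the state transition rule above: with q1 ≤ φ the ant explores via a roulette-wheel draw over the P_ij values, otherwise it greedily takes the term with maximum P_ij. This is one plausible reading of the pseudocode, using φ = 0.4 as in the authors' experiments; the probability table is an assumption.

import random

def choose_term(probs, phi=0.4):
    """probs maps term -> P_ij (Equation 2). If q1 <= phi the ant explores
    (roulette-wheel selection); otherwise it exploits (argmax of P_ij)."""
    q1 = random.random()
    if q1 <= phi:
        q2, acc = random.random(), 0.0
        for term, p in probs.items():          # cumulative sum over J_i
            acc += p
            if q2 <= acc:
                return term
        return term                            # fallback for rounding error
    return max(probs, key=probs.get)           # exploitation

# Hypothetical term probabilities for one attribute set.
probs = {("color", "red"): 0.2, ("color", "blue"): 0.5, ("size", "big"): 0.3}
print(choose_term(probs))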
Liu et al. used the Wisconsin Breast Cancer and Tic Tac Toe Endgame datasets in their experiments. A ten-fold cross-validation was used to compare the original AntMiner and AntMiner3. The accuracy of both algorithms was calculated according to Equation 5, and the results of all ten runs are shown in Table 3. Table 4 shows the mean accuracy and the mean number of rules for the original AntMiner and AntMiner3. As can be seen in Table 4, the mean accuracy of AntMiner3 is higher than that of the original AntMiner.
6 Conclusion
In this article we studied three different swarm-based classification rule construction algorithms.
Table 4 Mean accuracy rate and mean number of rule lists [8]

                     Breast Cancer              Tic Tac Toe
Valuation item       AntMiner1    AntMiner3     AntMiner1    AntMiner3
Accuracy Rate (%)    92.63        94.32         70.99        76.58
Number of Rules      10.1         13.2          16.5         18.58
The original Ant-Miner, which is based on Ant Colony Optimization, treats the different fields of a data item as paths in ACO. Every attribute-value pair can be selected by an ant as a term and added to a rule. Several rules are constructed by several ants, and this process is repeated until the rule set is constructed. During the rule construction phase, ants deposit pheromone on the paths, and this helps them find better rules. Ant-Miner2 improves the original Ant-Miner by defining a new heuristic function; it showed that a very precise heuristic function might not be of significant importance in ACO-based classification rule construction algorithms. Ant-Miner3 also suggests improvements over the original Ant-Miner: it uses a new pheromone update equation and provides a new state transition mechanism. The experiments on Ant-Miner, Ant-Miner2, and Ant-Miner3 suggest that swarm-based classification algorithms perform as well as non-swarm algorithms.
References
1. Abraham, A., Das, S., Roy, S.: Swarm intelligence algorithms for data clustering. Soft Computing for Knowledge Discovery and Data Mining, pp. 279–313 (2008)
2. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, USA (1999)
3. Chan, P., Fan, W., Prodromidis, A., Stolfo, S.: Distributed data mining in credit card fraud detection. IEEE Intelligent Systems and their Applications 14(6), 67–74 (1999)
4. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in Knowledge Discovery and Data Mining (1996)
5. Kantardzic, M.: Data Mining: Concepts, Models, Methods, and Algorithms. Wiley-IEEE Press (2011)
6. Latty, T., Ramsch, K., Ito, K., Nakagaki, T., Sumpter, D., Middendorf, M., Beekman, M.: Structure and formation of ant transportation networks. Journal of The Royal Society Interface 8(62), 1298–1306 (2011)
7. Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, vol. 6. MIT Press (2001)
8. Liu, B., Abbass, H., McKay, B.: Classification rule discovery with ant colony optimization. In: Intelligent Agent Technology (IAT 2003), IEEE/WIC International Conference on, pp. 83–88 (2003). DOI 10.1109/IAT.2003.1241052
9. Liu, B., Abbass, H., McKay, B.: Density-based heuristic for rule discovery with ant-miner. In: The 6th Australia-Japan Joint Workshop on Intelligent and Evolutionary Systems, vol. 184. Citeseer (2002)
10. Parpinelli, R., Lopes, H., Freitas, A.: Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation 6(4), 321–332 (2002)
11. Seeley, T., Buhrman, S.: Nest-site selection in honey bees: how well do swarms implement the "best-of-n" decision rule? Behavioral Ecology and Sociobiology 49(5), 416–427 (2001)
12. Shaw, M., Subramaniam, C., Tan, G., Welge, M.: Knowledge management and data mining for marketing. Decision Support Systems 31(1), 127–137 (2001)
13. Witten, I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2005)
14. Xiao, L., Shao, Z., Liu, G.: K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection. In: Intelligent Control and Automation (WCICA 2006), The Sixth World Congress on, vol. 2, pp. 5854–5858. IEEE (2006)
15. Zhongzhi, S., et al.: A clustering algorithm based on swarm intelligence. In: Info-tech and Info-net (ICII 2001-Beijing), 2001 International Conferences on, vol. 3, pp. 58–66. IEEE (2001)