Swarm Intelligence Based Data Mining Algorithms for Classification

Ahmed Khademzadeh

Abstract Data mining is the process of finding hidden knowledge in large amounts of data. Marketing, fraud detection, telecommunication, and data cleaning are a few of its important applications. There are several methods for data mining: classification, regression, clustering, summarization, dependency modeling, and change and deviation detection. Classification is the task of assigning data items to a set of predefined classes. A classifier is created by processing a training set of labeled data items and constructing a set of rules. After rule construction, the classifier is able to classify (label) new data items based on the rules it has learned from the training data. Swarm intelligence methods imitate a swarm of insects and try to solve problems in a way similar to how such a swarm solves its real problems (foraging, nest selection, etc.). Swarm-based methods have also been used in data mining. In this article we investigate several classification rule construction methods that use Ant Colony Optimization as their base method. We will see that swarm-based methods can perform as well as non-swarm methods.

Key words: Data Mining, Classification, Rule Induction, Particle Swarm Optimization, Ant Colony Optimization, Swarm Intelligence.

1 Introduction

Data mining, or knowledge mining from data, is the process of finding hidden knowledge and patterns in large amounts of data. Among its diverse applications we can name marketing [12], fraud detection [3], telecommunication, and data cleaning. There are several methods for data mining, each of which we define briefly here [4] [5] [13]:

Ahmed Khademzadeh
Florida Institute of Technology, 150 West Univ. Blvd, Melbourne, Florida
e-mail: akhademzadeh2011@my.fit.edu


Classification, in which data items are assigned to one of several predefined classes.

Regression, by which a data item is mapped to a real numerical value. These values can later be processed for different purposes.

Clustering, which is used when we do not have predefined categories and want to group similar data items together. The ultimate goal is to split the data items into several clusters, each containing the most similar data items.

Summarization, in which an abstract and abridged description of the data items is generated.

Dependency Modeling, in which structural or quantitative dependency models among data items are investigated.

Change and Deviation Detection, which is used to discover substantial changes in data items compared to previously seen data.

The algorithms investigated in this article use swarm intelligence methods for classification. In the following subsections we briefly introduce the two main components of this article: classification and swarm intelligence.

1.1 Classification

As mentioned before, a classifier tries to assign data items to different predefined classes. Rule-based classifiers use rules to classify the data items. Each rule has two parts: an antecedent and a consequent. A sample rule is shown below:

If (x > 1.2) and (y = blue) then class = a    (1)

In this rule the if part is called the antecedent and the then part is called the consequent. The antecedent usually consists of one or several parts (called terms) that are joined to each other by the conjunction operator. During classification, a data item that satisfies the antecedent of a rule is classified according to the rule's consequent.

The most important phase of classifier construction is rule construction. In this phase, labeled data items of the different classes in the training data are examined to find patterns. Attribute-value pairs that are repeated within a class are considered patterns, and rules are constructed from these patterns. Handling categorical (nominal) attributes (those with an enumerable number of values) is easy, but handling continuous attributes is sometimes challenging, and naive methods do not handle them.

During or after rule construction, rules may be pruned in order to increase their quality. During pruning, some terms may be dropped from the antecedent. After constructing a rule set, the rules may be tested for quality. Rules with more coverage (satisfying more data items in the data set) are considered to be of higher quality. In this way some rules may be dropped from the rule set, and the remaining rules are ordered by quality, producing a sequential rule set. Algorithm 1


shows an algorithm for rule construction that prunes rules during construction. This algorithm splits the data into two parts: one part is used for rule construction, and the other is used for pruning [4] [5] [13].

Algorithm 1. Algorithm for forming rules by incremental reduced-error pruning [13]

Split E into Grow and Prune in the ratio 2:1
For each class C for which Grow and Prune both contain an instance
  Use the basic covering algorithm to create the best perfect rule for class C
  Calculate the worth w(R) for the rule on Prune, and for the rule
    with the final condition omitted w(R-)
  While w(R-) > w(R), remove the final condition from the rule
    and repeat the previous step
From the rules generated, select the one with the largest w(R)
Print the rule
Remove the instances covered by the rule from E
Continue
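The inner pruning loop of Algorithm 1 can be sketched in Python. Here `worth` is a stand-in for the rule-worth function w(R) evaluated on the Prune set (the toy worth function below is only for illustration), and a rule is represented as a list of (attribute, value) terms:

```python
def prune_rule(rule, worth):
    """Incremental reduced-error pruning: repeatedly drop the final
    condition while doing so improves the rule's worth on Prune."""
    best = worth(rule)
    while len(rule) > 1:
        candidate = rule[:-1]          # rule with the final condition omitted
        w = worth(candidate)
        if w > best:                   # w(R-) > w(R): keep the shorter rule
            rule, best = candidate, w
        else:
            break
    return rule

# Toy worth function: shorter rules score higher, so pruning strips terms.
r = prune_rule([("x", 1), ("y", 2), ("z", 3)], worth=lambda r: -len(r))
```

In a real run, `worth` would measure predictive accuracy on the held-out Prune split rather than rule length.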

1.2 Swarm Intelligence

Bees, ants, and wasps are insects that sometimes accomplish big tasks even though they are not highly intelligent individually. For example, ants construct minimum spanning trees and Steiner trees during their inter-nest activities [2] [6]. Ants also find the shortest path between their nest and a foraging site [2]. Figure 1 illustrates this behavior. In figure 1 (a), ants travel between the nest and a food source in a direct line. For the purpose of the experiment, their path is blocked by an object. The ants then try to bypass the object by going right and left to find a new path. As can be seen in figure 1 (b), at first half of the ants go right and half go left. Since the right way is shorter than the left one, the rate of going and coming back on the right way is higher, so the amount of pheromone (a chemical substance that ants deposit on the ground as they pass along a path) on the right way grows over time. The ants thus communicate indirectly, telling each other (through the amount of pheromone) that the right way is better, and as shown in figure 1 (c), the right path is selected and the left one is ignored.

As another example, bees are able to select the best nest site among several candidates when they decide to migrate to a new nest [2] [11]. There are many more examples of this kind showing that swarms of insects are able to perform self-organized tasks even though the individuals have little memory and intelligence. The notions of self-autonomy, low memory, low intelligence, distributedness, and yet the emergence of complicated outcomes have encouraged scientists to imitate swarm behaviors in the design and implementation of real complicated systems [2] [6] [11].

Much research has been done on swarm-based data mining. For example, Shi Zhongzhi and Wu Bin [15] have proposed a clustering algorithm based on swarm intelligence. Abraham, A. et al. [1] have explored the role of swarm intelligence in clustering and proposed a technique for clustering data items into an optimized number of groups. L. Xiao et al. [14] have proposed a swarm-based anomaly intrusion detection algorithm.

Fig. 1 Illustrating the behavior of real ant movements [1]

In this article we explore some of the research that has been done on swarm-based classification. Since the efficiency of a set of classification rules is important, and there are many possible sets of rules for a single data set, this problem can be considered an optimization problem. Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) are two swarm-inspired optimization techniques that have been applied to the construction of classification rules.

In the following sections we investigate different swarm-inspired algorithms proposed for classification rule construction. In section 2 we discuss Ant Colony Optimization and illustrate it using the Traveling Salesman Problem (TSP). One of the ACO-based classification rule construction algorithms (Ant-Miner) is discussed in section 3. In sections 4 and 5, Ant-Miner2 and Ant-Miner3, which are improvements over Ant-Miner, are discussed. Section 6 concludes the paper.

2 Ant Colony Optimization (ACO) for Classification

ACO is a simple yet powerful algorithm that can be used for optimization problems. We explain it using a simple graph-related example in the following.


Suppose we want to solve the Traveling Salesman Problem (TSP) [7] with the help of ants. We place some ants in the graph and ask them to find the best TSP tour. Each ant tries to find a Hamiltonian cycle of the graph. The ants start from a random node, and in each step they select one of the not-yet-visited neighbors of the current node. At first, when no pheromone has yet been deposited on the edges, they tend to select the neighbor that is closest to them. After finishing a round, each ant deposits some pheromone on the edges of its tour; the best tour among all is selected, and the edges of that tour receive more pheromone. In this way, shorter edges and edges that have been part of at least one ant's tour receive some pheromone. The ants then find new TSP tours. This time each ant selects a neighbor probabilistically, based on both the distance and the amount of pheromone deposited on the edge; an edge with more pheromone has a higher chance of being selected [2] [10]. Although none of the ants has a global view of the graph, and each selects the next edge only locally and without any sophisticated algorithm, the emergent outcome is a result very close to the best solution.
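The ant behavior described above can be condensed into a minimal ACO-for-TSP sketch. This is not the exact formulation of [2] or [10]; the parameter names `alpha`, `beta`, and `rho` follow common ACO conventions, and the default values are illustrative:

```python
import math
import random

def aco_tsp(dist, n_ants=20, n_iters=50, alpha=1.0, beta=2.0, rho=0.1):
    """Minimal ACO sketch for the TSP on a symmetric distance matrix."""
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]           # pheromone on each edge
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = random.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                # selection probability ~ pheromone^alpha * (1/distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = random.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporate everywhere, then deposit proportional to tour quality
        tau = [[(1 - rho) * t for t in row] for row in tau]
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len
```

On four cities at the corners of a unit square, this sketch converges on the perimeter tour (length 4), mirroring how the real ants settle on the shorter path.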

3 Ant-Miner

In this section we study the Ant-Miner algorithm [10], an ACO-based classification algorithm proposed by Rafael S. Parpinelli et al.

As we saw in the previous example, in Ant Colony Optimization each ant constructs a solution to the target problem. In the classification rule context, the problem is finding a good new rule. Here each rule consists of one or several terms, each of the form <attribute, operator, value>, for example <y, =, blue>.

If our attributes are categorical (nominal), the operator will always be =, but if we have continuous attributes, we also need the ≤, ≥, <, and > operators. Some methods discretize the continuous values in order to treat them as categorical attributes. Construction of a rule set can finish when we reach a point at which all, or almost all, data items are covered by the rule set. Algorithm 2 shows a high-level description of the Ant-Miner algorithm. In this algorithm, rule generation continues while the number of data items remaining in the training set is higher than a threshold.

Each ant starts with an empty rule and adds terms to it. The terms are selected using logic similar to what we observed in the TSP example above: the amount of pheromone and a heuristic function (analogous to the nearest-neighbor preference in the TSP example, explained later for this context) are the bases for selecting a term. Term selection is done with the following probability:

P_{ij} = \frac{\eta_{ij} \cdot \tau_{ij}(t)}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \eta_{ij} \cdot \tau_{ij}(t) \right)}    (2)

In the above equation:

- a is the number of attributes,
- x_i is 1 if attribute A_i has not yet been chosen by the current ant, and 0 if it is already selected,
- b_i is the total number of different values of attribute A_i over all data items,
- \tau_{ij}(t) is the amount of pheromone of term_{ij}; the higher this value, the higher the probability that this term is selected,
- \eta_{ij} is defined by a heuristic function that is problem dependent.
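Equation 2 amounts to a weighted random choice over the terms whose attributes are not yet in the partial rule. A minimal sketch, assuming `eta` and `tau` are dictionaries keyed by (attribute, value) terms (this data layout is hypothetical, chosen only for illustration):

```python
import random

def select_term(terms, eta, tau, used):
    """Pick the next term (attribute, value) with probability
    proportional to eta_ij * tau_ij (equation 2), restricted to
    attributes not already in the partial rule."""
    candidates = [t for t in terms if t[0] not in used]
    weights = [eta[t] * tau[t] for t in candidates]
    return random.choices(candidates, weights=weights)[0]
```

The denominator of equation 2 is implicit: `random.choices` normalizes the weights itself.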

Algorithm 2. A High-Level Description of Ant-Miner [10]

DiscoveredRuleList = [];  /* rule list is initialized with an empty list */
WHILE (TrainingSet > Max_uncovered_cases)
  t = 1;  /* ant index */
  j = 1;  /* convergence test index */
  Initialize all trails with the same amount of pheromone;
  REPEAT
    Ant_t starts with an empty rule and incrementally constructs a
      classification rule R_t by adding one term at a time to the current rule;
    Prune rule R_t;
    Update the pheromone of all trails by increasing pheromone in the trail
      followed by Ant_t (proportional to the quality of R_t) and decreasing
      pheromone in the other trails (simulating pheromone evaporation);
    IF (R_t is equal to R_{t-1})  /* update convergence test */
      THEN j = j + 1;
      ELSE j = 1;
    END IF
    t = t + 1;
  UNTIL (t >= No_of_ants) OR (j >= No_rules_converg)
  Choose the best rule R_best among all rules R_t constructed by all the ants;
  Add rule R_best to DiscoveredRuleList;
  TrainingSet = TrainingSet - {set of cases correctly covered by R_best};
END WHILE

As we saw in the TSP example, \eta_{ij} was defined based on the distance between a node and its neighbors, so a nearer neighbor had a higher chance of being selected. For the classification problem, this heuristic function (called the information-theoretic heuristic function) is defined as follows:


\eta_{ij} = \frac{\log_2 k - H(W \mid A_i = V_{ij})}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \log_2 k - H(W \mid A_i = V_{ij}) \right)}    (3)

The value of H in the above equation is defined by the following equation, which is the entropy:

H(W \mid A_i = V_{ij}) = - \sum_{w=1}^{k} P(w \mid A_i = V_{ij}) \cdot \log_2 P(w \mid A_i = V_{ij})    (4)

In the above equations, k is the number of classes, W is the class attribute, and P(w | A_i = V_{ij}) is the conditional probability of class w given that A_i = V_{ij}.
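The numerator of equation 3, using the entropy of equation 4, can be sketched as follows. Representing data items as (attribute-dict, class) pairs is an assumption made only for illustration:

```python
import math
from collections import Counter

def entropy_heuristic(items, attr, value, n_classes):
    """log2(k) - H(W | A_i = V_ij): the lower the class entropy of
    the partition where attribute `attr` equals `value`, the higher
    the heuristic value (equations 3-4, before normalization)."""
    partition = [cls for attrs, cls in items if attrs[attr] == value]
    counts = Counter(partition)
    n = len(partition)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(n_classes) - h
```

A partition that maps purely to one class has H = 0 and therefore gets the maximum value log2(k); a perfectly mixed partition gets 0.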

Table 1 Prediction Accuracy of Ant-Miner and CN2 After the Tenfold Cross-Validation Procedure [10]

Dataset                   Ant-Miner's Predictive   CN2's Predictive
                          Accuracy (%)             Accuracy (%)
Ljubljana breast cancer   75.28 ± 2.24             67.69 ± 3.59
Wisconsin breast cancer   96.04 ± 0.93             94.88 ± 0.88
Tic-tac-toe               73.04 ± 2.53             97.38 ± 0.52
Dermatology               94.29 ± 1.20             90.38 ± 1.66
Hepatitis                 90.00 ± 3.11             90.00 ± 2.50
Cleveland heart disease   59.67 ± 2.50             57.48 ± 1.78

After constructing a rule, each ant prunes it, removing some terms from the antecedent to make the rule more predictive and to prevent overfitting. The rule pruning process is similar to the method used in Algorithm 1; the only difference is that here any term in the rule may be removed, while in Algorithm 1 terms are removed only from the end of the rule. The quality of a rule before and after pruning is evaluated using the following metrics:

- True Positive: the number of data items correctly classified by the rule into the class predicted by the rule.
- True Negative: the number of data items correctly not classified by the rule into the class predicted by the rule.
- False Positive: the number of data items wrongly classified by the rule into the class in the rule consequent.
- False Negative: the number of data items wrongly not classified by the rule into the class in the rule consequent.

The above metrics are combined into the following quality metric:

Q = \frac{TruePos}{TruePos + FalseNeg} \cdot \frac{TrueNeg}{FalsePos + TrueNeg}    (5)
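Equation 5 is simply the product of the rule's sensitivity and specificity; a direct transcription:

```python
def rule_quality(tp, fn, fp, tn):
    """Equation 5: Q = sensitivity * specificity
    = TP/(TP+FN) * TN/(FP+TN)."""
    return (tp / (tp + fn)) * (tn / (fp + tn))
```

A rule that covers all positives and excludes all negatives gets Q = 1; missing positives or covering negatives pulls Q toward 0.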

Before the algorithm starts, all edges are initialized with the following amount of pheromone:

\tau_{ij}(t=0) = \frac{1}{\sum_{i=1}^{a} b_i}    (6)

Pheromone update is then performed by all ants; each ant updates the edges corresponding to the terms in its rule according to the following equation:

\tau_{ij}(t+1) = \tau_{ij}(t) + \tau_{ij}(t) \cdot Q, \quad \forall\, term_{ij} \in Rule    (7)

Next, the amount of pheromone on all edges is normalized, which reduces the amount of pheromone on the edges not used in the rule.
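Equations 6 and 7 plus the normalization step can be sketched as follows, assuming pheromone is stored in a dictionary keyed by term (an illustrative layout, not the authors' code):

```python
def init_pheromone(terms):
    """Equation 6: each term starts with 1 / (total number of terms)."""
    return {t: 1.0 / len(terms) for t in terms}

def update_pheromone(tau, rule_terms, q):
    """Equation 7 applied to the terms of the rule, followed by the
    normalization that implicitly lowers the unused terms."""
    updated = {t: v + v * q if t in rule_terms else v for t, v in tau.items()}
    total = sum(updated.values())
    return {t: v / total for t, v in updated.items()}
```

After normalization the values sum to 1 again, so terms outside the rule end up with less pheromone than before, which plays the role of evaporation.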

As the last step of each iteration, the best rule among all rules created by the ants is selected and added to the rule set.

The Ant-Miner algorithm uses the following parameter values, selected based on the authors' experiments:

- Number of ants: 3000,
- Minimum number of cases per rule: 10,
- Maximum number of uncovered cases in the training set: 10,
- Number of rules used to test convergence of the ants: 10.

Several experiments have been performed on Ant-Miner, and it has been shown [10] that using pheromone has a significant effect on classifier accuracy; rule pruning has also been shown to be important. Table 1 compares the accuracy of Ant-Miner and CN2 (a non-swarm classification algorithm).

Table 2 Accuracy Rate and Simplicity of Ant-Miner and Ant-Miner2 [9]

Valuation item     Ant-Miner    Ant-Miner2
Accuracy Rate      91%          91%
Number of Rules    10.5 ± 1.4   10.5 ± 1.4

4 Ant-Miner2 (A Density-Based Ant-Miner)

Bo Liu et al. [9] have proposed some improvements to Ant-Miner. They changed the heuristic value, which in the original Ant-Miner is calculated using an entropy function. They simplified the heuristic function, arguing that it need not be very precise because the pheromone trails on the edges will compensate for an imprecise heuristic value. They proposed the following density-based heuristic function, in which majority_class(T_{ij}) denotes the number of cases of the majority class in partition T_{ij}:

\eta_{ij} = \frac{majority\_class(T_{ij})}{|T_{ij}|}    (8)
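Equation 8 can be sketched directly, assuming data items are (attribute-dict, class) pairs (an illustrative representation, not the authors' code):

```python
def density_heuristic(items, attr, value):
    """Ant-Miner2's eta_ij (equation 8): the fraction of the
    partition T_ij (cases where attr == value) that belongs to the
    partition's majority class."""
    partition = [cls for attrs, cls in items if attrs[attr] == value]
    majority = max(partition.count(c) for c in set(partition))
    return majority / len(partition)
```

Compared with the entropy-based heuristic of equations 3-4, this needs only a single count over the partition, which is the simplification the authors argue ACO can tolerate.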

They conducted an experiment comparing Ant-Miner and Ant-Miner2 on one database (Wisconsin Breast Cancer) and concluded that the accuracy and the number of rules generated by the two algorithms are identical. The results of their experiment are shown in Table 2.

Based on this result, they concluded that the heuristic function does not need to be very precise and that ACO is tolerant of a simple heuristic function.

Table 3 Test Set Accuracy Rate [8]

              Breast Cancer            Tic Tac Toe
Run Number    AntMiner1   AntMiner3    AntMiner1   AntMiner3
1             92.05       94.32        71.28       82.97
2             93.15       93.15        73.40       72.34
3             91.67       91.67        67.37       78.94
4             95.59       97.06        71.58       80.00
5             88.41       92.75        68.42       72.63
6             94.20       95.65        75.79       80.00
7             90.77       93.84        74.74       81.05
8             96.55       96.55        65.26       74.74
9             91.04       92.54        73.68       75.79
10            92.86       95.71        68.42       67.37

5 Ant-Miner3

Ant-Miner and Ant-Miner2 were further improved by Bo Liu et al. [8]. In Ant-Miner3 they changed the pheromone update mechanism and also provided a new state transition rule, and showed that this increases the accuracy of the algorithm.

Pheromone update: The pheromone update equation of the original Ant-Miner is replaced by:

\tau_{ij}(t) = (1-\rho) \cdot \tau_{ij}(t-1) + \left(1 - \frac{1}{1+Q}\right) \cdot \tau_{ij}(t-1)    (9)

In the above equation, \rho is the pheromone evaporation rate and Q is the quality of the constructed rule, calculated according to equation 5. As can be inferred from the second operand of the addition in the above equation, if the quality of a rule is high (close to 1), half of the previous amount of pheromone is added in this step, and if the quality is low (close to 0), almost no pheromone is added on account of rule quality. The authors of Ant-Miner3 used an evaporation rate of 0.1 in their experiments.

After updating the amount of pheromone according to the above equation, the pheromone values are normalized; this normalization also accounts for pheromone evaporation on the unused edges.
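Equation 9 is a one-liner; note how Q = 1 adds half of the previous pheromone while Q = 0 adds nothing beyond what survives evaporation:

```python
def antminer3_update(tau_prev, q, rho=0.1):
    """Ant-Miner3 pheromone update (equation 9): evaporation plus a
    quality-dependent deposit, both scaled from the previous value."""
    return (1 - rho) * tau_prev + (1 - 1 / (1 + q)) * tau_prev
```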

State Transition: Depositing pheromone on edges can be considered both a distributed long-term memory and a communication mechanism. It biases ants toward edges with more pheromone, and thus toward exploitation: the more pheromone on the edges, the less the possibility of exploration. Liu et al. propose another state transition rule in order to enhance the role of exploration. This state transition rule is as follows:

if q1 ≤ φ then
  loop
    if q2 ≤ Σ_{j∈Ji} P_ij then
      Choose term_ij
    end if
  end loop
else
  Choose term_ij with max P_ij
end if

In the above pseudocode:

- q1 and q2 are random numbers,
- φ is a parameter in the range 0 to 1,
- J_i is the set of all values of the i-th attribute, and
- P_ij is the probability calculated according to equation 2.

In this state transition method, φ is an adjustable parameter by which we can tune the amount of exploration versus exploitation. If the random number q1 is greater than φ, the system exploits the previously found knowledge, while if it is less than φ, the system explores. They used the value 0.4 for φ in their experiments.
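The state transition rule above behaves like an exploration/exploitation switch. A minimal sketch, assuming `p` holds the P_ij values keyed by term (the names are illustrative):

```python
import random

def choose_term(candidates, p, phi=0.4):
    """Ant-Miner3 state transition: with probability phi, explore via
    roulette-wheel selection over P_ij; otherwise exploit by taking
    the term with the maximum P_ij."""
    if random.random() <= phi:
        return random.choices(candidates,
                              weights=[p[t] for t in candidates])[0]
    return max(candidates, key=lambda t: p[t])
```

Setting phi = 0 makes the choice purely greedy, while phi = 1 makes it the purely probabilistic selection of the original Ant-Miner.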

Liu et al. used the Wisconsin Breast Cancer and Tic Tac Toe Endgame datasets for their experiments. Ten-fold cross-validation was used to compare the original AntMiner and AntMiner3. The accuracy of both algorithms was calculated according to equation 5, and the results of all ten runs are shown in Table 3. Table 4 shows the mean accuracy and mean number of rules for the original AntMiner and AntMiner3. As can be seen in Table 4, the mean accuracy of AntMiner3 is higher than that of the original AntMiner.

6 Conclusion

Table 4 Mean accuracy rate and mean number of rule lists [8]

                    Breast Cancer            Tic Tac Toe
Valuation item      AntMiner1   AntMiner3    AntMiner1   AntMiner3
Accuracy Rate (%)   92.63       94.32        70.99       76.58
Number of Rules     10.1        13.2         16.5        18.58

In this article we studied three different swarm-based classification rule construction algorithms. The original Ant-Miner, which is based on Ant Colony Optimization, treats the different data item fields as paths in ACO. Every attribute-value pair can be selected by an ant as a term and added to a rule. Several rules are constructed by several ants, and this task is repeated until the rule set is constructed. During the rule construction phase, ants deposit pheromone on the paths, and this helps them find better rules. Ant-Miner2 improves the original Ant-Miner by defining a new heuristic function; it showed that a very precise heuristic function in ACO-based classification rule construction algorithms might not be of significant importance. Ant-Miner3 also suggests improvements over the original Ant-Miner: it uses a new pheromone update equation and provides a new state transition mechanism. The experiments on Ant-Miner, Ant-Miner2, and Ant-Miner3 suggest that swarm-based classification algorithms perform as well as non-swarm algorithms.

References

1. Abraham, A., Das, S., Roy, S.: Swarm intelligence algorithms for data clustering. Soft Computing for Knowledge Discovery and Data Mining, pp. 279-313 (2008)
2. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, USA (1999)
3. Chan, P., Fan, W., Prodromidis, A., Stolfo, S.: Distributed data mining in credit card fraud detection. Intelligent Systems and their Applications, IEEE 14(6), 67-74 (1999)
4. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in Knowledge Discovery and Data Mining (1996)
5. Kantardzic, M.: Data Mining: Concepts, Models, Methods, and Algorithms. Wiley-IEEE Press (2011)
6. Latty, T., Ramsch, K., Ito, K., Nakagaki, T., Sumpter, D., Middendorf, M., Beekman, M.: Structure and formation of ant transportation networks. Journal of The Royal Society Interface 8(62), 1298-1306 (2011)
7. Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, vol. 6. MIT Press (2001)
8. Liu, B., Abbass, H., McKay, B.: Classification rule discovery with ant colony optimization. In: Intelligent Agent Technology, 2003. IAT 2003. IEEE/WIC International Conference on, pp. 83-88 (2003). DOI 10.1109/IAT.2003.1241052
9. Liu, B., Abbass, H., McKay, B.: Density-based heuristic for rule discovery with ant-miner. In: The 6th Australia-Japan Joint Workshop on Intelligent and Evolutionary Systems, vol. 184. Citeseer (2002)
10. Parpinelli, R., Lopes, H., Freitas, A.: Data mining with an ant colony optimization algorithm. Evolutionary Computation, IEEE Transactions on 6(4), 321-332 (2002)
11. Seeley, T., Buhrman, S.: Nest-site selection in honey bees: how well do swarms implement the "best-of-n" decision rule? Behavioral Ecology and Sociobiology 49(5), 416-427 (2001)
12. Shaw, M., Subramaniam, C., Tan, G., Welge, M.: Knowledge management and data mining for marketing. Decision Support Systems 31(1), 127-137 (2001)
13. Witten, I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2005)
14. Xiao, L., Shao, Z., Liu, G.: K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection. In: Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, vol. 2, pp. 5854-5858. IEEE (2006)
15. Zhongzhi, S., et al.: A clustering algorithm based on swarm intelligence. In: Info-tech and Info-net, 2001. Proceedings. ICII 2001-Beijing. 2001 International Conferences on, vol. 3, pp. 58-66. IEEE (2001)
