International Journal of Computer Science and Application Issue 2010
ISSN 0974-0767
A New Ant Approach for Unraveling Data
Clustering and Data Classification Setback
Mohd. Husain, Raj Gaurang Tiwari, Anil Agrawal, Bineet Gupta
Abstract: Data mining is a process that uses technology to
bridge the gap between data and logical decision making. The
terminology itself provides a promising view of systematic
data manipulation for extracting useful information and
knowledge from high volumes of data. Numerous techniques
have been developed to fulfill this goal. This paper describes
the data mining terminology and outlines the ant colony
optimization algorithm, which has recently been applied in
data mining, mostly to solve data clustering and data
classification problems, and which was developed by imitating
the way real ants find the shortest path between their nest
and a food source. The paper presents an application that
clusters a data set with the ant colony optimization
algorithm, proposes two new techniques to increase the
working performance of the algorithm on the data clustering
problem, and shows the performance gain obtained by adding
these techniques.
Keywords: Data Mining, Knowledge Discovery in Databases,
Clustering, Ant Colony Optimization.
I. INTRODUCTION
Data Mining (DM), also known as Knowledge Discovery in
Databases (KDD), is the nontrivial extraction of implicit,
previously unknown, and potentially useful information from
data [1]. It encompasses a number of different technical
approaches, such as clustering, data summarization, learning
classification rules, finding dependency networks, analyzing
changes, and detecting anomalies. Clustering is the task of
identifying groups in a data set based upon some criterion of
similarity [2]. Clustering aims to discover a sensible
organization of objects in a given dataset by identifying and
quantifying similarities or dissimilarities between the
objects [3]. In data mining, clustering is used especially as
a preprocessing step for another data mining application. We
implement a clustering method that uses ant colony
optimization to cluster a data set into a predetermined
number of clusters, and we propose two new techniques that
are added to the algorithm.
Real ants have the ability to find the shortest path from
their nest to a food source without any visual trace [4]. Ant
colony optimization was developed by modeling this behavior
of real ants [2].
This paper is organized as follows: Section 2 describes ant
colony optimization. Section 3 describes the ant colony
optimization algorithm developed for data clustering and our
two proposed techniques. Section 4 reports the results of
verifying the ACO algorithm and the proposed techniques with
an application program on a dataset. Finally, Section 5
reports the conclusions of the current work.
II. ANT COLONY OPTIMIZATION
Ant colony optimization (ACO) [5] mimics the way real ants
find the shortest route between a food source and their nest.
As shown in Figure 1a, ants start from their nest and go
along a straight path to the food source.
If an obstacle appears on the path to the food source
(Figure 1b), the ant in front of the obstacle cannot continue
and has to choose a new direction. In this case, each of the
alternative directions is equally likely to be selected. In
other words, if an ant can turn either right or left, both
directions have the same chance of being chosen (Figure 1c).
Suppose two ants leave the nest at the same time in search of
the food source and take these two directions. One of them
chooses the path that turns out to be shorter, while the
other takes the longer path. It is observed, however, that
the following ants mostly select the shorter path, because
more pheromone has been deposited on it.
Figure 1. Behavior of ants between their nest and food source
The ant moving along the shorter path returns to the nest
earlier, so more pheromone is deposited on this path than on
the longer one. Other ants in the nest thus have a high
probability of following the shorter route. These ants also
deposit their own pheromone on this path. More and more ants
are soon attracted to this path, and hence the optimal route
from the nest to the food source and back is very quickly
established. Such a pheromone-mediated cooperative search
process leads to intelligent swarm behavior.
The instrument ants use to find the shortest path is
pheromone. Pheromone is a chemical secretion used by some
animals to affect members of their own species. Ants deposit
some amount of pheromone while moving and, following a
probability-based rule, prefer the path that carries more
pheromone over the others. Ants leave pheromone on the
selected path while going to the food source, and so they
help the following ants in selecting the path (Figure 1d).
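The path-selection behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the text says the choice is probability based but gives no formula, so the proportional rule, the path lengths, the deposit of 1/length per trip, and the names `choose_path` and `simulate` are all hypothetical.

```python
import random

def choose_path(pheromone, rng):
    """Pick a path index with probability proportional to its pheromone."""
    total = sum(pheromone)
    threshold = rng.random() * total
    cumulative = 0.0
    for index, amount in enumerate(pheromone):
        cumulative += amount
        if threshold < cumulative:
            return index
    return len(pheromone) - 1  # guard against floating-point round-off

def simulate(n_ants=1000, seed=0):
    """Send ants one by one over two paths and return the final pheromone."""
    rng = random.Random(seed)
    lengths = [1.0, 2.0]    # assumed lengths: path 0 is the shorter one
    pheromone = [1.0, 1.0]  # equal at the start, so both choices are equally likely
    for _ in range(n_ants):
        path = choose_path(pheromone, rng)
        # Assumed deposit rule: a shorter path receives more pheromone per trip.
        pheromone[path] += 1.0 / lengths[path]
    return pheromone

if __name__ == "__main__":
    print(simulate())
```

Because each trip on the shorter path adds more pheromone, later ants are increasingly biased toward it, which is the reinforcement effect shown in Figure 1d.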
III. CLUSTERING WITH ANT COLONY
OPTIMIZATION
In this section, the ant colony optimization algorithm is
used to solve the data clustering problem, the two proposed
techniques are explained in detail, and the solutions are
compared.
We use an ACO algorithm for data clustering, in which a set
of concurrent distributed agents collectively discover a
sensible organization of objects for a given dataset [3]. In
the algorithm, each agent discovers a possible partition of
the objects in a given dataset, and the quality of the
partition is measured with some metric such as Euclidean
distance. Information gathered by an agent about the
clustering of objects is accumulated in a global information
hub (the pheromone trail matrix) and is used by the other
agents to construct possible clustering solutions and to
improve them iteratively. The algorithm runs for a given
maximum number of iterations, and the best solution found
with respect to the given metric represents an optimal or
near-optimal partitioning of the objects of the dataset into
subsets.
The aim of data clustering is to obtain an optimal assignment
of N objects to one of K clusters, where N is the number of
objects and K is the number of clusters [7, 8]. The
artificial ants used in the algorithm are called software
ants or agents, and the number of agents is denoted by R.
Ants start with empty solution strings, and in the first
iteration the elements of the pheromone matrix are
initialized to the same value. As the iterations progress,
the pheromone matrix is updated depending on the quality of
the solutions produced.
To describe the algorithm in detail, a test data set of 10
samples is formed. The samples are taken from the UCI machine
learning repository [6]. The test data are shown in Table 1;
in the real data set the data are divided into 3 subsets, so
K = 3.
TABLE 1. ILLUSTRATIVE DATASET TO EXPLAIN ACO
ALGORITHM FOR CLUSTERING WITH N=10 AND n=4 (N:
NUMBER OF SAMPLES, n: NUMBER OF ATTRIBUTES)
To construct a solution, the agent uses the pheromone trail
information to allocate each element of the string S to an
appropriate cluster label. At the start of the algorithm,
each agent (software ant) starts with an empty solution
string, and the pheromone matrix τ, which records how
strongly each element is associated with each cluster, is
initialized to some small value τ0. Hence, in the first
iteration each element of the solution string S of each agent
is assigned randomly to one of the K clusters.
The trail value τij at location (i, j) represents the
pheromone concentration of sample i associated with cluster
j. So, for the problem of separating N samples into K
clusters, the size of the pheromone matrix is N x K. Thus,
each sample is associated with K pheromone concentrations.
The pheromone trail matrix evolves as we iterate. At any
iteration level, each agent (software ant) develops a
solution by using this pheromone matrix to determine the
probability of each sample belonging to each cluster. After
the solutions of the R agents have been generated, a local
search is performed to further improve the fitness of these
solutions. The pheromone matrix is then updated depending on
the quality of the solutions produced by the agents. Then the
agents build improved solutions depending on the pheromone
matrix, and the above steps are repeated for a certain number
of iterations.
At the end of any iteration level, each agent generates its
solution using the information derived from the updated
pheromone matrix. The pheromone matrix at an arbitrary
iteration level for the test dataset is shown in Table 2
below.
Sample   Sepal    Sepal   Petal    Petal   Cluster
Number   length   width   length   width
1        5.1      3.5     1.4      0.2     1
2        7        3.2     4.7      1.4     2
3        6.3      3.3     6        2.5     3
4        4.9      3       1.4      0.2     1
5        4.6      3.1     1.5      0.2     1
6        6.4      3.2     4.5      1.5     2
7        6.2      2.9     4.3      1.3     2
8        5.8      2.7     5.1      1.9     3
9        7.1      3       5.9      2.1     3
10       6.3      2.9     5.6      1.8     3
TABLE 3. SOLUTIONS GENERATED FOR THE DATA CLUSTERING PROBLEM,
SORTED BY DECREASING QUALITY (INCREASING OBJECTIVE FUNCTION VALUE)
The pheromone concentrations for the first sample, as shown
in Table 2, are τ11 = 0.014756, τ12 = 0.015274 and
τ13 = 0.009900. This indicates that at the current iteration
sample number 1 has the highest probability of belonging to
cluster number 2, because τ12 is the highest.
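The way a pheromone row turns into a cluster choice can be sketched as follows. The text only says that each agent selects a cluster "with a probability value", so sampling in proportion to the normalized pheromone is an assumption here, and the function names `initial_pheromone`, `cluster_probabilities` and `build_solution` are hypothetical. The two rows are copied from Table 2 (samples 1 and 7).

```python
import random

# Pheromone rows for samples 1 and 7, copied from Table 2.
TAU = [
    [0.014756, 0.015274, 0.009900],
    [0.009900, 0.020131, 0.009900],
]

def initial_pheromone(n_samples, n_clusters, tau0):
    """N x K pheromone matrix with every entry set to the small value tau0."""
    return [[tau0] * n_clusters for _ in range(n_samples)]

def cluster_probabilities(tau_row):
    """Normalize one pheromone row into selection probabilities (assumed rule)."""
    total = sum(tau_row)
    return [value / total for value in tau_row]

def build_solution(tau, rng):
    """Draw one 1-based cluster label per sample, proportionally to pheromone."""
    labels = list(range(1, len(tau[0]) + 1))
    return [rng.choices(labels, weights=row)[0] for row in tau]

if __name__ == "__main__":
    print(cluster_probabilities(TAU[0]))
    print(build_solution(TAU, random.Random(0)))
```

With a freshly initialized matrix every row is uniform, so the first iteration assigns clusters at random, which matches the description of the first iteration given earlier.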
Each agent selects a cluster number, with a probability
value, for each element of the string S to form its own
solution string S. The quality of a constructed solution
string S is measured by the value of the objective function
for the given data clustering problem. This objective
function is defined as the sum of squared Euclidean distances
between each object and the center of the cluster it belongs
to. The elements of the population, namely the agents, are
then sorted in increasing order of objective function value,
because the lower the objective function value, the closer
the solution is to the real solution. Table 3 shows the
solution strings of the ten agents for the test data set and
the fitness value of each agent, sorted from best to worst.
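The objective function and the ranking step described above can be sketched as below. This is a minimal illustration, not the authors' program: the names `objective` and `rank_agents` are hypothetical, empty clusters are assumed to contribute nothing, and no claim is made that it reproduces the exact fitness values printed in Table 3.

```python
def objective(data, solution, n_clusters):
    """Sum of squared Euclidean distances of each sample to its cluster center."""
    n_attributes = len(data[0])
    total = 0.0
    for cluster in range(1, n_clusters + 1):
        members = [row for row, label in zip(data, solution) if label == cluster]
        if not members:
            continue  # assumption: an empty cluster contributes nothing
        center = [sum(row[a] for row in members) / len(members)
                  for a in range(n_attributes)]
        for row in members:
            total += sum((row[a] - center[a]) ** 2 for a in range(n_attributes))
    return total

def rank_agents(data, solutions, n_clusters):
    """Return (fitness, solution) pairs sorted by increasing objective value."""
    scored = [(objective(data, s, n_clusters), s) for s in solutions]
    return sorted(scored, key=lambda pair: pair[0])

if __name__ == "__main__":
    toy = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
    for fitness, solution in rank_agents(toy, [[1, 2, 2], [1, 1, 2]], 2):
        print(solution, fitness)
```

Sorting by increasing objective value places the best solution first, which is the order in which the solutions of Table 3 are listed.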
Most existing ant colony optimization algorithms use some
local search procedure to improve the solutions discovered by
the software ants. Local search helps to generate better
solutions when heuristic information cannot be discovered
easily. Local search is applied either to all generated
solutions or to a small percentage of the R solutions. In
this work, local search is performed on 20% of the total
solutions, so with the 10 solutions generated for the test
data set, local search is applied to the top 2 solutions in
Table 3. In the local search procedure, the objective
function values of the top 2 agents are computed again. A new
solution is accepted only if it improves the fitness: if the
newly computed objective function value is lower than the
previously computed value, the newly generated solution
replaces the old one.
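A local search step with this accept-only-if-better rule can be sketched as follows. The text does not say how a candidate solution is generated, so the random relabeling with probability `p_change` is purely an assumption, as are the names `local_search` and `improve_top`. The objective is passed in as a function of the solution string.

```python
import random

def local_search(solution, fitness, objective, n_clusters, rng, p_change=0.1):
    """Perturb a solution (assumed move) and keep it only if the objective drops."""
    candidate = list(solution)
    for i in range(len(candidate)):
        if rng.random() < p_change:
            candidate[i] = rng.randint(1, n_clusters)  # assumed random relabeling
    new_fitness = objective(candidate)
    if new_fitness < fitness:
        return candidate, new_fitness
    return list(solution), fitness

def improve_top(ranked, objective, n_clusters, rng, fraction=0.2):
    """Apply local search to the best `fraction` of sorted (fitness, solution) pairs."""
    n_top = max(1, round(len(ranked) * fraction))
    improved = []
    for fitness, solution in ranked[:n_top]:
        new_solution, new_fitness = local_search(
            solution, fitness, objective, n_clusters, rng)
        improved.append((new_fitness, new_solution))
    return improved + list(ranked[n_top:])

if __name__ == "__main__":
    # Toy objective for demonstration only: number of labels different from 1.
    toy_objective = lambda s: float(sum(1 for label in s if label != 1))
    ranked = [(toy_objective(s), s) for s in ([2, 3, 2, 3], [3, 3, 3, 3])]
    print(improve_top(ranked, toy_objective, 3, random.Random(0), fraction=0.5))
```

With `fraction=0.2` and ten ranked solutions, `improve_top` touches exactly the top 2, as in the example of the text.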
After the local search procedure, the pheromone trail matrix
is updated. This pheromone updating process reflects the
usefulness of the dynamic information provided by the
software ants. The pheromone matrix used in the ant colony
optimization algorithm is a kind of adaptive memory that
contains information provided by the previously found
superior solutions and is updated at the end of each
iteration. The pheromone updating process used in this
algorithm includes the best L solutions discovered by the R
agents at iteration level t. These L agents mimic the
pheromone deposition of real ants by depositing amounts
derived from the values of their solutions. The trail
information is updated using the following rule:
$$\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \sum_{l=1}^{L} \Delta\tau_{ij}^{l}, \qquad i = 1,\dots,N, \quad j = 1,\dots,K$$
where ρ lies in [0,1]. Since the previous trail is multiplied
by (1-ρ) in this rule, ρ acts as the evaporation rate and
(1-ρ) as the persistence of the trail. A higher value of ρ
means that the information gathered in past iterations is
forgotten faster.
The amount $\Delta\tau_{ij}^{l}$ is equal to $1/F_l$ if
cluster j is assigned to the ith element of the solution
constructed by ant l, and zero otherwise, where $F_l$ is the
objective function value of that solution.
An optimal solution is the solution that minimizes the
objective function value. If the best solution of the current
iteration has a lower objective function value than the best
solution in memory, the best solution in memory is replaced
by it; otherwise the best solution in memory is kept. This
completes one iteration of the algorithm. The algorithm
repeats these steps for a given number of iterations, and the
solution with the lowest objective function value represents
the optimal partitioning of the objects of the given dataset
into several groups.
The flow chart of the ant colony optimization algorithm
developed for solving the data clustering problem, as
explained in detail above, is shown in Figure 2. The flow
charts of the first and second techniques proposed to
increase the performance of the ACO algorithm are shown in
Figures 3 and 4, respectively.
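The pheromone update rule described above can be written out as a short sketch. It assumes the reading used in this section, namely that each of the best L solutions deposits 1/F_l on the (sample, cluster) pairs it uses and that the old trail is multiplied by (1 - ρ); the function name `update_pheromone` and the list-of-lists matrix layout are hypothetical.

```python
def update_pheromone(tau, best_solutions, rho):
    """Return tau(t+1) for an N x K matrix.

    tau            -- list of N rows with K pheromone values each
    best_solutions -- list of (fitness, solution) pairs for the best L agents,
                      where solution holds 1-based cluster labels
    rho            -- value in [0, 1]; the old trail is scaled by (1 - rho)
    """
    # Evaporation: keep a fraction (1 - rho) of the previous trail.
    new_tau = [[(1.0 - rho) * value for value in row] for row in tau]
    # Deposit: each of the best L solutions adds 1 / F_l where it assigns j to i.
    for fitness, solution in best_solutions:
        for i, label in enumerate(solution):
            new_tau[i][label - 1] += 1.0 / fitness
    return new_tau

if __name__ == "__main__":
    tau = [[0.01, 0.01], [0.01, 0.01]]
    print(update_pheromone(tau, [(2.0, [1, 2])], rho=0.5))
```

Entries that are not used by any of the best L solutions only evaporate, so their pheromone shrinks from one iteration to the next.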
TABLE 2. PHEROMONE TRAIL MATRIX GENERATED AT ANY
ITERATION LEVEL OF THE ACO ALGORITHM FOR TEST
DATASET
N (Sample No)   K (Cluster No)
                1          2          3
1               0.014756   0.015274   0.009900
2               0.015274   0.009900   0.014756
3               0.015274   0.014756   0.009900
4               0.009900   0.015274   0.014756
5               0.014756   0.015274   0.009900
6               0.009900   0.014756   0.015274
7               0.009900   0.020131   0.009900
8               0.015274   0.014756   0.009900
9               0.009900   0.015274   0.014756
10              0.014756   0.015274   0.009900

TABLE 3 (data)
S (Solution   N (Sample No)                   F (Fitness)
String)       1  2  3  4  5  6  7  8  9  10
1             2  1  1  2  2  3  3  1  2  2    4.003931
2             2  3  1  2  2  3  2  3  2  2    7.172357
3             2  1  1  2  2  3  2  1  2  3    7.864054
4             2  1  3  2  2  3  2  1  2  3    8.455329
5             2  2  1  2  2  3  2  1  2  2    10.36714
6             2  1  1  2  3  3  2  1  1  3    10.92255
7             1  1  1  2  2  3  2  1  2  3    11.94087
8             2  1  1  2  1  3  2  1  1  1    12.00959
9             1  1  2  2  2  3  1  1  2  2    13.26286
10            1  1  2  2  2  3  3  1  2  3    13.33634
Figure 2. The flow chart of ACO algorithm developed for solving data
clustering problem [3]
Figure 3: The flow chart of the first technique proposed to increase the
performance of ACO
Figure 4. The flow chart of the second technique proposed to increase
the performance of ACO
Ants follow the path between their nest and the food source
according to the amount of pheromone deposited on the path.
Following ants decide which path to take depending on the
pheromone concentrations on the paths. After a number of
iterations, ants come to follow the same path continuously,
because its pheromone concentration is enormous compared with
that of the disused paths. This behavior of ants is called
stagnation behavior. To avoid this disadvantage, the
reference algorithm is improved by adding two new techniques,
and the solutions are compared with each other. The first
proposed technique (Figure 3) resets the pheromone amounts to
their initial values every 50 iterations to avoid stagnation
behavior. Aiming to minimize the stagnation behavior of ants,
the second proposed technique (Figure 4) monitors the
pheromone amounts, and if the pheromone concentration of
every path has not changed over the last 10 iterations, it
resets the pheromone amounts to their initial values.
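The two reset techniques described above can be sketched as follows. The period of 50 iterations and the window of 10 iterations come from the text; everything else is an assumption for illustration, including the names `reset`, `periodic_reset` and `StagnationMonitor`, the 1-based iteration counter, and the `tolerance` used to decide that two pheromone matrices are "unchanged".

```python
def reset(tau, tau0):
    """Return a matrix of the same shape with every entry set back to tau0."""
    return [[tau0 for _ in row] for row in tau]

def periodic_reset(tau, iteration, tau0, period=50):
    """First technique: reset every `period` iterations (iteration starts at 1)."""
    if iteration % period == 0:
        return reset(tau, tau0)
    return tau

class StagnationMonitor:
    """Second technique: reset after `patience` iterations without any change."""

    def __init__(self, tau0, patience=10, tolerance=0.0):
        self.tau0 = tau0
        self.patience = patience
        self.tolerance = tolerance  # assumed threshold for "no change"
        self.previous = None
        self.unchanged = 0

    def _same(self, a, b):
        return all(abs(x - y) <= self.tolerance
                   for row_a, row_b in zip(a, b)
                   for x, y in zip(row_a, row_b))

    def step(self, tau):
        """Call once per iteration; returns tau or a freshly reset matrix."""
        if self.previous is not None and self._same(tau, self.previous):
            self.unchanged += 1
        else:
            self.unchanged = 0
        self.previous = [row[:] for row in tau]
        if self.unchanged >= self.patience:
            self.unchanged = 0
            self.previous = None
            return reset(tau, self.tau0)
        return tau
```

In a main loop, either `periodic_reset(tau, t, tau0)` or `monitor.step(tau)` would be called right after the pheromone update of iteration t.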
IV. EXPERIMENTAL EVALUATION
To generate the optimal solutions of the presented ACO
algorithm for the data clustering problem and of the two
added techniques, an application program was written in
"Microsoft Visual Basic 6.0" and applied to the iris database
available in the UCI repository [6]. The iris database
consists of 150 samples and is stored in a text file. The
main screen of the application program is shown in Figure 5.
The number of iterations, clusters, agents and local search
agents, the initial pheromone value, the pheromone
evaporation rate, and some other values needed by the
algorithm are specified on this screen. The program runs the
algorithm for the specified number of iterations.
Figure 5. The main screen of the application program
Figure 6 shows the statistical results of the three methods
(the reference algorithm and the two new techniques) run in
the application program with 1000 iterations. In Figure 6,
'1. Solution' represents our main ant colony optimization
algorithm, whose performance compared with the real cluster
values of the iris database is 4%; '2. Solution' represents
our first proposed technique, whose performance is 52%; and
'3. Solution' represents our second proposed technique, whose
performance is 80%.
Figure 6. Statistical results of the ACO methods run with the
criteria specified in Figure 5.
Figure 7. Graph screen showing the results of the ACO methods run with
the criteria specified in Figure 5, together with the real solution.
Figure 7 shows the graph screen of the three methods (the
reference algorithm and the two new techniques) run in the
application program with 1000 iterations and the given
criteria (see Figure 6). The straight line on the graph
marks the fitness value of the real cluster values of the
iris database. The curve labeled '1. Solution' shows the
results of the ACO algorithm; its working performance
compared with the real cluster values is only 4% (see Figure
7), because the algorithm exhibited stagnation behavior after
the 615th iteration (see Figure 6). The curve labeled '2.
Solution' shows the results of the first proposed technique,
whose working performance is 52%, and the curve labeled '3.
Solution' shows the results of the second proposed technique,
whose working performance is 80% (see Figure 7).
V. CONCLUSION
In this paper we proposed two new techniques to increase the
working performance of the ant colony optimization algorithm.
We also verified the ACO algorithm and the proposed
techniques with an application program. The comparison of
the three methods shows that the proposed techniques increase
the performance of the reference ACO algorithm and that the
best results are obtained with the second proposed technique.
Consequently, our two proposed techniques markedly increased
the success of the ACO algorithm developed for solving the
data clustering problem.
REFERENCES
[1] FRAWLEY, W.J., PIATETSKY-SHAPIRO, G., MATHEUS, C.J.,
"Knowledge Discovery in Databases: An Overview", AI Magazine,
13(3):57-70, 1992
[2] DORIGO, M., MANIEZZO, V., COLORNI, A., "The Ant System:
Optimization by a colony of cooperating agents", IEEE
Transactions on Systems, Man, and Cybernetics-Part B, Vol. 26,
No. 1, pp. 1-13, 1996
[3] SHELOKAR, V.K., JAYARAMAN, et al., "An Ant Colony
Approach for Clustering", Analytica Chimica Acta 509, 187-195,
2004
[4] DALKILIÇ, G., et al., "Karınca Kolonisi Optimizasyonu",
YPBS2002 - Yüksek Performanslı Bilişim Sempozyumu, Kocaeli,
October 2002
[5] DI CARO, G., DORIGO, M., "Extending AntNet for Best-effort
Quality-of-Services Routing", Ant Workshop on Ant Colony
Optimization, http://iridia.ulb.ac.be/ants98/ants98.html, 15-16,
1998
[6] UCI Repository for Machine Learning Databases, retrieved from the
World Wide Web: http://www.ics.uci.edu/~mlearn/MLRepository.htm
[7] TSAI, C.F., TSAI, C.W., WU, H.C., YANG, T., "ACODF: a novel
data clustering approach for data mining in large databases", The
Journal of Systems and Software 73, p. 133-145, 2004
[8] KUO, R.J., WANG, H.S., HU, T., CHOU, S.H., "Application of Ant
K-Means on Clustering Analysis", Computers and Mathematics
with Applications 50, p. 1709-1724, 2005
[9] MANIEZZO, V., et al., "An Ant Approach To Membership Overlay
Design", ANTS 2004 - Fourth International Workshop On Ant
Colony Optimization and Swarm Intelligence, p. 37-48, Berlin,
2004
[10] PARPINELLI, R.S., LOPES, H.S., FREITAS, A.A.,
"Classification-Rule Discovery with an Ant Colony Algorithm",
Encyclopedia of Information Science and Technology, Idea Group
Inc., 2005
Prof. (Dr.) Mohd Husain
Professor, Deptt. of Computer Sc. & Engineering
AZAD Institute of Engineering and Technology,
Lucknow, India
mohd.husain90@gmail.com
Mr. Raj Gaurang Tiwari
Assistant Professor,
Deptt. of Computer Applications
AZAD Institute of Engineering and Technology,
Lucknow, India
rajgaurang@gmail.com
Mr. Anil Agrawal
Assistant Professor,
Deptt. of Computer Sc. and Engineering
Ambalika Institute of Management and
Technology, Lucknow, India
anil19974@gmail.com
Mr. Bineet Gupta
Lecturer, Deptt. of Computer Sc. & Engineering
Mizan Tepi University, Ethiopia
bineet777@gmail.com