A new clustering algorithm based on hybrid global optimization based on a dynamical systems approach algorithm

Ali Maroosi *, Babak Amiri
Iran University of Science and Technology, Tehran, Iran

Expert Systems with Applications 37 (2010) 5645–5652

* Corresponding author. E-mail addresses: Ali.Maroosi@gmail.com, Ali_maroosi@ee.iust.ac.ir (A. Maroosi), Amiri_babak@ind.iust.ac.ir (B. Amiri).
Keywords: Clustering; K-means; Dynamical systems; Tabu search

Abstract
Many methods for local optimization are based on the notion of a direction of local descent at a given point. A local improvement of the point at hand can be made along this direction. As a rule, modern methods for global optimization do not use directions of global descent for a global improvement of the point at hand. From this point of view, the global optimization algorithm based on a dynamical systems approach (GOP) is an unusual method. Its structure is similar to that used in local optimization: a new iterate can be obtained as an improvement of the previous one along a certain direction. In contrast with local methods, this direction is a direction of global descent, and for greater diversification it is combined with Tabu search. The resulting algorithm is called hybrid GOP (HGOP). Cluster analysis is one of the attractive data mining techniques used in many fields. One popular class of data clustering algorithms is the center-based clustering algorithm. K-means is a popular clustering method due to its simplicity and high speed in clustering large datasets. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima; moreover, global solutions of large problems cannot be found with a reasonable amount of computational effort. To overcome the local optima problem, many studies have been done in clustering. In this paper, we propose an application of a hybrid global optimization algorithm based on a dynamical systems approach to clustering. We compare HGOP with other clustering algorithms, such as GAK, SA, TS, and ACO, by implementing them on several simulated and real datasets. Our findings show that the proposed algorithm outperforms the others.

© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Clustering, also called set partitioning, is a basic and widely applied methodology. Application fields include statistics, mathematical programming (location selection, network partitioning, routing, scheduling and assignment problems, etc.) and computer science (pattern recognition, learning theory, image processing and computer graphics, etc.). Clustering groups all objects into several mutually exclusive clusters so as to maximize or minimize an objective function. Clustering rapidly becomes computationally intractable as the problem scale increases, because of the combinatorial character of the method. Brucker (1978) and Ward (1963) proved that, for specific objective functions, clustering becomes an NP-hard problem when the number of clusters exceeds 3.
There are many methods applied in clustering analysis, such as hierarchical clustering, partition-based clustering, density-based clustering, and artificial intelligence-based clustering.
One popular class of data clustering algorithms is the center-based clustering algorithm. K-means is a popular clustering method due to its simplicity and high speed in clustering large datasets (Forgy, 1965). However, K-means has two shortcomings: dependency on the initial state and convergence to local optima (Selim & Ismail, 1984); moreover, global solutions of large problems cannot be found with a reasonable amount of computational effort (Spath, 1989). To overcome the local optima problem, many studies have been done in clustering.
Mualik and Bandyopadhyay (2000) proposed a genetic algorithm-based method to solve the clustering problem and experimented on synthetic and real-life datasets to evaluate its performance. The results showed that the GA-based method may improve the final output of K-means.

Krishna and Murty (1999) proposed a novel approach called the genetic K-means algorithm for clustering analysis. It defines a basic mutation operator specific to clustering called distance-based mutation. Using finite Markov chain theory, they proved that GKA converges to the best-known optimum.
Selim and Al-Sultan (1991) discussed the solution of the clustering problem usually solved by the K-means algorithm. The problem is known to have local minimum solutions, which are
usually what the K-means algorithm obtains. The simulated annealing approach for solving optimization problems was described and proposed for solving the clustering problem. The parameters of the algorithm were discussed in detail, and it was shown that the algorithm converges to a global solution of the clustering problem.
Sung and Jin (2000) considered a clustering problem where a given data set is partitioned into a certain number of natural and homogeneous subsets, such that each subset is composed of elements similar to one another but different from those of any other subset. For this clustering problem, they exploited a heuristic algorithm combining the Tabu search heuristic with two complementary functional procedures, called packing and releasing procedures. The algorithm was numerically tested for its effectiveness in comparison with reference works including the Tabu search algorithm, the K-means algorithm and the simulated annealing algorithm.
Over the last decade, modeling the behavior of social insects, such as ants and bees, for the purpose of search and problem solving has been the context of the emerging area of swarm intelligence. Ant colony optimization is a typical successful swarm-based optimization approach, where the search algorithm is inspired by the behavior of real ants.
Kuo, Wang, Hu, and Chou (2005) proposed a novel clustering method, the ant K-means (AK) algorithm. The ant K-means algorithm modifies K-means by locating the objects in a cluster with a probability that is updated by the pheromone, while the rule for updating the pheromone is based on the total within-cluster variance (TWCV).
Shelokar, Jayaraman, and Kulkarni (2004) presented an ant colony optimization methodology for optimally clustering N objects into K clusters. The algorithm employs distributed agents that mimic the way real ants find the shortest path from their nest to a food source and back. They compared the results with other clustering algorithms, namely GA, Tabu search and SA, and showed that their algorithm is better than the others in performance and time.
This paper presents an application of the HGOP algorithm to clustering. The paper is organized as follows: in Section 2 we discuss the cluster analysis problem. Section 3 introduces the HGOP philosophy and its application to clustering, and then in Section 4 the experimental results of the proposed clustering algorithm in comparison with other clustering algorithms are shown.
2. Clustering
Data clustering, which is an NP-complete problem of finding groups in heterogeneous data by minimizing some measure of dissimilarity, is one of the fundamental tools in data mining, machine learning and pattern classification solutions (Garey, Johnson, & Witsenhausen, 1982). Clustering in N-dimensional Euclidean space $R^N$ is the process of partitioning a given set of n points into a number, say K, of groups (clusters) based on some similarity (distance) metric. The distance used in the clustering procedure here is the Euclidean distance, derived from the Minkowski metric (Eqs. (1) and (2)).
$d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r}$    (1)

$d(x, y) = \sqrt{ \sum_{i=1}^{m} (x_i - y_i)^2 }$    (2)
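For concreteness, the two metrics can be written as a short Python sketch (the function names are ours, not from the paper; only NumPy is assumed):

    import numpy as np

    def minkowski(x, y, r=2):
        # Minkowski distance of order r between two feature vectors (Eq. (1))
        return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** r) ** (1.0 / r))

    def euclidean(x, y):
        # Euclidean distance, the r = 2 special case used in this paper (Eq. (2))
        return minkowski(x, y, r=2)

    print(euclidean([0, 0], [3, 4]))  # prints 5.0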
Let the set of n points $\{X_1, X_2, \ldots, X_n\}$ be represented by the set S and the K clusters be represented by $C_1, C_2, \ldots, C_K$. Then:

$C_i \neq \emptyset$ for $i = 1, \ldots, K$;
$C_i \cap C_j = \emptyset$ for $i, j = 1, \ldots, K$ and $i \neq j$;
$\bigcup_{i=1}^{K} C_i = S$.
In this study, we also use the Euclidean metric as the distance metric. Existing clustering algorithms can be simply classified into the following two categories: hierarchical clustering and partitional clustering. The most popular class of partitional clustering methods is the center-based clustering algorithms (Gungor & Unler, 2006). The K-means algorithm is one of the most widely used center-based clustering algorithms (Forgy, 1965). To find K centers, the problem is defined as the optimization (minimization) of a performance function, $f(X, Z)$, defined on both the data items and the center locations. A popular performance function for measuring the goodness of the K clustering is the total within-cluster variance or the total mean-square quantization error (MSE), Eq. (3) (Gungor & Unler, 2006).
$f(X, Z) = \sum_{i=1}^{N} \min \{ \| X_i - Z_l \|^2 \mid l = 1, \ldots, K \}$    (3)
The steps of the K-means algorithm are as follows (Mualik & Bandyopadhyay, 2000):

Step 1: Choose K cluster centers $Z_1, Z_2, \ldots, Z_K$ randomly from the n points $\{X_1, X_2, \ldots, X_n\}$.
Step 2: Assign point $X_i$, $i = 1, 2, \ldots, n$, to cluster $C_j$, $j \in \{1, 2, \ldots, K\}$, if $\|X_i - Z_j\| < \|X_i - Z_p\|$ for $p = 1, 2, \ldots, K$ and $j \neq p$.
Step 3: Compute new cluster centers $Z^*_1, Z^*_2, \ldots, Z^*_K$ as follows:

$Z^*_i = \frac{1}{n_i} \sum_{X_j \in C_i} X_j, \quad i = 1, 2, \ldots, K,$

where $n_i$ is the number of elements belonging to cluster $C_i$.
Step 4: If the termination criteria are satisfied, stop; otherwise continue from Step 2.

Note that if the process does not terminate normally at Step 4, it is executed for a maximum fixed number of iterations.
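As an illustration, a minimal sketch of Steps 1-4 in Python (NumPy); the empty-cluster guard is our own addition, not part of the paper:

    import numpy as np

    def k_means(X, K, max_iter=100, seed=None):
        rng = np.random.default_rng(seed)
        Z = X[rng.choice(len(X), size=K, replace=False)]        # Step 1: random centers
        labels = np.zeros(len(X), dtype=int)
        for _ in range(max_iter):
            d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
            labels = d.argmin(axis=1)                           # Step 2: nearest center
            Z_new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else Z[i] for i in range(K)])     # Step 3: new centers
            if np.allclose(Z_new, Z):                           # Step 4: termination
                break
            Z = Z_new
        return Z, labels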
Global optimization algorithm (GOP)
Steps of the global optimization algorithm are as follows:

1. Select some initial random points, that is, define a set $A = \{U(t) = (U_1(t), \ldots, U_n(t)),\ t = 1, \ldots, T\}$; the set A is selected uniformly from the box $U \in R^{Kd}$, $a_i \le u_i \le b_i$, $i = 1, \ldots, Kd$.
2. Calculate the performance value at set A, then choose a point $U^* \in A$ that provides the best performance value and set $U_T = U^*$.
3. Find a good point $U_{T+1}$ from $U^*(U_T)$ and add it to the set A. For each point $U \in A$ and each coordinate i, calculate the degree of change of the objective function value f(U) when $u_i$ changes. Here change means either a decrease or an increase of a scalar variable.
4. For $U_T = U^*$ with $U^* = (u^*_0, u^*_1, \ldots, u^*_n)$, calculate $F(u_i\uparrow) - F(u_i\downarrow)$.
5. Using these degrees, calculate the forces $F(T) = (F_1(T), \ldots, F_{Kd}(T))$ acting to increase the objective function value f at the point $U^*(U_T)$: $F(T) = F(u_i\uparrow) - F(u_i\downarrow)$.
6. Calculate $U_{T+1}$ from $U_T$ by $U_{T+1} = U_T + \alpha F(T)$.
7. In the same manner, choose a new point $U_{T+2}$, and so on.
8. The process is terminated either when $F(T) = 0$ or after a maximum number of iterations; then hold the best solution and repeat the whole procedure at another stage, starting from other initial random points.
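The shape of this loop can be sketched in Python as follows. The force computation force_fn is deliberately left abstract here (Section 3.2, Step 5-1, spells out how it is built from fuzzy derivatives); all names are illustrative, not the authors' code, and lower/upper are the coordinate bound arrays $a_i$, $b_i$:

    import numpy as np

    def gop(f, lower, upper, force_fn, n_points=20, alpha=1.0, max_iter=40, seed=None):
        rng = np.random.default_rng(seed)
        A = list(rng.uniform(lower, upper, size=(n_points, len(lower))))  # step 1
        for _ in range(max_iter):
            U_T = min(A, key=f)                    # step 2: best point so far
            F_T = force_fn(A, U_T)                 # steps 3-5: global-descent force
            if np.allclose(F_T, 0.0):              # step 8: zero force, stop
                break
            A.append(U_T + alpha * F_T)            # step 6: U_{T+1} = U_T + alpha F(T)
        return min(A, key=f)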
3. Application of GOP in clustering

3.1. Philosophy behind the development of the GOP
There are many different methods and algorithms developed for global optimization problems (Migdalas, Pardalos, & Varbrand, 2001). The GOP algorithm takes into account some relatively "worse" points for further consideration. This is what many other methods do, such as Simulated Annealing (Glover & Laguna, 1997), Genetic Algorithms (Smith, 2002) and Taboo Search (Cvijovic & Klinovski, 2002). The choice of a descent (good) direction is the main part of each algorithm. Instead of using a stochastic search (as in the algorithms mentioned), GOP uses the formulas of Mammadov (2004) and Mammadov, Rubinov, and Yearwood (2005).

Note that the GOP algorithm has quite different settings and motivations compared with the methods that use so-called dynamical search (Pronzato, Wynn, & Zhigljausky, 2002). The method has some ideas in common with heuristic methods that attempt to estimate the overall convexity characteristics of the objective function. The advantage of this approach is that it does not use any approximate underestimations, including convex underestimations.
Fig. 1. Flowchart of the HGOP clustering algorithm.
3.2. Application of the hybrid GOP algorithm to clustering

The search capability of the HGOP algorithm is used in this article for the purpose of appropriately determining a fixed number K of cluster centers in $R^N$, thereby suitably clustering the set of n unlabelled points. The clustering metric adopted is the sum of the Euclidean distances of the points from their respective cluster centers. The steps of the proposed algorithm are shown in Fig. 1 and described in detail in this section. The pseudo code for the HGOP clustering algorithm is shown in Fig. 2. The steps of the application of HGOP to clustering are as follows:
Step 0. $f(U_{TotalBest}) = +\infty$; Num_Main_Iter = 0; TabuList (TL) = $\emptyset$.
Step 1. Select some initial random points, that is, define a set $A = \{U(t) = (U_1(t), \ldots, U_n(t)),\ t = 1, \ldots, T\}$; the set A is uniformly selected from the box $U \in R^{Kd}$, $a_i \le u_i \le b_i$, $i = 1, \ldots, Kd$, and does not belong to the Tabu Region (visited region). Here $a_i$, $b_i$ are the minimum and maximum bounds of $u_i$, set from the minimum and maximum of the data to be clustered; K is the number of clusters, d is the number of dimensions of the points, and the Tabu Region (TR) is determined by the Tabu List.
Step 2. If $TL = \{V_1, V_2, \ldots, V_M\}$, then $TR = \{U : |U - V_i| < \rho_i,\ i \in \{1, 2, \ldots, M\}\}$, where $\rho_i$ is the radius of a spherical area with center $V_i$, set as a percentage of C ($C = \max(b_i - a_i)$, $i = 1, \ldots, Kd$). The i-th point is represented as a vector of decision variable values $U(i) = (u_i^1, u_i^2, \ldots, u_i^{Kd})$, which is a candidate solution with K cluster centers. For example, $U(1) = (2, 5, 1, 6, 3, 2, 5, 7, 4)$ represents a solution with three cluster centers, $Z_1 = (2, 5, 1)$, $Z_2 = (6, 3, 2)$ and $Z_3 = (5, 7, 4)$, such that each center has three dimensions.
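Under this encoding, a candidate solution is a flat vector of length K·d that is reshaped into K centers and scored with the objective of Eq. (3). A small NumPy sketch (the names are ours):

    import numpy as np

    def decode(U, K, d):
        # split a flat decision vector of length K*d into K cluster centers
        return np.asarray(U, dtype=float).reshape(K, d)

    def objective(U, X, K, d):
        # Eq. (3): sum over points of the squared distance to the nearest center
        Z = decode(U, K, d)
        dists = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
        return float(np.sum(dists.min(axis=1) ** 2))

    # the example above: U(1) = (2,5,1,6,3,2,5,7,4) -> three 3-dimensional centers
    print(decode([2, 5, 1, 6, 3, 2, 5, 7, 4], K=3, d=3))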
Step 3. Compute the performance value f(i) for each point U(i). Note that, to prevent $U_1 = (Z_1, Z_2, Z_3)$ and $U_1 = (Z_2, Z_1, Z_3)$ from being treated as different solutions, the centers $Z_i$ are swapped into a canonical order after selecting the initial random points.
Step 4. Calculate the performance value at set A, then choose $U^*$ as the point $U^* \in A$ that provides the best performance value, and put $U^*$ in $U_T$.
Step 5. Find a good point $U_{T+1}$ from $U^*(U_T)$, add it to the set A, and update the Tabu List as follows:
Step 5-1. For each point $U \in A$ and each coordinate i, calculate the degree of change of f(U) when $u_i$ changes. Here change means either a decrease or an increase of a scalar variable. Let $U_T = U^* = (u^*_0, u^*_1, \ldots, u^*_n)$ be the point for which the force F(T) is calculated, and let $U = (u_0, u_1, \ldots, u_n)$ denote the points of A. This method describes dynamical systems based on a non-functional relationship between two variables, and it rests on the notion of a fuzzy derivative. The fuzzy derivative $\partial u_i / \partial u_j$ is defined as the influence of feature j on feature i:

$\frac{\partial u_i}{\partial u_j} = \mu(u_i, u_j), \quad i, j \in \{1, \ldots, n\}$

where

$\mu(u_i, u_j) = (\mu_1(u_i, u_j), \mu_2(u_i, u_j), \mu_3(u_i, u_j), \mu_4(u_i, u_j)), \quad i, j \in \{0, \ldots, n\},\ i \neq j$

and

$\mu_1(u_i, u_j) = d(u_i\uparrow, u_j\uparrow),\quad \mu_2(u_i, u_j) = d(u_i\uparrow, u_j\downarrow),\quad \mu_3(u_i, u_j) = d(u_i\downarrow, u_j\downarrow),\quad \mu_4(u_i, u_j) = d(u_i\downarrow, u_j\uparrow).$

$\mu_1(u_i, u_j)$ shows the degree of increase of entry i if entry j increases, starting from the initial state $(u_i, u_j)$, and similarly for $\mu_2(u_i, u_j)$, $\mu_3(u_i, u_j)$ and $\mu_4(u_i, u_j)$:

$\mu_1 = \frac{M_{11}}{M_1}, \quad \mu_2 = \frac{M_{12}}{M_1}, \quad \mu_3 = \frac{M_{13}}{M_2}, \quad \mu_4 = \frac{M_{14}}{M_2}$

where $M_1$ is the number of points $(u_i, u_j)$ that satisfy $u_i > u^*_i$; $M_{11}$ is the number of points that satisfy $u_i > u^*_i$ and $u_j > u^*_j$; $M_{12}$ is the number of points that satisfy $u_i > u^*_i$ and $u_j < u^*_j$; $M_2$ is the number of points that satisfy $u_i < u^*_i$; $M_{13}$ is the number of points that satisfy $u_i < u^*_i$ and $u_j < u^*_j$; and $M_{14}$ is the number of points that satisfy $u_i < u^*_i$ and $u_j > u^*_j$.

If $\mu(u_i, u_j) = (\eta_1, \eta_2, \eta_3, \eta_4)$ and $\mu(u_j, u_i) = (\iota_1, \iota_2, \iota_3, \iota_4)$, then $F(u_j \to u_i\uparrow) = \eta_1 \iota_1 + \eta_4 \iota_2$ and $F(u_j \to u_i\downarrow) = \eta_3 \iota_3 + \eta_2 \iota_4$. The resulting force on entry i is defined as the sum of all these forces:

$F(u_i\uparrow) = \sum_{j \neq i} F(u_j \to u_i\uparrow), \quad F(u_i\downarrow) = \sum_{j \neq i} F(u_j \to u_i\downarrow).$
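The counting behind the fuzzy derivative can be made concrete with a short sketch for a single coordinate pair (i, j); this is our illustrative code, with a guard against empty counts that the paper does not discuss:

    import numpy as np

    def fuzzy_derivative(A, U_star, i, j):
        # estimate (mu1, mu2, mu3, mu4) as the conditional frequencies of Step 5-1
        A = np.asarray(A)
        up_i, dn_i = A[:, i] > U_star[i], A[:, i] < U_star[i]
        up_j, dn_j = A[:, j] > U_star[j], A[:, j] < U_star[j]
        M1, M2 = max(up_i.sum(), 1), max(dn_i.sum(), 1)   # avoid division by zero
        mu1 = np.sum(up_i & up_j) / M1    # M11/M1: i up   when j up
        mu2 = np.sum(up_i & dn_j) / M1    # M12/M1: i up   when j down
        mu3 = np.sum(dn_i & dn_j) / M2    # M13/M2: i down when j down
        mu4 = np.sum(dn_i & up_j) / M2    # M14/M2: i down when j up
        return mu1, mu2, mu3, mu4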
Step 5-2. Using these degrees, calculate the forces $F(T) = (F_1(T), \ldots, F_{Kd}(T))$ acting to increase the value of f at the point $U^*(U_T)$: $F(T) = F(u_i\uparrow) - F(u_i\downarrow)$.
Step 5-3. Calculate $\hat{U}_{T+1}$ by using $U_T$:

    set α = 1 and U_DynamicalSystem = U_T
    while α > ε and Iteration < Max_Num_Iter_Dynamical_System
        if f(U_DynamicalSystem + α F(T)) < f(U_DynamicalSystem)
            U_DynamicalSystem = U_DynamicalSystem + α F(T)
        else
            α = α / 2
        Iteration = Iteration + 1
    $\hat{U}_{T+1}$ = U_DynamicalSystem

In the above, ε is a small positive number.

Fig. 2. Pseudo code for HGOP clustering algorithm.
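In Python this damping loop might read as follows (a sketch under the same notation; f and F_T come from the surrounding steps, and U_T is a NumPy array):

    def dynamical_system_step(f, U_T, F_T, eps=5e-4, max_iter=40):
        # Step 5-3: move along F(T), halving alpha whenever the trial point
        # fails to improve the objective
        alpha, U, iteration = 1.0, U_T, 0
        while alpha > eps and iteration < max_iter:
            if f(U + alpha * F_T) < f(U):
                U = U + alpha * F_T
            else:
                alpha /= 2
            iteration += 1
        return U    # this is U_hat_{T+1}, handed to the pattern-search refinement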
Step 5-4. Refinement of the new point: use a local optimization procedure starting at $\hat{U}_{T+1}$ to find a local minimum $U_{T+1}$. A direct search method (pattern search; Hart, 2001) is used at this stage.

Step 5-4-1. Set $u_0 = \hat{U}_{T+1}$, $\Delta_p = \Delta_0$.
Step 5-4-2. For $p = 0, 1, \ldots$, number of pattern iterations, we have an iterate $u_p$ and a step-length parameter $\Delta_p > 0$. Let $e_i$, $i = 1, \ldots, n$, denote the standard unit basis vectors.

Step 5-4-3. Look at the points $u_{pattern} = u_p \pm \Delta_p e_i$, $i = 1, \ldots, n$, to find a $u_{pattern}$ for which $f(u_{pattern}) < f(u_p)$. If no such $u_{pattern}$ is found, reduce $\Delta_p$ by half and continue; otherwise, leave the step-length parameter alone, setting $\Delta_{p+1} = \Delta_p$ and $u_{p+1} = u_{pattern}$. In the latter case the step-length parameter can also be increased, say by a factor of 2, if a longer step seems justified.

Step 5-4-4. Repeat the iteration just described until $\Delta_p$ is deemed sufficiently small.

Step 5-4-5. At the end, $U_{T+1} = u_{\text{number of pattern iterations}}$.
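A compact sketch of this pattern search in Python (polling u ± Δ·e_i along each axis and halving Δ on failure; the optional step expansion is off by default, and all names are ours):

    import numpy as np

    def pattern_search(f, u0, delta0=0.5, n_iter=20, expand=1.0):
        u, delta = np.asarray(u0, dtype=float), delta0
        for _ in range(n_iter):                       # Steps 5-4-2 to 5-4-4
            best = None
            for i in range(len(u)):
                for sign in (1.0, -1.0):              # poll u +/- delta * e_i
                    trial = u.copy()
                    trial[i] += sign * delta
                    if f(trial) < f(u) and (best is None or f(trial) < f(best)):
                        best = trial
            if best is None:
                delta /= 2                            # no improvement: shrink the step
            else:
                u = best                              # accept; expand > 1 lengthens the step
                delta *= expand
        return u                                      # Step 5-4-5: U_{T+1}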
Step 5-5. Construct a new population $A(t+1) = A(t) \cup \{U_{T+1}\}$.

Step 5-6. Update the Tabu List with $U_{T+1}$; if the Tabu List is full, delete the point with the worst ranking. Tabu List entries are ranked and saved according to their objective function values and the sum of the Euclidean distances of each point from the other points in the list.

Step 5-6-1. Save the ranking of the Tabu List according to the objective function values and name it Objective Function Rank (OFR).
Step 5-6-2. Save the ranking of the Tabu List according to the maximum sum of Euclidean distance values from the tabu list points and name it Distance Rank (DR).
Step 5-6-3. The ranking of each point in TL is its best ranking in DR and OFR.
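One way to realize this double ranking is sketched below; the paper gives no code for it, so the combination rule (taking the better of the two ranks) follows our reading of Step 5-6-3:

    import numpy as np

    def rank_tabu_list(TL, f):
        # rank 0 is best; a point's overall rank is its better rank in OFR and DR
        TL = [np.asarray(v, dtype=float) for v in TL]
        ofr = np.argsort(np.argsort([f(v) for v in TL]))              # low objective is good
        dist_sums = [sum(np.linalg.norm(v - w) for w in TL) for v in TL]
        dr = np.argsort(np.argsort([-s for s in dist_sums]))          # well spread is good
        return np.minimum(ofr, dr)   # when the list is full, drop the worst-ranked point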
Step 6. If either $F(T) < \gamma$ or $T > T^*$, go to Step 7; otherwise, return to Step 4. Here $\gamma > 0$ and $T^*$ is a positive integer.
Step 7. If $f(U_{TotalBest}) > f(U_{T+1})$ then $U_{TotalBest} = U_{T+1}$; otherwise $U_{TotalBest}$ is left unchanged. If Num_Main_Iter is less than Max_Num_Main_Iter, then set Num_Main_Iter = Num_Main_Iter + 1 and return to Step 1.
Step 8. $U_{TotalBest}$ is the final solution and gives the cluster centers.
4. Experimental results
The experimental results comparing the HGOP clustering algorithm with several typical algorithms, including the ACO algorithm (Shelokar et al., 2004), the simulated annealing approach (Selim & Al-Sultan, 1991), the genetic K-means algorithm (Krishna & Murty, 1999), and the Tabu search approach (Sung & Jin, 2000), are provided for four artificial data sets (Data 1, Data 2, Data 3 and Data 4) and five real-life data sets (Vowel, Iris, Crude Oil, Wine and Thyroid diseases data), respectively. These are first described below. The effectiveness of the algorithms depends greatly on the generation of initial solutions. Therefore, for every dataset, each algorithm was run 10 times, each time with randomly generated initial solutions. We ran our experiments on a Pentium IV, 2.8 GHz, 512 MB RAM computer, with all algorithms coded in Matlab 7.1. We ran all five algorithms on all datasets.
4.1. Artificial data sets

Data 1: This is a nonoverlapping two-dimensional data set where the number of clusters is two. It has 10 points. The value of K is chosen to be 2 for this data set.

Data 2: This is a nonoverlapping two-dimensional data set where the number of clusters is three. It has 76 points. The value of K is chosen to be 3 for this data set.
Data 3: This is an overlapping two-dimensional triangular distribution of data points having nine classes, where all the classes are assumed to have equal a priori probabilities (= 1/9). It has 900 data points. The X-Y ranges for the nine classes are as follows:

Class 1: [-3.3, -0.7] x [0.7, 3.3],
Class 2: [-1.3, 1.3] x [0.7, 3.3],
Class 3: [0.7, 3.3] x [0.7, 3.3],
Class 4: [-3.3, -0.7] x [-1.3, 1.3],
Class 5: [-1.3, 1.3] x [-1.3, 1.3],
Class 6: [0.7, 3.3] x [-1.3, 1.3],
Class 7: [-3.3, -0.7] x [-3.3, -0.7],
Class 8: [-1.3, 1.3] x [-3.3, -0.7],
Class 9: [0.7, 3.3] x [-3.3, -0.7].

Thus the domain of the triangular distribution for each class and for each axis is 2.6. Consequently, the height will be 1/1.3 (since (1/2) x 2.6 x height = 1). The value of K is chosen to be 9 for this data set.
Data 4: This is an overlapping ten-dimensional data set generated using a triangular distribution of the form shown in Fig. 3 for two classes, 1 and 2. It has 1000 data points. The value of K is chosen to be 2 for this data set. The range for class 1 is [0, 2] x [0, 2] x ... x [0, 2] (10 times), and that for class 2 is [1, 3] x [0, 2] x ... x [0, 2] (9 times), with the corresponding peaks at (1, 1) and (2, 1). The distribution along the first axis (X) for class 1 may be formally quantified as:
$f_1(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } 0 < x \le 1 \\ 2 - x & \text{for } 1 < x \le 2 \\ 0 & \text{for } x > 2 \end{cases}$

for class 1. Similarly, for class 2:
Fig. 3. Triangular distribution along the X-axis.
$f_2(x) = \begin{cases} 0 & \text{for } x \le 1 \\ x - 1 & \text{for } 1 < x \le 2 \\ 3 - x & \text{for } 2 < x \le 3 \\ 0 & \text{for } x > 3 \end{cases}$
The distribution along the other nine axes $(Y_i,\ i = 1, 2, \ldots, 9)$ for both classes is

$f(y_i) = \begin{cases} 0 & \text{for } y_i \le 0 \\ y_i & \text{for } 0 < y_i \le 1 \\ 2 - y_i & \text{for } 1 < y_i \le 2 \\ 0 & \text{for } y_i > 2 \end{cases}$
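Data 4 can be reproduced with NumPy's triangular sampler; a sketch under the stated ranges and peaks (the function name and per-class sizes are ours):

    import numpy as np

    def make_data4(n_per_class=500, seed=None):
        # class 1: triangular on [0,2] in all ten axes, peak at 1
        # class 2: first axis triangular on [1,3] with peak at 2, others as class 1
        rng = np.random.default_rng(seed)
        c1 = rng.triangular(0, 1, 2, size=(n_per_class, 10))
        c2 = np.column_stack(
            [rng.triangular(1, 2, 3, size=n_per_class)] +
            [rng.triangular(0, 1, 2, size=n_per_class) for _ in range(9)])
        return np.vstack([c1, c2])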
4.2. Real-life data sets

Vowel data: This data set consists of 871 Indian Telugu vowel sounds (Pal & Majumder, 1977). These were uttered in a consonant-vowel-consonant context by three male speakers in the age group of 30-35 years. The data set has three features, F1, F2 and F3, corresponding to the first, second and third vowel formant frequencies, and six overlapping classes {d, a, i, u, e, o}. The value of K is therefore chosen to be 6 for this data.
Iris data: This is the Iris data set, perhaps the best-known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The data set contains three classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other. There are 150 instances with four numeric attributes in the Iris data set, and there are no missing attribute values. The attributes of the Iris data set are sepal length, sepal width, petal length and petal width, all in cm (Blake & Merz).
Crude oil data: This overlapping data set (Johnson & Wichern, 1982) has 56 data points, 5 features and 3 classes. Hence the value of K is chosen to be 3 for this data set.
Wine data: This is the wine data set, which is also taken from the MCI laboratory. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. There are 178 instances with 13 numeric attributes in the wine data set. All attributes are continuous, and there are no missing attribute values.
Thyroid diseases data: This data set categorizes N = 215 samples of patients suffering from three human thyroid diseases (K = 3): euthyroid, hyperthyroidism and hypothyroidism, where 150 individuals tested euthyroid, 30 patients experienced hyperthyroidism and 35 patients suffered from hypothyroidism. Each individual was characterized by the results of five (n = 5) laboratory tests: total serum thyroxin, total serum tri-iodothyronine, serum tri-iodothyronine resin uptake, serum thyroid-stimulating hormone (TSH), and increase of TSH after injection of TSH-releasing hormone (Blake & Merz).
The comparison of results for each dataset is based on the best solution found in 10 distinct runs of each algorithm and the processing time taken to converge to the best solution. The solution quality is also given in terms of the average and worst
Table 5
Results obtained by the five algorithms for 10 different runs on Vowel data.

Method   F_best           F_average        F_worst          CPU time (s)
HGOP     148718.363754    148718.454321    148718.674567    69.85
ACO      148837.736634    148837.768828    148837.937878    73.65
GAK      149346.152274    149391.501798    149436.851323    98.72
TS       150635.653256    150648.795320    150697.784636    81.25
SA       149357.634587    149436.017542    149749.549362    79.46
Table 1
Results obtained by the five algorithms for 10 different runs on dataset 1.

Method   F_best      F_average   F_worst     CPU time (s)
HGOP     3.120125    3.131337    3.228137    1.81
ACO      3.142375    3.163422    3.352843    1.89
GAK      3.273426    3.355521    3.683901    2.01
TS       3.244326    3.310024    3.572814    1.92
SA       3.217832    3.282089    3.539115    1.99
Table 2
Results obtained by the five algorithms for 10 different runs on dataset 2.

Method   F_best       F_average    F_worst      CPU time (s)
HGOP     51.493674    51.533427    51.687453    8.23
ACO      52.082746    52.212071    52.729373    8.98
GAK      56.142562    56.377520    57.317354    17.24
TS       54.752946    54.879342    55.384927    14.57
SA       53.562492    53.635943    53.929748    14.82
Table 3
Results obtained by the five algorithms for 10 different runs on dataset 3.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     962.342786    962.578234    964.753761    25.93
ACO      964.739472    965.048327    966.283745    26.88
GAK      966.649837    966.772302    966.853946    38.52
TS       972.629478    973.209275    975.528463    32.78
SA       966.418263    966.614089    967.397392    31.24
Table 4
Results obtained by the five algorithms for 10 different runs on dataset 4.

Method   F_best         F_average      F_worst        CPU time (s)
HGOP     1246.135426    1246.325342    1246.374356    120.63
ACO      1248.958685    1249.034036    1249.335442    122.34
GAK      1258.673362    1520.777767    1271.635528    178.42
TS       1282.538294    1285.988483    1299.789237    142.15
SA       1249.736287    1249.968105    1250.895375    136.61
Table 6
Results obtained by the five algorithms for 10 different runs on Iris data.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     96.370352     96.373654     96.387564     31.35
ACO      97.100777     97.171546     97.808466     33.72
GAK      113.986503    125.197025    139.778272    105.53
TS       97.365977     97.868008     98.569485     72.86
SA       97.100777     97.134625     97.263845     95.92
values of the clustering metric ($F_{avg}$ and $F_{worst}$, respectively) after 10 different runs for each of the five algorithms. F is the clustering performance measure defined in Eq. (3). Tables 1-9 show these results.
For Data 1 (Table 1), it is found that the HGOP clustering algorithm provides the optimal value of 3.120125 in 90% of the total runs, which is better than the other clustering algorithms. The ACO clustering algorithm found a value of 3.142375 in 90% of runs, while GAK, TS and SA found values of 3.273426, 3.244326 and 3.217832 in 80% of runs. The HGOP required the least processing time (1.81 s). For Data 2 (Table 2), the HGOP clustering algorithm attains the best value of 51.493674 in 90% of the total runs. On the other hand, the ACO, GAK, TS and SA algorithms attain 52.082746, 56.142562, 54.752946 and 53.562492 in 80% of the total runs. The execution time taken by the HGOP algorithm is less than that of the other algorithms (8.23 s). Similarly, for Data 3 (Table 3) and Data 4 (Table 4), the HGOP clustering algorithm attains the best values of 962.342786 and 1246.135426 in 90% and all of the total runs, respectively. The best values provided by ACO, TS and SA were obtained in 80% of the total runs, and the best value provided by GAK was obtained in 40% of runs. In terms of processing time, the HGOP performed better than the other clustering algorithms, as can be observed from Tables 3 and 4.
For the Vowel data (Table 5), the HGOP clustering algorithm attains the best value of 148718.363754 in 90% of runs. ACO, TS and SA provided their best values in 80% of runs, and the GAK algorithm attains its best value in only 50% of the total runs. In addition, the HGOP clustering algorithm performed better than the other algorithms in terms of the processing time required (69.85 s). For the clustering problem on the Iris dataset, the results given in Table 6 show that the HGOP provides the optimum value of 96.370352. The HGOP and ACO were able to find the optimum nine times, as compared with five times for SA. The HGOP required the least processing time (31.35 s).
For the Crude Oil data set (Table 7), the HGOP clustering algorithm attains the best value of 250.983245 in 90% of total runs, while ACO, GAK, TS and SA attain best values of 253.564637, 278.965152, 254.645375 and 253.763548 in 80% of total runs. The processing time required by HGOP is less than that of the other algorithms (14.43 s).
The results obtained for the clustering problem on the Wine dataset are given in Table 8. The HGOP finds the optimum solution of 16228.645326, while the ACO, SA and GAK methods provide 16530.533807. The HGOP, ACO, SA and GAK methods found their optimum solutions in all of their 10 runs. The execution time taken by the HGOP algorithm is less than that of the other algorithms.
Table 7
Results obtained by the five algorithms for 10 different runs on Crude oil data.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     250.983245    251.243564    252.028164    14.43
ACO      253.564637    254.180897    256.645938    14.98
GAK      278.965152    279.907028    283.674535    35.26
TS       254.645375    255.422952    258.533264    26.55
SA       253.763548    254.653207    258.211847    24.74
Table 8
Results obtained by the five algorithms for 10 different runs on Wine data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     16228.645326    16228.645326    16228.645326    56.37
ACO      16530.533807    16530.533807    16530.533807    68.29
GAK      16530.533807    16530.533807    16530.533807    226.68
TS       16666.226987    16785.459275    16837.535670    161.45
SA       16530.533807    16530.533807    16530.533807    57.28
Table 9
Results obtained by the five algorithms for 10 different runs on Thyroid data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     10109.874563    10111.132455    10113.657348    94.34
ACO      10111.827759    10112.126903    10114.819200    102.15
GAK      10116.294861    10128.823145    10148.389608    153.24
TS       10249.729170    10354.315021    10438.780449    114.01
SA       10111.827759    10114.045265    10115.934358    108.22
Table 10
Values of parameters of each of the five algorithms.

HGOP: maximum number of main iterations (Max_Num_Main_Iter) = 15; positive small number γ = 0.01; positive small number ε = 0.0005; number of iterations T* = 40; step length of pattern search Δ0 = 0.5; number of initial points = 20; number of pattern iterations = 20; Max_Num_Iter_Dynamical_System = 40; radius of tabu region ρ_i = 0.2C; Tabu List size = 20.

ACO: ants (R) = 50; probability threshold for maximum trail (q0) = 0.98; local search probability (pls) = 0.01; evaporation rate (ρ) = 0.01; maximum number of iterations (itermax) = 1000.

GAK: population size = 50; crossover rate = 0.8; mutation rate = 0.001; maximum number of iterations = 1000.

TS: tabu list size = 25; number of trial solutions = 40; probability threshold = 0.98; maximum number of iterations = 1000.

SA: probability threshold = 0.98; initial temperature = 5; temperature multiplier = 0.98; final temperature = 0.01; number of iterations to detect steady state = 100; maximum number of iterations = 30,000.
For the human thyroid disease dataset, the HGOP algorithm provides the optimum solution of 10109.874563 with a success rate of 90% over 10 runs. In terms of processing time, the HGOP performed better than the other clustering algorithms, as can be observed from Table 9.
Shelokar et al. (2004) performed several simulations to find the algorithmic parameters that result in the best performance of the ACO, GAK, SA and TS algorithms in terms of the quality of solution found, the number of function evaluations and the processing time required. In this study, we used their algorithmic parameters. In addition, we performed several simulations to find the algorithmic parameters for the HGOP algorithm. The algorithmic parameters for all algorithms are listed in Table 10.
The results illustrate that the proposed HGOP optimization approach can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to clustering problems of allocating N objects to K clusters. As mentioned earlier, the final solution of the K-means algorithm is sensitive to the initial state. In the proposed HGOP algorithm, the initial solutions and individual solutions are not critical, and the exchange of information among different individual solutions enables the proposed algorithm to find the global solution, thereby actually overcoming the K-means shortcoming.
5. Conclusion

In summary, in this paper the hybrid HGOP algorithm is used to solve clustering problems. The HGOP algorithm uses the notion of relationships between variables that describe the influences of changes of the variables on each other. The HGOP algorithm takes into account some relatively worse points for further consideration, as other methods do, such as Simulated Annealing, Genetic Algorithms and Taboo Search. The HGOP algorithm attempts to jump over local minimum points and tries to find deeper ones. In this paper the global optimization is combined with Tabu search to solve the problem of revisiting already visited regions. This hybridization makes the algorithm faster. The HGOP algorithm for data clustering can be applied when the number of clusters is known a priori and the clusters are crisp in nature. To evaluate the performance of the HGOP algorithm, it is compared with other stochastic algorithms, viz. ant colony optimization, genetic algorithm, simulated annealing and Tabu search. The algorithm was implemented and tested on several simulated and real datasets; preliminary computational experience is very encouraging in terms of the quality of the solutions found and the processing time required.
References
Blake, C. L., & Merz, C. J. UCI repository of machine learning databases. Available from: <http://www.ics.uci.edu/~mlearn/MLRepository.html>.
Brucker, P. (1978). On the complexity of clustering problems. Optimization and operations research. Lecture Notes in Economics and Mathematical Systems, 157, 45-54.
Cvijovic, D., & Klinovski, J. (2002). Taboo search: An approach to the multiple-minima problem for continuous functions. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Forgy, E. W. (1965). Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics, 21(3), 768-769.
Garey, M. R., Johnson, D. S., & Witsenhausen, H. S. (1982). The complexity of the generalized Lloyd-Max problem. IEEE Transactions on Information Theory, 28(2), 255-256.
Glover, F., & Laguna, M. (1997). Tabu search. Kluwer Academic Publishers.
Gungor, Z., & Unler, A. (2006). K-harmonic means data clustering with simulated annealing heuristic. Applied Mathematics and Computation.
Hart, W. E. (2001). A convergence analysis of unconstrained and bound constrained evolutionary pattern search. Evolutionary Computation, 9(1), 1-23.
Johnson, R. A., & Wichern, D. W. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice-Hall.
Krishna, K., & Murty, M. N. (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 29, 433-439.
Kuo, R. J., Wang, H. S., Hu, T.-L., & Chou, S. H. (2005). Application of ant K-means on clustering analysis. Computers and Mathematics with Applications, 50, 1709-1724.
Mammadov, M. A. (2004). A new global optimization algorithm based on a dynamical systems approach. In Proceedings of the 6th international conference on optimization: Techniques and applications. Ballarat, Australia.
Mammadov, M. A., Rubinov, A. M., & Yearwood, J. (2005). Dynamical systems described by relational elasticities with applications to global optimization. In V. Jeyakumar & A. Rubinov (Eds.), Continuous optimisation: Current trends and applications (pp. 365-387). Springer.
Migdalas, A., Pardalos, P., & Varbrand, P. (2001). From local to global optimization. Nonconvex optimization and its applications (Vol. 53). Kluwer Academic Publishers.
Mualik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33, 1455-1465.
Pal, S. K., & Majumder, D. D. (1977). Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-7, 625-629.
Pronzato, L., Wynn, H., & Zhigljausky, A. A. (2002). An introduction to dynamical search. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Selim, S. Z., & Al-Sultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003-1008.
Selim, S. Z., & Ismail, M. A. (1984). K-means type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 81-87.
Shelokar, P. S., Jayaraman, V. K., & Kulkarni, B. D. (2004). An ant colony approach for clustering. Analytica Chimica Acta, 509, 187-195.
Smith, J. (2002). Genetic algorithms. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Spath, H. (1989). Clustering analysis algorithms. Chichester, UK: Ellis Horwood.
Sung, C. S., & Jin, H. W. (2000). A tabu-search-based heuristic for clustering. Pattern Recognition, 33, 849-858.
Ward, J. W. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.