A new clustering algorithm based on hybrid global optimization based on a dynamical systems approach algorithm

Ali Maroosi *, Babak Amiri

Iran University of Science and Technology, Tehran, Iran

Article info

Keywords:

Clustering

K-means

Dynamical systems

Tabu search

Abstract

Many methods for local optimization are based on the notion of a direction of local descent at a given point. A local improvement of the point in hand can be made using this direction. As a rule, modern methods for global optimization do not use directions of global descent for global improvement of the point in hand. From this point of view, the global optimization algorithm based on a dynamical systems approach (GOP) is an unusual method. Its structure is similar to that used in local optimization: a new iterate can be obtained as an improvement of the previous one along a certain direction. In contrast with local methods, this direction is a direction of global descent, and for more diversification it is combined with Tabu search. The resulting algorithm is called hybrid GOP (HGOP). Cluster analysis is one of the attractive data mining techniques used in many fields. One popular class of data clustering algorithms is the center based clustering algorithm. K-means is a popular clustering method due to its simplicity and high speed in clustering large datasets. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. In order to overcome the local optima problem, many studies have been done in clustering. In this paper, we propose the application of a hybrid global optimization algorithm based on a dynamical systems approach. We compared HGOP with other clustering algorithms, such as GAK, SA, TS, and ACO, by implementing them on several simulated and real datasets. Our findings show that the proposed algorithm works better than the others.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Clustering, also called set partitioning, is a basic and widely applied methodology. Application fields include statistics, mathematical programming (such as location selection, network partitioning, routing, scheduling and assignment problems, etc.) and computer science (including pattern recognition, learning theory, image processing and computer graphics, etc.). Clustering mainly groups all objects into several mutually exclusive clusters in order to achieve the maximum or minimum of an objective function. Clustering rapidly becomes computationally intractable as the problem scale increases, because of the combinatorial character of the method. Brucker (1978) and Ward (1963) proved that, for specific objective functions, clustering becomes an NP-hard problem when the number of clusters exceeds 3.

There are many methods applied in clustering analysis, such as hierarchical clustering, partition-based clustering, density-based clustering, and artificial intelligence-based clustering.

One popular class of data clustering algorithms is the center based clustering algorithm. K-means is a popular clustering method due to its simplicity and high speed in clustering large datasets (Forgy, 1965). However, K-means has two shortcomings: dependency on the initial state and convergence to local optima (Selim & Ismail, 1984), and global solutions of large problems cannot be found with a reasonable amount of computational effort (Spath, 1989). In order to overcome the local optima problem, many studies have been done in clustering.

Maulik and Bandyopadhyay (2000) proposed a genetic algorithm based method to solve the clustering problem and experimented on synthetic and real-life datasets to evaluate its performance. The results showed that the GA-based method may improve the final output of K-means.

Krishna and Murty (1999) proposed a novel approach called the genetic K-means algorithm for clustering analysis. It defines a basic mutation operator specific to clustering called distance-based mutation. Using finite Markov chain theory, they proved that GKA converges to the best-known optimum.

Selim and Al-Sultan (1991) discussed the solution of the clustering problem usually solved by the K-means algorithm. The problem is known to have local minimum solutions, which are usually what the K-means algorithm obtains. The simulated annealing approach for solving optimization problems was described and proposed for solving the clustering problem. The parameters of the algorithm were discussed in detail and it was shown that the algorithm converges to a global solution of the clustering problem.

* Corresponding author.
E-mail addresses: Ali.Maroosi@gmail.com, Ali_maroosi@ee.iust.ac.ir (A. Maroosi), Amiri_babak@ind.iust.ac.ir (B. Amiri).

Expert Systems with Applications 37 (2010) 5645–5652
doi:10.1016/j.eswa.2010.02.047

According to Sung and Jin (2000), researchers considered a clustering problem where a given data set is partitioned into a certain number of natural and homogeneous subsets such that each subset is composed of elements similar to one another but different from those of any other subset. For the clustering problem, a heuristic algorithm was developed by combining the Tabu search heuristic with two complementary functional procedures, called packing and releasing procedures. The algorithm was numerically tested for its effectiveness in comparison with reference works including the Tabu search algorithm, the K-means algorithm and the simulated annealing algorithm.

Over the last decade, modeling the behavior of social insects, such as ants and bees, for the purpose of search and problem solving has been the context of the emerging area of swarm intelligence. Using an ant colony is a typical successful swarm-based optimization approach, where the search algorithm is inspired by the behavior of real ants.

Kuo, Wang, Hu, and Chou (2005) proposed a novel clustering method, the ant K-means (AK) algorithm. The ant K-means algorithm modifies K-means by locating the objects in a cluster with a probability that is updated by the pheromone, while the rule for updating the pheromone is according to the total within-cluster variance (TWCV).

Shelokar, Jayaraman, and Kulkarni (2004) presented an ant colony optimization methodology for optimally clustering N objects into K clusters. The algorithm employs distributed agents that mimic the way real ants find the shortest path from their nest to a food source and back. They compared the results with other clustering algorithms, namely GA, Tabu search, and SA, and showed that their algorithm is better than the other algorithms in performance and time.

This paper presents the application of the HGOP algorithm to clustering. The paper is organized as follows: in Section 2 we discuss the cluster analysis problem. Section 3 introduces the HGOP philosophy and its application to clustering, and then in Section 4 experimental results of the proposed clustering algorithm in comparison with other clustering algorithms are shown.

2. Clustering

Data clustering, which is an NP-complete problem of finding groups in heterogeneous data by minimizing some measure of dissimilarity, is one of the fundamental tools in data mining, machine learning and pattern classification solutions (Garey, Johnson, & Witsenhausen, 1982). Clustering in N-dimensional Euclidean space R^N is the process of partitioning a given set of n points into a number, say k, of groups (or clusters) based on some similarity (distance) metric; the metric used in the clustering procedure is the Euclidean distance, derived from the Minkowski metric (Eqs. (1) and (2)).

d(x, y) = (Σ_{i=1}^{m} |x_i − y_i|^r)^{1/r}   (1)

d(x, y) = sqrt(Σ_{i=1}^{m} (x_i − y_i)^2)   (2)
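For concreteness, the two metrics can be sketched in code (a Python sketch of Eqs. (1) and (2); the function names are ours, not from the paper):

```python
import math

def minkowski(x, y, r):
    """Minkowski distance of order r between two m-dimensional points (Eq. (1))."""
    return sum(abs(xi - yi) ** r for xi, yi in zip(x, y)) ** (1.0 / r)

def euclidean(x, y):
    """Euclidean distance, the r = 2 special case used as the clustering metric (Eq. (2))."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```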

Let the set of n points {X_1, X_2, ..., X_n} be represented by the set S and the k clusters be represented by C_1, C_2, ..., C_k. Then:

C_i ≠ ∅ for i = 1, ..., k;
C_i ∩ C_j = ∅ for i = 1, ..., k; j = 1, ..., k; and i ≠ j;
∪_{i=1}^{k} C_i = S.

In this study, we also use the Euclidean metric as the distance metric. The existing clustering algorithms can be simply classified into the following two categories: hierarchical clustering and partitional clustering. The most popular class of partitional clustering methods is the center based clustering algorithms (Gungor & Unler, 2006). The K-means algorithm is one of the most widely used center based clustering algorithms (Forgy, 1965). To find K centers, the problem is defined as an optimization (minimization) of a performance function, f(X, Z), defined on both the data items and the center locations. A popular performance function for measuring the goodness of the k clustering is the total within-cluster variance or the total mean-square quantization error (MSE), Eq. (3) (Gungor & Unler, 2006).

f(X, Z) = Σ_{i=1}^{N} min{ ‖X_i − Z_l‖^2 | l = 1, ..., K }   (3)
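The objective of Eq. (3) is straightforward to compute: each point contributes its squared Euclidean distance to the nearest of the K centers. A minimal Python sketch (names are ours):

```python
def mse_objective(points, centers):
    """Total within-cluster variance of Eq. (3): each point contributes its
    squared Euclidean distance to the nearest of the K centers."""
    total = 0.0
    for x in points:
        total += min(sum((xi - zi) ** 2 for xi, zi in zip(x, z)) for z in centers)
    return total
```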

The steps of the K-means algorithm are as follows (Maulik & Bandyopadhyay, 2000):

Step 1: Choose K cluster centers Z_1, Z_2, ..., Z_K randomly from the n points {X_1, X_2, ..., X_n}.

Step 2: Assign point X_i, i = 1, 2, ..., n to cluster C_j, j ∈ {1, 2, ..., K} if ‖X_i − Z_j‖ < ‖X_i − Z_p‖, p = 1, 2, ..., K, and j ≠ p.

Step 3: Compute new cluster centers Z_1, Z_2, ..., Z_K as follows:

Z_i = (1/n_i) Σ_{X_j ∈ C_i} X_j,  i = 1, 2, ..., K,

where n_i is the number of elements belonging to cluster C_i.

Step 4: If the termination criteria are satisfied, stop; otherwise continue from Step 2.

Note that in case the process does not terminate at Step 4 normally, it is executed for a maximum fixed number of iterations.
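The four steps above can be sketched as follows (a minimal Python implementation for illustration; the paper's experiments used Matlab, and this sketch adds only a trivial empty-cluster guard that keeps the old center):

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means following Steps 1-4 above."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # Step 1: random initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in points:                     # Step 2: assign to nearest center
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            clusters[j].append(x)
        new_centers = []
        for j in range(k):                   # Step 3: recompute centers
            if clusters[j]:
                n_j = len(clusters[j])
                new_centers.append(tuple(sum(col) / n_j for col in zip(*clusters[j])))
            else:
                new_centers.append(centers[j])  # keep old center if cluster emptied
        if new_centers == centers:           # Step 4: stop on convergence
            break
        centers = new_centers
    return centers
```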

2.1. Global optimization algorithm (GOP)

The steps of the global optimization algorithm are as follows:

1. Select some initial random points, that is, define a set A = {U(t) = (U_1(t), ..., U_n(t)), t = 1, ..., T}; the set A is uniformly selected from the box U ∈ R^{Kd}, a_i ≤ u_i ≤ b_i, i = 1, ..., Kd.
2. Calculate the performance value at set A, then choose a point U* ∈ A which provides the best performance value and set U_T = U*.
3. Find a good point U_{T+1} from U* (U_T) and add it to the set A. For each point U ∈ A and each coordinate i, calculate the degree of change of the objective function value f(U) when u_i changes. Here change means either decrease or increase of a scalar variable.
4. For U_T = U* = (u*_0, u*_1, ..., u*_n), calculate F(u_i ↑) − F(u_i ↓).
5. Using these degrees, calculate the forces F(T) = (F_1(T), ..., F_{Kd}(T)) acting on the increase of the objective function value f at the point U* (U_T): F(T) = F(u_i ↑) − F(u_i ↓).
6. Calculate U_{T+1} from U_T by U_{T+1} = U_T + αF(T).
7. In the same manner, choose a new point U_{T+2}, and so on.
8. The process is terminated either if F(T) = 0 or after the maximum number of iterations; then hold the best solution and repeat all of these procedures with other initial random points in another stage.


3. Application of GOP in clustering

3.1. Philosophy behind the development of the GOP

There are many different methods and algorithms developed for global optimization problems (Migdalas, Pardalos, & Varbrand, 2001). The GOP algorithm takes into account some relatively "worse" points for further consideration. This is what many other methods do, such as Simulated Annealing (Glover & Laguna, 1997), Genetic Algorithms (Smith, 2002) and Tabu Search (Cvijovic & Klinovski, 2002). The choice of a descent (good) direction is the main part of each algorithm. Instead of using a stochastic search (as in the algorithms mentioned), GOP uses the formula of Mammadov (2004) and Mammadov, Rubinov, and Yearwood (2005).

Note that the GOP algorithm has quite different settings and motivations compared to the methods that use so-called dynamical search (Pronzato, Wynn, & Zhigljausky, 2002). This method of search has some ideas in common with the heuristic method which attempts to estimate the overall convexity characteristics of the objective function. The advantage of this approach is that it does not use any approximate underestimations, including convex underestimations.

Fig. 1. Flowchart of the HGOP clustering algorithm.


3.2. Application of the hybrid GOP algorithm to clustering

The search capability of the HGOP algorithm is used in this article for the purpose of appropriately determining a fixed number of K cluster centers in R^N, thereby suitably clustering the set of n unlabelled points. The clustering metric that has been adopted is the sum of the Euclidean distances of the points from their respective cluster centers. The steps of the proposed algorithm are shown in Fig. 1 and described in detail in this section. The pseudo code for the HGOP clustering algorithm is shown in Fig. 2. The steps of the application of HGOP to clustering are as follows:

Step 0. Set f(U_TotalBest) = +∞, Num_Main_Iter = 0, and TabuList (TL) = ∅.

Step 1. Select some initial random points, that is, define a set A = {U(t) = (U_1(t), ..., U_n(t)), t = 1, ..., T}; the set A is uniformly selected from the box U ∈ R^{Kd}, a_i ≤ u_i ≤ b_i, i = 1, ..., Kd, and does not belong to the Tabu Region (visited region), where a_i, b_i are the minimum and maximum bounds of u_i, set from the minimum and maximum of the data that should be clustered. K is the number of clusters, d is the number of dimensions of the points, and the Tabu Region (TR) is determined by the Tabu List.

Step 2. If TL = {V_1, V_2, ..., V_M} then TR = {U : |U − V_i| < ρ_i, i ∈ {1, 2, ..., M}}, where ρ_i is the radius of a spherical area with center V_i that is a percentage of C (C = max(b_i − a_i), i = 1, ..., Kd). The i-th point is represented as a vector of decision variable values U(i) = (u_i^1, u_i^2, ..., u_i^{Kd}), which is a candidate solution with K cluster centers. For example, U(1) = (2, 5, 1, 6, 3, 2, 5, 7, 4) represents a solution with 3 cluster centers, Z_1 = (2, 5, 1), Z_2 = (6, 3, 2), Z_3 = (5, 7, 4), such that each center has three dimensions.
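Steps 1 and 2 amount to rejection sampling against a union of spheres. A Python sketch (we assume |U − V_i| is the Euclidean norm; the function names and the retry bound are ours):

```python
import math
import random

def in_tabu_region(u, tabu_list, radii):
    """Step 2: U is tabu if it lies inside any sphere of radius rho_i around a
    visited point V_i in the Tabu List."""
    for v, rho in zip(tabu_list, radii):
        if math.dist(u, v) < rho:
            return True
    return False

def sample_outside_tabu(rng, lo, hi, tabu_list, radii, max_tries=1000):
    """Step 1: draw a uniform point from the box [lo_i, hi_i] that does not
    belong to the tabu (visited) region; give up after max_tries draws."""
    for _ in range(max_tries):
        u = tuple(rng.uniform(a, b) for a, b in zip(lo, hi))
        if not in_tabu_region(u, tabu_list, radii):
            return u
    return None
```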

Step 3. Compute the performance value f(i) for each point U(i). Note that, to prevent U_1 = (Z_1, Z_2, Z_3) and U_1 = (Z_2, Z_1, Z_3) from being treated as different solutions, after selecting the initial random points the centers Z_i are reordered (swapped) to overcome this problem.

Step 4. Calculate the performance value at set A, choose U* as the point U* ∈ A which provides the best performance value, and put U* in U_T.

Step 5. Find a good point U_{T+1} from U* (U_T), add it to the set A, and update the Tabu List as follows:

Step 5-1. For each point U ∈ A and each coordinate i, calculate the degree of change of f(U) when u_i changes. Here change means either decrease or increase of a scalar variable. Let U_T = U* = (u*_0, u*_1, ..., u*_n) be the point for which the force F(T) is calculated, and let the points U = (u_0, u_1, ..., u_n) of A be given. This method describes dynamical systems based on the nonfunctional relationship between two variables. It is based on the notion of a fuzzy derivative. The fuzzy derivative ∂u_i/∂u_j is defined as the influence of feature j on feature i:

∂u_i/∂u_j = μ(u_i, u_j),  i, j ∈ {1, ..., n},

where

μ(u_i, u_j) = (μ_1(u_i, u_j), μ_2(u_i, u_j), μ_3(u_i, u_j), μ_4(u_i, u_j)),  i, j ∈ {0, ..., n} and i ≠ j,

and

μ_1(u_i, u_j) = d(u_i ↑, u_j ↑),  μ_2(u_i, u_j) = d(u_i ↑, u_j ↓),  μ_3(u_i, u_j) = d(u_i ↓, u_j ↓),  μ_4(u_i, u_j) = d(u_i ↓, u_j ↑).

μ_1(u_i, u_j) shows the degree of the increase of entry i if entry j increases starting from the initial state (u*_i, u*_j), and similarly for μ_2(u_i, u_j), μ_3(u_i, u_j) and μ_4(u_i, u_j):

μ_1 = M_11/M_1,  μ_2 = M_12/M_1,  μ_3 = M_13/M_2,  μ_4 = M_14/M_2,

where M_1 is the number of points (u_i, u_j) that satisfy u_i > u*_i. M_11 is the number of points (u_i, u_j) that satisfy u_i > u*_i and u_j > u*_j. M_12 is the number of points (u_i, u_j) that satisfy u_i > u*_i and u_j < u*_j. M_2 is the number of points (u_i, u_j) that satisfy u_i < u*_i. M_13 is the number of points (u_i, u_j) that satisfy u_i < u*_i and u_j < u*_j. M_14 is the number of points (u_i, u_j) that satisfy u_i < u*_i and u_j > u*_j.

If μ(u_i, u_j) = (η_1, η_2, η_3, η_4) and μ(u_j, u_i) = (ζ_1, ζ_2, ζ_3, ζ_4), then F(u_j → u_i ↑) = η_1 ζ_1 + η_4 ζ_2 and F(u_j → u_i ↓) = η_3 ζ_3 + η_2 ζ_4. The resulting force on the entry i is defined as the sum of all these forces, that is,

F(u_i ↑) = Σ_{j ≠ i} F(u_j → u_i ↑),  F(u_i ↓) = Σ_{j ≠ i} F(u_j → u_i ↓).
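The counting that defines μ_1, ..., μ_4 can be sketched as follows (a Python illustration of the M_11/M_1-style ratios relative to the best point U*; coordinates equal to the best point's value are simply skipped, which is an assumption on our part):

```python
def fuzzy_derivative(population, best, i, j):
    """Count M1, M2, M11, M12, M13, M14 as in Step 5-1 and return the four
    components (mu1, mu2, mu3, mu4) of the fuzzy derivative of entry i with
    respect to entry j, measured relative to the best point U*."""
    m1 = m2 = m11 = m12 = m13 = m14 = 0
    for u in population:
        if u[i] > best[i]:              # contributes to M1
            m1 += 1
            if u[j] > best[j]:
                m11 += 1
            elif u[j] < best[j]:
                m12 += 1
        elif u[i] < best[i]:            # contributes to M2
            m2 += 1
            if u[j] < best[j]:
                m13 += 1
            elif u[j] > best[j]:
                m14 += 1
    mu1 = m11 / m1 if m1 else 0.0
    mu2 = m12 / m1 if m1 else 0.0
    mu3 = m13 / m2 if m2 else 0.0
    mu4 = m14 / m2 if m2 else 0.0
    return mu1, mu2, mu3, mu4
```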

Step 5-2. Using these degrees, calculate the forces F(T) = (F_1(T), ..., F_{Kd}(T)) acting to increase the value of f at the point U* (U_T): F(T) = F(u_i ↑) − F(u_i ↓).

Step 5-3. Calculate Û_{T+1} by using U_T:

set α = 1 and U_DynamicalSystem = U_T
while α > ε and Iteration < Max_Num_Iter_Dynamical_System
    if f(U_DynamicalSystem + αF(T)) < f(U_DynamicalSystem)
        U_DynamicalSystem = U_DynamicalSystem + αF(T)
    else
        α = α/2
    Iteration = Iteration + 1
Û_{T+1} = U_DynamicalSystem

In the above, ε is a small positive number.

Fig. 2. Pseudo code for the HGOP clustering algorithm.
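The step-halving loop of Step 5-3 can be sketched directly (Python; the objective f, the force vector, and the constants are passed in as parameters, and the names are ours):

```python
def dynamical_system_move(f, u_t, force, eps=5e-4, max_iter=40):
    """Step 5-3: move along the force direction with step alpha, halving alpha
    whenever the trial point does not improve f, until alpha <= eps or the
    iteration budget is spent."""
    alpha = 1.0
    u = list(u_t)
    it = 0
    while alpha > eps and it < max_iter:
        trial = [ui + alpha * fi for ui, fi in zip(u, force)]
        if f(trial) < f(u):
            u = trial          # accept the improving move, keep alpha
        else:
            alpha /= 2.0       # shrink the step and retry
        it += 1
    return u
```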

Step 5-4. Refinement of the new point: a local optimization procedure starting at Û_{T+1} is used to find a local minimum U_{T+1}. A direct search method (pattern search, Hart (2001)) is used at this stage.

Step 5-4-1. Set u_0 = Û_{T+1}, Δ_p = Δ_0.
Step 5-4-2. For p = 0, 1, ..., number of pattern iterations, we have an iterate u_p and a step-length parameter Δ_p > 0. Let e_i, i = 1, ..., n, denote the standard unit basis vectors.
Step 5-4-3. Look at the points u_pattern = u_p ± Δ_p e_i, i = 1, ..., n, to find a u_pattern for which f(u_pattern) < f(u_p). If no such u_pattern is found, reduce Δ_p by half and continue; otherwise, leave the step-length parameter alone, setting Δ_{p+1} = Δ_p and u_{p+1} = u_pattern. In the latter case the step-length parameter can also be increased, say by a factor of 2, if a longer step seems justified.
Step 5-4-4. Repeat the iteration just described until Δ_p is deemed sufficiently small.
Step 5-4-5. At the end, U_{T+1} = u_{number of pattern iterations}.
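A compact sketch of the pattern search of Steps 5-4-1 to 5-4-4 (Python; for simplicity it moves to the first improving probe and does not grow the step length, which the text leaves optional):

```python
def pattern_search(f, u0, delta0=0.5, tol=1e-6, max_iter=200):
    """Direct (pattern) search: probe u +/- delta * e_i along each coordinate,
    move to the first improving probe, otherwise halve delta (Step 5-4-3);
    stop when delta is sufficiently small (Step 5-4-4)."""
    u = list(u0)
    delta = delta0
    n = len(u)
    for _ in range(max_iter):
        if delta < tol:
            break
        improved = False
        for i in range(n):
            for sign in (+1.0, -1.0):        # the 2n pattern points
                trial = list(u)
                trial[i] += sign * delta
                if f(trial) < f(u):
                    u = trial
                    improved = True
                    break
            if improved:
                break
        if not improved:
            delta /= 2.0                     # shrink the step length
    return u
```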

Step 5-5. Construct a new population A(t + 1) = A(t) ∪ {U_{T+1}}.

Step 5-6. Update the Tabu List with U_{T+1}; if the Tabu List is full, delete the point with the worst ranking. Tabu List entries are ranked and saved according to their objective function values and the sum of the Euclidean distances of each point from the other points in the list.

Step 5-6-1. Save the ranking of the Tabu List according to the objective function values and name it the Objective Function Rank (OFR).
Step 5-6-2. Save the ranking of the Tabu List according to the maximum sum of Euclidean distance values from the tabu list points and name it the Distance Rank (DR).
Step 5-6-3. The ranking of each point in TL is its best ranking in DR and OFR.
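The double ranking of Steps 5-6-1 to 5-6-3 can be sketched as follows (Python; we assume rank 0 is best, objective ranks ascend with f, and distance ranks descend with summed distance, so both quality and diversity can protect a point):

```python
import math

def best_rank(tabu_list, objectives):
    """Rank tabu points by objective value (OFR, lower f is better) and by
    summed Euclidean distance to the other points (DR, larger is better);
    each point keeps the better (smaller) of its two ranks."""
    m = len(tabu_list)
    ofr = sorted(range(m), key=lambda i: objectives[i])
    dist_sum = [sum(math.dist(tabu_list[i], v) for v in tabu_list) for i in range(m)]
    dr = sorted(range(m), key=lambda i: -dist_sum[i])
    rank = [0] * m
    for pos, i in enumerate(ofr):
        rank[i] = pos
    for pos, i in enumerate(dr):
        rank[i] = min(rank[i], pos)
    return rank  # the point with the largest rank is deleted when the list is full
```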

Step 6. If either F(T) < γ or T > T*, go to Step 7; otherwise, return to Step 4, where γ > 0 and T* is a positive integer.

Step 7. If f(U_TotalBest) > f(U_{T+1}) then U_TotalBest = U_{T+1}; otherwise U_TotalBest is unchanged. If Num_Main_Iter is less than Max_Num_Main_Iter then Num_Main_Iter = Num_Main_Iter + 1 and return to Step 1.

Step 8. U_TotalBest is the final solution and gives the cluster centers.

4. Experimental results

The experimental results comparing the HGOP clustering algorithm with several typical algorithms, including the ACO algorithm (Shelokar et al., 2004), the simulated annealing approach (Selim & Al-Sultan, 1991), the genetic K-means algorithm (Krishna & Murty, 1999), and the Tabu search approach (Sung & Jin, 2000), are provided for four artificial data sets (Data 1, Data 2, Data 3 and Data 4) and five real-life data sets (Vowel, Iris, Crude Oil, Wine and Thyroid diseases data), respectively. These are first described below. The effectiveness of the algorithms depends greatly on the generation of initial solutions. Therefore, for every dataset, each algorithm was run 10 times, each time with randomly generated initial solutions. We ran our experiments on a Pentium IV, 2.8 GHz, 512 MB RAM computer, and the algorithms were coded in Matlab 7.1. We ran all five algorithms on all datasets.

4.1. Artificial data sets

Data 1: This is a nonoverlapping two-dimensional data set where the number of clusters is two. It has 10 points. The value of K is chosen to be 2 for this data set.

Data 2: This is a nonoverlapping two-dimensional data set where the number of clusters is three. It has 76 points. The value of K is chosen to be 3 for this data set.

Data 3: This is an overlapping two-dimensional triangular distribution of data points having nine classes, where all the classes are assumed to have equal a priori probabilities (= 1/9). It has 900 data points. The X-Y ranges for the nine classes are as follows:

Class 1: [−3.3, −0.7] × [0.7, 3.3],
Class 2: [−1.3, 1.3] × [0.7, 3.3],
Class 3: [0.7, 3.3] × [0.7, 3.3],
Class 4: [−3.3, −0.7] × [−1.3, 1.3],
Class 5: [−1.3, 1.3] × [−1.3, 1.3],
Class 6: [0.7, 3.3] × [−1.3, 1.3],
Class 7: [−3.3, −0.7] × [−3.3, −0.7],
Class 8: [−1.3, 1.3] × [−3.3, −0.7],
Class 9: [0.7, 3.3] × [−3.3, −0.7].

Thus the domain of the triangular distribution for each class and each axis is 2.6. Consequently, the height will be 1/1.3 (since 1/2 · 2.6 · height = 1). The value of K is chosen to be 9 for this data set.

Data 4: This is an overlapping ten-dimensional data set generated using a triangular distribution of the form shown in Fig. 3 for two classes, 1 and 2. It has 1000 data points. The value of K is chosen to be 2 for this data set. The range for class 1 is [0, 2] × [0, 2] × ... × [0, 2] (10 times), and that for class 2 is [1, 3] × [0, 2] × ... × [0, 2] (9 times), with the corresponding peaks at (1, 1) and (2, 1). The distribution along the first axis (X) for class 1 may be formally quantified as:

f_1(x) = 0       for x ≤ 0
f_1(x) = x       for 0 < x ≤ 1
f_1(x) = 2 − x   for 1 < x ≤ 2
f_1(x) = 0       for x > 2

Fig. 3. Triangular distribution along the X-axis.


Similarly, for class 2:

f_2(x) = 0       for x ≤ 1
f_2(x) = x − 1   for 1 < x ≤ 2
f_2(x) = 3 − x   for 2 < x ≤ 3
f_2(x) = 0       for x > 3

The distribution along the other nine axes (Y_i, i = 1, 2, ..., 9) for both classes is

f(y_i) = 0        for y_i ≤ 0
f(y_i) = y_i      for 0 < y_i ≤ 1
f(y_i) = 2 − y_i  for 1 < y_i ≤ 2
f(y_i) = 0        for y_i > 2
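Data of this kind can be regenerated from the densities above with a standard triangular sampler (a Python sketch using random.triangular; the class sizes, seed, and function names are ours):

```python
import random

def triangular_class1_point(rng, dims=10):
    """Draw one Data 4 point for class 1: every coordinate follows the
    triangular density on [0, 2] peaking at 1."""
    return tuple(rng.triangular(0.0, 2.0, 1.0) for _ in range(dims))

def make_data4(n_per_class=500, seed=0):
    """Generate the two overlapping ten-dimensional classes of Data 4;
    class 2 only shifts the first axis to the triangular density on [1, 3]
    peaking at 2."""
    rng = random.Random(seed)
    class1 = [triangular_class1_point(rng) for _ in range(n_per_class)]
    class2 = [(rng.triangular(1.0, 3.0, 2.0),) + triangular_class1_point(rng, 9)
              for _ in range(n_per_class)]
    return class1, class2
```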

4.2. Real-life data sets

Vowel data: This data consists of 871 Indian Telugu vowel sounds (Pal & Majumder, 1977). These were uttered in a consonant–vowel–consonant context by three male speakers in the age group of 30–35 years. The data set has three features, F1, F2 and F3, corresponding to the first, second and third vowel formant frequencies, and six overlapping classes {d, a, i, u, e, o}. The value of K is therefore chosen to be 6 for this data.

Iris data: This is the Iris data set, which is perhaps the best-known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The data set contains three classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other. There are 150 instances with four numeric attributes in the iris data set. There are no missing attribute values. The attributes of the iris data set are sepal length in cm, sepal width in cm, petal length in cm and petal width in cm (Blake and Merz).

Crude oil data: This overlapping data (Johnson & Wichern, 1982) has 56 data points, 5 features and 3 classes. Hence the value of K is chosen to be 3 for this data set.

Wine data: This is the wine data set, which is also taken from the MCI laboratory. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. There are 178 instances with 13 numeric attributes in the wine data set. All attributes are continuous. There are no missing attribute values.

Thyroid diseases data: This dataset categorizes N = 215 samples of patients suffering from three human thyroid diseases (K = 3): euthyroid, hyperthyroidism, and hypothyroidism patients, where 150 individuals tested euthyroid, 30 patients experienced hyperthyroidism, and 35 patients suffered from hypothyroidism. Each individual was characterized by the results of five (n = 5) laboratory tests: total serum thyroxin, total serum tri-iodothyronine, serum tri-iodothyronine resin uptake, serum thyroid-stimulating hormone (TSH), and increase of TSH after injection of TSH-releasing hormone (Blake and Merz).

The comparison of results for each dataset is based on the best solution found in 10 distinct runs of each algorithm and the convergence processing time taken to attain the best solution. The solution quality is also given in terms of the average and worst

Table 5
Result obtained by the five algorithms for 10 different runs on Vowel data.

Method   F_best           F_average        F_worst          CPU time (s)
HGOP     148718.363754    148718.454321    148718.674567    69.85
ACO      148837.736634    148837.768828    148837.937878    73.65
GAK      149346.152274    149391.501798    149436.851323    98.72
TS       150635.653256    150648.795320    150697.784636    81.25
SA       149357.634587    149436.175420    149749.549362    79.46

Table 1
Result obtained by the five algorithms for 10 different runs on dataset 1.

Method   F_best      F_average   F_worst     CPU time (s)
HGOP     3.120125    3.131337    3.228137    1.81
ACO      3.142375    3.163422    3.352843    1.89
GAK      3.273426    3.355521    3.683901    2.01
TS       3.244326    3.310024    3.572814    1.92
SA       3.217832    3.282089    3.539115    1.99

Table 2
Result obtained by the five algorithms for 10 different runs on dataset 2.

Method   F_best       F_average    F_worst      CPU time (s)
HGOP     51.493674    51.533427    51.687453    8.23
ACO      52.082746    52.212071    52.729373    8.98
GAK      56.142562    56.377520    57.317354    17.24
TS       54.752946    54.879342    55.384927    14.57
SA       53.562492    53.635943    53.929748    14.82

Table 3
Result obtained by the five algorithms for 10 different runs on dataset 3.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     962.342786    962.578234    964.753761    25.93
ACO      964.739472    965.048327    966.283745    26.88
GAK      966.649837    966.772302    966.853946    38.52
TS       972.629478    973.209275    975.528463    32.78
SA       966.418263    966.614089    967.397392    31.24

Table 4
Result obtained by the five algorithms for 10 different runs on dataset 4.

Method   F_best         F_average      F_worst        CPU time (s)
HGOP     1246.135426    1246.325342    1246.374356    120.63
ACO      1248.958685    1249.034036    1249.335442    122.34
GAK      1258.673362    1520.777767    1271.635528    178.42
TS       1282.538294    1285.988483    1299.789237    142.15
SA       1249.736287    1249.968105    1250.895375    136.61

Table 6
Result obtained by the five algorithms for 10 different runs on Iris data.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     96.370352     96.373654     96.387564     31.35
ACO      97.100777     97.171546     97.808466     33.72
GAK      113.986503    125.197025    139.778272    105.53
TS       97.365977     97.868008     98.569485     72.86
SA       97.100777     97.134625     97.263845     95.92


values of the clustering metric (F_average and F_worst, respectively) after 10 different runs for each of the five algorithms. F is the performance of the clustering method as defined in Eq. (3). Tables 1–9 show these results.

For Data 1 (Table 1) it is found that the HGOP clustering algorithm provides the optimal value of 3.120125 in 90% of the total runs, which is better than the other clustering algorithms. The ACO clustering algorithm found a value of 3.142375 in 90% of runs, and GAK, TS, and SA found values of 3.273426, 3.244326, and 3.217832 in 80% of runs. The HGOP required the least processing time (1.81 s). For Data 2 (Table 2) the HGOP clustering algorithm attains the best value of 51.493674 in 90% of the total runs. On the other hand, the ACO, GAK, TS and SA algorithms attain 52.082746, 56.142562, 54.752946, and 53.562492 in 80% of the total runs. The execution time taken by the HGOP algorithm is less than that of the other algorithms (8.23 s). Similarly, for Data 3 (Table 3) and Data 4 (Table 4) the HGOP clustering algorithm attains the best values of 962.342786 and 1246.135426 in 90% and all of the total runs, respectively. The best values provided by ACO, TS and SA were obtained in 80% of the total runs, and the best value provided by GAK was obtained in 40% of runs. In terms of processing time, the HGOP performed better than the other clustering algorithms, as can be observed from Tables 3 and 4.

For the Vowel data (Table 5) the HGOP clustering algorithm attains the best value of 148718.363754 in 90% of runs. ACO, TS, and SA provided their best values in 80% of runs, and the GAK algorithm attains its best value only in 50% of the total runs. In addition, the HGOP clustering algorithm performed better than the other algorithms in terms of the processing time required (69.85 s). For the clustering problem on the Iris dataset, the results given in Table 6 show that the HGOP provides the optimum value of 96.370352. The HGOP and ACO were able to find the optimum nine times, as compared to five times for SA. The HGOP required the least processing time (31.35 s).

For the Crude Oil data set (Table 7), the HGOP clustering algorithm attains the best value of 250.983245 in 90% of the total runs, and ACO, GAK, TS, and SA attain best values of 253.564637, 278.965152, 254.645375, and 253.763548 in 80% of the total runs. The processing time required by HGOP is less than that of the other algorithms (14.43 s).

The results obtained for the clustering problem on the Wine dataset are given in Table 8. The HGOP finds the optimum solution of 16228.645326, and the ACO, SA and GAK methods provide 16530.533807. The HGOP, ACO, SA and GAK methods found their optimum solution in all of their 10 runs. The execution time taken by the HGOP algorithm is less than that of the other algorithms.

Table 7
Result obtained by the five algorithms for 10 different runs on Crude oil data.

Method   F_best        F_average      F_worst       CPU time (s)
HGOP     250.983245    251.243564     252.028164    14.43
ACO      253.564637    254.180897     256.645938    14.98
GAK      278.965152    279.907028     283.674535    35.26
TS       254.645375    255.4229528    258.533264    26.55
SA       253.763548    254.653207     258.211847    24.74

Table 8
Result obtained by the five algorithms for 10 different runs on Wine data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     16228.645326    16228.645326    16228.645326    56.37
ACO      16530.533807    16530.533807    16530.533807    68.29
GAK      16530.533807    16530.533807    16530.533807    226.68
TS       16666.226987    16785.459275    16837.535670    161.45
SA       16530.533807    16530.533807    16530.533807    57.28

Table 9
Result obtained by the five algorithms for 10 different runs on Thyroid data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     10109.874563    10111.132455    10113.657348    94.34
ACO      10111.827759    10112.126903    10114.819200    102.15
GAK      10116.294861    10128.823145    10148.389608    153.24
TS       10249.72917     10354.315021    10438.780449    114.01
SA       10111.827759    10114.045265    10115.934358    108.22

Table 10
Values of the parameters of each of the five algorithms.

HGOP:
  Maximum Num_Main_Iter: 15
  Positive small number γ: 0.01
  Positive small number ε: 0.0005
  Number of iterations T*: 40
  Step length of pattern search Δ_0: 0.5
  Number of initial points: 20
  Number of pattern iterations: 20
  Max_Num_Iter_Dynamical_System: 40
  Radius of tabu region ρ_i: 0.2 C
  Tabu List size: 20

ACO:
  Ants (R): 50
  Probability threshold for maximum trail (q0): 0.98
  Local search probability (pls): 0.01
  Evaporation rate (ρ): 0.01
  Maximum number of iterations (itermax): 1000

GAK:
  Population size: 50
  Crossover rate: 0.8
  Mutation rate: 0.001
  Maximum number of iterations: 1000

TS:
  Tabu list size: 25
  Number of trial solutions: 40
  Probability threshold: 0.98
  Maximum number of iterations: 1000

SA:
  Probability threshold: 0.98
  Initial temperature: 5
  Temperature multiplier: 0.98
  Final temperature: 0.01
  Number of iterations to detect steady state: 100
  Maximum number of iterations: 30,000

A.Maroosi,B.Amiri/Expert Systems with Applications 37 (2010) 5645–5652 5651

For the human thyroid disease dataset, the HGOP algorithm provides the optimum solution of 10109.874563 with a success rate of 90% over 10 runs. In terms of processing time, the HGOP performed better than the other clustering algorithms, as can be observed from Table 9.

Shelokar et al. (2004) performed several simulations to find the algorithmic parameters that yield the best performance of the ACO, GAK, SA, and TS algorithms in terms of the quality of the solution found, the number of function evaluations, and the processing time required. In this study, we used their algorithmic parameters. In addition, we performed several simulations to find the algorithmic parameters for the HGOP algorithm. The algorithmic parameters of all algorithms are listed in Table 10.

The results illustrate that the proposed HGOP optimization approach is a viable and efficient heuristic for finding optimal or near-optimal solutions to clustering problems of allocating N objects to k clusters. As mentioned earlier, the final solution of the K-means algorithm is sensitive to the initial state. In the proposed HGOP algorithm, the choice of initial solutions is not critical: the exchange of information among different individual solutions drives the algorithm toward the global solution, thereby overcoming this shortcoming of K-means.
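The initialization sensitivity described above is easy to demonstrate. The following Python sketch is our illustration, not the paper's code: a plain Lloyd's K-means is run from several random seeds, and in practice the run with the lowest within-cluster sum of squares is kept.

```python
import numpy as np

def kmeans(X, k, seed, iters=100):
    # Lloyd's algorithm started from k randomly chosen data points.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each object to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep an empty cluster's old center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    sse = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, sse

# Three well-separated groups; different seeds can converge to different
# local optima, which is why restarting and keeping the best run matters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in ([0, 0], [5, 0], [0, 5])])
scores = [kmeans(X, k=3, seed=s)[2] for s in range(10)]
print(min(scores), max(scores))
```

HGOP avoids this restart lottery by exchanging information among solutions rather than depending on any single initial state.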

5. Conclusion

In summary, in this paper the hybrid HGOP algorithm is used to solve clustering problems. The HGOP algorithm uses the notion of a relationship between variables that describes how changes in the variables influence one another. Like Simulated Annealing, Genetic Algorithms, and Tabu Search, the HGOP algorithm keeps some relatively worse points for further consideration. The HGOP algorithm attempts to jump over local minimum points and tries to find deeper ones. In this paper the global optimization method is combined with Tabu search to avoid revisiting already visited regions; this hybridization makes the algorithm faster. The HGOP algorithm for data clustering can be applied when the number of clusters is known a priori and the clusters are crisp in nature. To evaluate the performance of the HGOP algorithm, it is compared with other stochastic algorithms, namely ant colony optimization, the genetic algorithm, simulated annealing, and Tabu search. The algorithm is implemented and tested on several simulated and real datasets; preliminary computational experience is very encouraging in terms of the quality of solution found and the processing time required.
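The tabu-region idea behind the hybridization can be sketched as follows. This is our illustrative Python code, not the paper's implementation: a fixed-length list stores recently visited points, and a candidate within radius ρ of any stored point is rejected (Table 10 lists a tabu list of length 20 and radius 0.2 for HGOP).

```python
from collections import deque
import math

def is_tabu(candidate, tabu_list, radius):
    # A candidate is tabu if it lies within `radius` (Euclidean distance)
    # of any point already stored in the tabu list.
    return any(math.dist(candidate, p) < radius for p in tabu_list)

tabu = deque(maxlen=20)   # oldest visited point is evicted automatically
tabu.append((1.0, 1.0))
print(is_tabu((1.05, 1.0), tabu, 0.2))  # -> True: inside the tabu region
print(is_tabu((2.0, 2.0), tabu, 0.2))   # -> False: far enough to explore
```

Rejecting candidates near stored points forces the search into unvisited regions, which is the diversification effect the hybridization provides.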

References

Blake, C. L., & Merz, C. J. UCI repository of machine learning databases. Available from: <http://www.ics.uci.edu/_mlearn/MLRepository.html>.
Brucker, P. (1978). On the complexity of clustering problems. Optimization and operations research. Lecture Notes in Economics and Mathematical Systems, 157, 45–54.
Cvijovic, D., & Klinowski, J. (2002). Taboo search: An approach to the multiple-minima problem for continuous functions. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Forgy, E. W. (1965). Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics, 21(3), 768–769.
Garey, M. R., Johnson, D. S., & Witsenhausen, H. S. (1982). The complexity of the generalized Lloyd–Max problem. IEEE Transactions on Information Theory, 28(2), 255–256.
Glover, F., & Laguna, M. (1997). Tabu search. Kluwer Academic Publishers.
Gungor, Z., & Unler, A. (2006). K-harmonic means data clustering with simulated annealing heuristic. Applied Mathematics and Computation.
Hart, W. E. (2001). A convergence analysis of unconstrained and bound constrained evolutionary pattern search. Evolutionary Computation, 9(1), 1–23.
Johnson, R. A., & Wichern, D. W. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice-Hall.
Krishna, K., & Murty, M. N. (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 29, 433–439.
Kuo, R. J., Wang, H. S., Hu, T.-L., & Chou, S. H. (2005). Application of ant K-means on clustering analysis. Computers and Mathematics with Applications, 50, 1709–1724.
Mammadov, M. A. (2004). A new global optimization algorithm based on a dynamical systems approach. In Proceedings of the 6th international conference on optimization: Techniques and applications. Ballarat, Australia.
Mammadov, M. A., Rubinov, A. M., & Yearwood, J. (2005). Dynamical systems described by relational elasticities with applications to global optimization. In V. Jeyakumar & A. Rubinov (Eds.), Continuous optimisation: Current trends and applications (pp. 365–387). Springer.
Migdalas, A., Pardalos, P., & Varbrand, P. (2001). From local to global optimization. Nonconvex Optimization and Its Applications (Vol. 53). Kluwer Academic Publishers.
Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33, 1455–1465.
Pal, S. K., & Majumder, D. D. (1977). Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-7, 625–629.
Pronzato, L., Wynn, H., & Zhigljavsky, A. A. (2002). An introduction to dynamical search. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Selim, S. Z., & Al-Sultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003–1008.
Selim, S. Z., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 81–87.
Shelokar, P. S., Jayaraman, V. K., & Kulkarni, B. D. (2004). An ant colony approach for clustering. Analytica Chimica Acta, 509, 187–195.
Smith, J. (2002). Genetic algorithms. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Spath, H. (1989). Cluster analysis algorithms. Chichester, UK: Ellis Horwood.
Sung, C. S., & Jin, H. W. (2000). A tabu-search-based heuristic for clustering. Pattern Recognition, 33, 849–858.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.

