Application Research of Modified K-means Clustering Algorithm

Guo-li Liu, You-qian Tan, Li-mei Yu, Jia Liu, Jin-qiao Gao
Department of Computer Science and Software, Hebei University of Technology, Tianjin, China
(lgl6699@163.com)

Abstract - This paper presents an efficient algorithm, called the K-harmonic means clustering algorithm with simulated annealing, for reducing the dependence on initial values and overcoming convergence to local minima. The proposed algorithm works as follows: the K-harmonic means algorithm solves the problem that the clustering result is sensitive to the initial values, and simulated annealing lets the clustering jump out of locally optimal solutions at each iteration. The clustering result is verified by experiments on the IRIS data set. School XunTong is application software that facilitates communication between parents and teachers. This paper applies the new algorithm to the analysis of the School XunTong data set and finds the relationship between students' achievement and the communication between parents and teachers. Finally, the classification result guides the learning direction and cultivation of students in universities.


Keywords - K-harmonic means, simulated annealing, local minimum, School XunTong


I. INTRODUCTION

Among the commonly used clustering algorithms, the K-means algorithm is a typical one, widely used due to its simplicity and high effectiveness. However, it has problems with the dependence on initial values and the local convergence of the clustering result. There are several ways to improve the algorithm: first, apply the K-means algorithm many times and choose the optimum as the final clustering result; second, research new algorithms. To improve the K-means algorithm with respect to initial values and local convergence, this paper puts forward a new algorithm based on the combination of the K-harmonic means algorithm and the simulated annealing (SA) algorithm. "School XunTong" is application software that provides a service to students' parents and involves information about students. In this paper, the new algorithm is used to cluster the "School XunTong" data set and to find potential relationships in the clustering result.



II. FUNDAMENTALS OF THE ALGORITHM


A. K-means algorithm

The K-means algorithm (KM) is a common clustering algorithm based on partitioning and the oldest classical algorithm [1]. In cluster analysis we assume that we have been given a finite set of points $X$ in the $d$-dimensional space $\mathbb{R}^d$, that is, $X = \{x_i \mid x_i \in \mathbb{R}^d,\ i = 1, 2, \dots, n\}$. The K-means algorithm partitions the data set $X$ into a given number $k$ of disjoint subsets $C_1, C_2, \dots, C_k$. An optimal clustering is a partition that minimizes the intra-cluster distance and maximizes the inter-cluster distance.

In practice, the most popular similarity measure is the Euclidean distance, due to its computational simplicity. The Euclidean distance is defined as
$$d(x_i, x_j) = \bigl[(x_{i1}-x_{j1})^2 + (x_{i2}-x_{j2})^2 + \cdots + (x_{id}-x_{jd})^2\bigr]^{1/2} \qquad (1)$$

subject to $x_i = (x_{i1}, x_{i2}, \dots, x_{id}) \in \mathbb{R}^d$ and $x_j = (x_{j1}, x_{j2}, \dots, x_{jd}) \in \mathbb{R}^d$.
We recompute the clusters at each iteration. The cluster centers are updated by

$$c_i = \frac{1}{n_i} \sum_{x \in C_i} x \qquad (2)$$

where $x = (x_{i1}, x_{i2}, \dots, x_{id})$ and $n_i$ is the number of points in cluster $C_i$.

The main idea behind the K-means algorithm is the minimization of an objective function, usually taken as a function of the deviations of all patterns from their respective cluster centers. The sum of squared Euclidean distances has been adopted as the objective function in most studies. It is as follows:

$$E = \sum_{j=1}^{k} \sum_{x_i \in C_j} d(x_i, c_j) \qquad (3)$$

The K-means algorithm is simple and efficient, and it scales well to large data sets. However, K-means has its limitations: the clustering is extremely sensitive to the initial values, and it always converges to a local minimum. The K-harmonic means algorithm solves the problem that the clustering result is sensitive to the initial values.
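
As a reference for the comparisons in Section IV, the following is a minimal Python sketch of the K-means loop described above. It is our own illustration rather than the paper's code; the function name, the random initialization, and the convergence test are assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=100, rng=None):
    """Plain K-means: assign each point to its nearest center (Eq. (1)),
    then recompute each center as its cluster mean (Eq. (2)), which
    decreases the summed-distance objective of Eq. (3)."""
    rng = np.random.default_rng(rng)
    # Random initial centers -- exactly the sensitivity the paper criticizes.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Squared Euclidean distances from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1), d2.min(axis=1).sum()  # Eq. (3)
```

Running this sketch from different random seeds reproduces the initial-value sensitivity discussed above: different seeds can yield very different final objectives.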


B. K-harmonic means algorithm

K-harmonic means (KHM) is a center-based algorithm that has been developed to solve the clustering problem [2-5]. The algorithm uses the harmonic average of the distances from each data point to the cluster centers, instead of the minimum distance as in the K-means algorithm. The harmonic average is defined as







$$\frac{k}{\sum_{c \in C} \frac{1}{d^{p}(x, c)}} \qquad (4)$$

where $x \in X$ denotes a point of the finite set $X$ in the $d$-dimensional space $\mathbb{R}^d$, $c \in C$ denotes a cluster center, $d^{p}(x, c)$ denotes the distance between the two points raised to the power $p$, and $k$ denotes the number of clusters.

The iteration rule for the cluster centers is:

$$c_k = \frac{\displaystyle\sum_{i=1}^{n} \frac{1}{d_{ik}^{\,p}\Bigl[\sum_{j=1}^{k} 1/d_{ij}^{\,p}\Bigr]^{2}}\, x_i}{\displaystyle\sum_{i=1}^{n} \frac{1}{d_{ik}^{\,p}\Bigl[\sum_{j=1}^{k} 1/d_{ij}^{\,p}\Bigr]^{2}}} \qquad (5)$$

where $d_{ik}^{\,p}$ denotes the distance between $x_i$ and the center $c_k$, raised to the power $p$.

The iteration of the cluster centers constantly minimizes the objective function, which is:

$$J = \sum_{i=1}^{n} \frac{k}{\sum_{c \in C} \frac{1}{d^{p}(x_i, c)}} \qquad (6)$$

The objective function of the KHM algorithm also introduces a conditional probability of each cluster center given a data point, and dynamic weights for the data points in each iteration [3]. The KHM algorithm remedies the weakness that the K-means algorithm is sensitive to the initial values. However, the algorithm still converges to a local minimum. Heuristic algorithms are known to have good global-optimization properties [10]; in this paper, we use the simulated annealing algorithm to solve the local-minimum problem of the K-means algorithm.
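
A sketch of one KHM iteration, implementing Eqs. (4)-(6) as printed above, is given below. The function name, the default power p = 2, and the small `eps` guard against division by zero are our own assumptions, not from the paper.

```python
import numpy as np

def khm_step(X, centers, p=2.0, eps=1e-12):
    """One K-harmonic means iteration: evaluate the harmonic-average
    objective of Eq. (6) and update every center according to Eq. (5)."""
    k = centers.shape[0]
    # d[i, j]: Euclidean distance between point x_i and center c_j.
    d = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    inv = 1.0 / np.maximum(d, eps) ** p           # 1 / d_ij^p
    s = inv.sum(axis=1)                           # sum_j 1 / d_ij^p
    objective = (k / s).sum()                     # Eq. (6)
    w = inv / (s ** 2)[:, None]                   # 1 / (d_ik^p [sum_j 1/d_ij^p]^2)
    new_centers = (w.T @ X) / w.sum(axis=0)[:, None]   # Eq. (5)
    return new_centers, objective
```

The weights `w` are exactly the per-point, per-center factors of Eq. (5); repeating `khm_step` until the objective stops improving gives the KHM clustering used as the initial solution in Section III.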


C. Simulated Annealing

Simulated annealing (SA), presented by Metropolis, Rosenbluth, and others in 1953 [6], [7], [13], is an iterative method for finding approximate solutions to intractable combinatorial optimization problems.

The simulated annealing solution methodology resembles the cooling process of molten metals through annealing. The cooling phenomenon is simulated by controlling a parameter, namely the temperature $T$, introduced with the concept of the Boltzmann probability distribution. Metropolis suggested a way to implement the Boltzmann probability distribution in simulated thermodynamic systems that can also be used in the function-minimization context. At any instant the method keeps a current point $x_1$ and the corresponding function value at that point.

Based on the Metropolis algorithm, the probability of moving to the next point $x_2$ depends on the difference in the function values ($\Delta E$) at these two points. There is some finite probability of selecting the point $x_2$ even though it is worse than the point $x_1$, and it depends on the relative magnitudes of $\Delta E$ and $T$. The optimal solution is obtained by simulating slow cooling, that is, by sampling repeatedly [7]. The initial temperature, the cooling rate, the number of iterations performed at a particular temperature, and the stopping condition are the most important parameters governing the success of the simulated annealing procedure.

Simulated annealing solves the problem that K-means always converges to a local minimum. To overcome the shortcomings of K-means, namely the dependency on the initial state and the convergence to local optima, this paper proposes a new algorithm called the K-harmonic means clustering algorithm with simulated annealing.
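
The Metropolis acceptance rule described above can be sketched in a few lines; the function name is ours, and `s` is the constant that appears in step 4 of Section III:

```python
import math
import random

def metropolis_accept(J_old, J_new, T, s=1.0):
    """Metropolis criterion: always accept an improvement; accept a worse
    solution with probability exp(-(J_new - J_old) / (s * T))."""
    if J_new <= J_old:
        return True
    return random.random() < math.exp(-(J_new - J_old) / (s * T))
```

At high $T$ almost any move is accepted; as $T$ falls, worse moves become increasingly unlikely, which is what lets the search first explore and then settle.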


III. K-HARMONIC MEANS WITH SIMULATED ANNEALING ALGORITHM


A. Algorithm theory

The K-harmonic means clustering algorithm with simulated annealing combines K-harmonic means and simulated annealing; the parameters of the new algorithm need to be set according to its features. The main idea of SAKHM (the K-harmonic means clustering algorithm with simulated annealing) is to take the clustering result derived from K-harmonic means as the initial value of the simulated annealing algorithm; the new solution in each iteration of the annealing process is then obtained by randomly disturbing the current solution. That is, the algorithm randomly changes the cluster membership of one or several samples and generates a new clustering partition, so that it may jump out of a local minimum and exploit its global-optimization ability, finally obtaining a globally optimal clustering result that is not affected by the initial values.

The steps of the SAKHM algorithm are:

1) Initialize the initial temperature $t_0$, the final temperature $t_m$, the number of inner-loop iterations MaxInnerLoop, and the cooling rate DR.

2) Apply the K-harmonic means algorithm: each point in the set is assigned to its closest center according to the minimum-distance rule. Compute the centroid of each cluster to obtain a new clustering and the objective function $J(1)$. The clustering result serves as the initial solution $w$.

3) Let the inner-loop counter InnerLoop be 0, and initialize the outer-loop counter $i$.

4) Perform an iteration to generate an improved set of clusters: update the cluster centers $w(i)$ and compute the objective function $J(i+1)$ of the new clustering. If $J(i+1) \le J(i)$, the new cluster centers are accepted. If $J(i+1) > J(i)$, we compute the acceptance probability from the relative magnitudes of $\Delta J$ and $T$:

$$P = \exp\Bigl(-\frac{J(i+1) - J(i)}{s\,t(i)}\Bigr), \qquad \text{namely}\quad p = \exp\Bigl(-\frac{\Delta J}{sT}\Bigr),$$

where $t(i)$ stands for the current temperature and $s$ stands for a constant. Let $r \in [0, 1]$ be a random number; if $p \ge r$, the new cluster centers are accepted; otherwise the previous cluster centers continue to iterate.

5) If InnerLoop ≤ MaxInnerLoop, increase InnerLoop by 1 and $i$ by 1, and if $i \le$ MaxLoop, go to step 4); otherwise, go to step 6).

6) If $t(i) \le t_m$, stop the program; otherwise update the temperature by the formula $t(i+1) = DR \cdot t(i)$, then go to step 3).
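
Putting steps 1)-6) together, a minimal Python sketch of the SAKHM loop might look as follows. The helpers `khm_cluster` (step 2, the KHM clustering), `disturb` (the random disturbance of Section III.B.4), and `objective` (Eq. (6)) are assumed rather than defined here, and the structure is our own reading of the steps above, not the authors' code.

```python
import math
import numpy as np

def sakhm(X, k, t0, tm, dr=0.98, max_inner=6, max_loop=1000, s=1.0, rng=None):
    """Sketch of SAKHM steps 1)-6): KHM supplies the initial solution,
    random disturbance proposes new solutions, and the Metropolis rule
    decides acceptance while the temperature cools by t <- DR * t."""
    rng = np.random.default_rng(rng)
    w, J = khm_cluster(X, k)          # step 2 (assumed helper: runs KHM)
    t, i = t0, 0                      # step 1 parameters, outer counter
    while t > tm and i < max_loop:    # step 6 stopping test
        for _ in range(max_inner):    # steps 3)-5): inner loop at fixed t
            w_new = disturb(w, rng)           # assumed helper, see III.B.4
            J_new = objective(X, w_new)       # assumed helper: Eq. (6)
            dJ = J_new - J
            if dJ <= 0 or rng.random() < math.exp(-dJ / (s * t)):
                w, J = w_new, J_new   # step 4: Metropolis acceptance
            i += 1
        t *= dr                       # step 6: t(i+1) = DR * t(i)
    return w, J
```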


B. K-harmonic means clustering algorithm with simulated annealing based on DK-$t_0$


The K-harmonic means clustering algorithm with simulated annealing focuses on applying the simulated annealing solution methodology to the K-means algorithm and on setting the key parameters of the new algorithm; the following four aspects describe how the parameters are set. It is important to note that these strategies may significantly impact the performance of the new algorithm. Items 3) and 4) are the key contributions of this paper.

1) The choice of the objective function

The sum of squared Euclidean distances has been adopted as the objective function in the algorithm.

2) The temperature update

This algorithm uses the cooling rate presented by Kirkpatrick and others to control the temperature decrease. Let DR be the cooling rate, where DR is a constant close to 1. The temperature-update formula is defined as $T(k+1) = DR \cdot T(k)$, where $k$ is the update count of the temperature. The cooling speed of the temperature is controlled by the parameter DR. This paper sets DR = 0.98.

3) Generating the initial temperature

In simulated annealing research, the selection principle for the initial temperature is: at the beginning of the annealing, the temperature has to be high enough to allow a move to any state. But if the temperature is too high, for a while almost every different result will be accepted as the new result, which weakens the algorithm's effect. So, through repeated experiments, the initial temperature is determined as a proper proportion of the new value.

Although scholars have proposed many methods for setting the initial temperature, there is no unified and clearly more effective method. In the simulated annealing algorithm, the setting of the initial temperature $T_0$ corresponds to the setting of the initial value of the control parameter in the SAKHM algorithm. On the basis of existing research, this paper puts forward a method to select the initial value $t_0$ of the control parameter. The concrete content is as follows:

According to the theory of balance, the initial value of the control parameter $t_0$ should be selected big enough. If the initial acceptance probability is assumed to be $v_0$, then according to the Metropolis acceptance criterion, $\exp(-\Delta J / (a t_0)) \approx 1$. To make this formula hold, the value of $t_0$ should be big enough; but if $t_0$ is too large, it increases the number of iterations and the computing time. The best selection of $t_0$ ensures that the algorithm can reach the global optimum. Kirkpatrick and others proposed a method to select the initial temperature, called the experience method: first, a large value is selected as $t_0$ and the solution is transformed several times; if the acceptance rate $v$ is less than the scheduled initial acceptance rate $v_0$ (usually taken as 0.8), the value of $t_0$ is doubled until $v \ge v_0$. In this paper, we use a combination of the experience method and the objective function of K-harmonic means clustering to select $t_0$.
The method is as follows: first, take the objective-function value of K-harmonic means clustering as the initial $t_0$. Then, following the method above, transform several times: if the acceptance rate $v < v_0$ ($v_0 = 0.8$), double the value of $t_0$ until $v \ge v_0$; at this point $t_0$ is the requested value. If the acceptance rate $v > v_0$ ($v_0 = 0.8$), halve the value of $t_0$ until $v \le v_0$; at this point $t_0$ is the requested value. This yields the minimum value that satisfies the conditions. Because the selection of $t_0$ is associated with both the experience method and the objective function of K-harmonic means clustering, it is called the DK-$t_0$ selection method.
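
A compact sketch of the DK-$t_0$ selection just described is given below. The callback `accept_rate(t)` is an assumption: it is taken to run a few trial disturbances at temperature `t` and return the fraction accepted.

```python
def select_t0(J_khm, accept_rate, v0=0.8):
    """DK-t0 sketch: seed t0 with the K-harmonic means objective value,
    then double or halve it until the measured acceptance rate v
    crosses the scheduled rate v0."""
    t0 = J_khm                        # start from the KHM objective value
    if accept_rate(t0) < v0:          # too cold: double until v >= v0
        while accept_rate(t0) < v0:
            t0 *= 2.0
    else:                             # too hot: halve until v <= v0
        while accept_rate(t0) > v0:
            t0 *= 0.5
    return t0
```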

4) How to generate a new solution

In order to balance the algorithm at its beginning, K-harmonic means divides the data set into several clusters, and the clustering result serves as the initial solution. In the subsequent simulated annealing iterations, because the computational cost of K-harmonic means is very large, the cluster centers and objective function are updated by the K-means criterion in the following iterative process to reduce the running time, and we still obtain a good clustering result.

In the K-harmonic means clustering algorithm with simulated annealing, new solutions are generated by disturbing the current solution; that is, the algorithm moves one or more of the centers into other areas. But the initial solution, the clustering result of K-harmonic means, is not a crisp partition into several clusters, whereas in the disturbance process each point should be clearly assigned to exactly one cluster. To sum up, the data are split into different clusters according to the K-harmonic algorithm, each point is assigned to a single cluster by the minimum-distance principle, and the corresponding objective function is calculated. The new algorithm begins with this clustering result, and new solutions are generated by disturbing the current solution, as sketched below.
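
One way to implement this disturbance, under our reading of the description above, is to reassign a single randomly chosen sample to another cluster and then refresh the centers with the cheaper K-means criterion (cluster means) instead of a full KHM pass. The function name and the empty-cluster guard are our own choices.

```python
import numpy as np

def disturb(X, labels, k, rng):
    """Generate a neighbouring solution: move one randomly chosen sample
    to a different, randomly chosen cluster, then recompute the centers
    as cluster means (K-means criterion)."""
    labels = labels.copy()
    i = rng.integers(len(X))                          # pick one sample
    others = [c for c in range(k) if c != labels[i]]
    labels[i] = rng.choice(others)                    # move it elsewhere
    centers = np.array([
        X[labels == c].mean(axis=0) if np.any(labels == c)
        else X[rng.integers(len(X))]                  # re-seed an emptied cluster
        for c in range(k)])
    return labels, centers
```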


IV. VALIDATION BASED ON K-HARMONIC ALGORITHM


In order to evaluate the performance of k
-
harmonic means clustering algorithm with simulated
annealing, applying the new algorithm to IRIS data set.
The first one is the Iris data set that has N = 150 points
that each point

has four attributes

calyx long, calyx
wide, petals long, petals wide. In this article, the distance
between the cluster and actual value is as the evaluation
of the algorithm.

Because the ranges of the data attributes differ greatly, the data set must be normalized before clustering. After 20 independent experiments, the main results are gathered in TABLE I, including the maximum, minimum, and average of the algorithms' objective functions, the difference from the actual values, and the average CPU time. The last two columns are averages over the 20 runs.


TABLE I. Clustering results of the KM and SAKHM algorithms

| Algorithm | Minimum | Average | Maximum | Error | CPU time |
|-----------|---------|---------|---------|-------|----------|
| KM        | 140.94  | 175.56  | 207.06  | 18.89 | 0.05     |
| SA-KHM    | 76.32   | 79.82   | 84.98   | 2.63  | 16.94    |

From TABLE I, the running time of the K-means algorithm is the shortest, but the clustering effect differs obviously with different initial values. The maximum of the objective function differs greatly from the minimum, which shows that the algorithm is highly sensitive to the initial values. On the contrary, for K-harmonic means clustering with simulated annealing the running time increases obviously, but the variation among the objective-function values is smaller, and the difference between the cluster centers and the actual centers is small. The figures below show the clustering result of the eighth run; the horizontal axis shows the sepal length and the vertical axis shows the sepal width.

Fig. 2 shows the result of the SAKHM cluster analysis. Compared with the KM algorithm in Fig. 1, its point sets have some intersections, while those in Fig. 1 do not. This shows the global-search ability, away from the local minimum, of the method based on K-harmonic means.

The experiment also uses a statistical table of the first bachelor-degree aspirations in science and engineering in 2009. The data set includes 195 records, and each record has 5 properties: total score, Chinese score, math score, foreign-language score, and school level. If a school belongs to the 211 colleges, the level is second; if it belongs to the 985 colleges, the level is third; and if it is one of the 34 key universities, the level is fourth. The scores of Chinese, math, and foreign language are the lowest scores delivered in the first round.

In the clustering process, the setting of $t_0$ uses the experience method and the improved method, respectively. To compare them we use an argument similar to that of the new algorithm, based on the same initial assumptions. Each factor combination is tested 7 times with test problems.

Fig. 3. Contrast of the two methods' objective functions (old parameter vs. improved parameter).



Fig. 1. Clustering results of the KM algorithm.

Fig. 2. Clustering results of the SAKHM algorithm.


Fig. 3 compares the objective-function performance of the two approaches. The full line and the dotted line represent the unimproved-parameter and improved-parameter approaches, respectively. The DK-$t_0$ method yields a smaller clustering objective function; moreover, the method based on DK-$t_0$ selection shows less variation of the objective function, namely the value of the objective function is more stable, and the new clustering effect is better.


V. APPLYING THE IMPROVED ALGORITHM TO SCHOOL XUNTONG


In recent years, more and more parents have concerned themselves with the situation of their children in school. Certain companies have cooperated with the Mobile Company and developed a digital campus system. "School XunTong" is application software that can exchange students' information between parents and teachers conveniently. "School XunTong" mainly targets primary and middle school students. Teachers can send and receive text messages freely and enjoy Internet service, parents can acquire students' information through a subscription business, and the company can profit from this business too. Because the target customers of "School XunTong" are the primary and middle school students of a whole province or city, it has a large amount of complex data that is difficult to manage and of little direct value for the company. With the increasing number of users, the data is growing rapidly. People desire to find useful information in the database of "School XunTong". From this perspective, analyzing the database of "School XunTong" with a cluster analysis algorithm becomes very meaningful.

The data to be analyzed come from the school database of "School XunTong", which includes more than four hundred students' information from September 2009 to December 2009. The information includes students' accounts, parents' accounts, mobile numbers, times of sending messages, students' achievements, parents' mobile numbers, message themes, and so on.

The data should be preprocessed before clustering; the preprocessing includes transferring data, processing default values, processing abnormal values, processing isolated points, etc.

1) The data source has different data types, including numeric, text, time, etc. They should be converted to unified data types. For example, the grade type is converted to numeric; senior school is mapped to 10.

2) The sample data contain some default values. Each default value is filled with the right value, usually the most frequently used one.

3) Some data may be erroneous in the statistics, and some deviate obviously from the data mean, so they become outlier data. For example, some scores are more than 100 or less than 0, and some scores are 5 or 10; these scores are far away from the average value. Such data should be discarded, and then the clustering result will be more effective.


TABLE II depicts the variable Chinese Score.



TABLE II. Statistics of the variable Chinese Score

| Variable      | Minimum | Average | Maximum | Error |
|---------------|---------|---------|---------|-------|
| Chinese Score | 10      | 58.67   | 108.00  | 20.60 |

In order to ensure that the data points lie within a certain range, we develop a standard that keeps the data points in [58.6667 − 2.0 × 20.6009, 100]. Beyond this range, a data point is called an isolated point and must be deleted. In Fig. 4, there are three scores above 100 and two points below 58.6667 − 2.0 × 20.6009. We then remove the points 104, 108, and 10, and remove all data items corresponding to these points. The rest of the points lie in [58.6667 − 2.0 × 20.6009, 100].
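
A small sketch of this isolated-point rule, written as a reusable filter (the function name and the column handling are our own choices; the mean 58.6667 and standard deviation 20.6009 come from TABLE II):

```python
import numpy as np

def outlier_mask(scores, upper=100.0, n_std=2.0):
    """Mark scores inside [mean - n_std * std, upper] as valid; for the
    Chinese-score column this is [58.6667 - 2.0 * 20.6009, 100]."""
    lower = scores.mean() - n_std * scores.std()
    return (scores >= lower) & (scores <= upper)

# Usage: keep = outlier_mask(data[:, chinese_col]); data = data[keep]
# The whole record is dropped, not just the offending column value.
```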



Fig. 4. Diagram of YuwenScore in the data set.


4) Some attributes that have no connection with the property values, such as creation time, must be removed; attributes such as test scores and the counts of teachers' messages are kept.

Because of the obvious differences among the data of different attributes in the data set, and in order that small values are not overwhelmed by large values and do not lose their key role, the data set must be normalized before clustering. That is,

$$x' = \frac{x - \operatorname{average}(x)}{\max(x) - \min(x)}.$$
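
A one-function sketch of this column-wise normalization, assuming the formula as reconstructed above (the guard for constant columns is our own addition):

```python
import numpy as np

def normalize(X):
    """Column-wise normalization applied before clustering:
    x' = (x - average(x)) / (max(x) - min(x))."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0              # guard constant columns
    return (X - X.mean(axis=0)) / span
```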

The new algorithm is applied to the data set of "School XunTong". Let the parameters be k = 3, $t_0$ = 48, $t_m$ = 0.00001, MaxInnerLoop = 6, and MaxLoop = 1000; the new algorithm then splits the 428-pattern data set into three clusters containing 258, 108, and 62 points, respectively. The clustering result is shown in TABLE III.


The cluster centers are transferred back to the original data form as follows:

TABLE III. Cluster centers of the SAKHM algorithm

| Cluster | Chinese Score | Math Score | English Score | Avg. Score | Life Msg | Study Msg |
|---------|---------------|------------|---------------|------------|----------|-----------|
| 1       | 65.10         | 71.54      | 71.80         | 69.31      | 8.82     | 34.39     |
| 2       | 36.61         | 49.04      | 46.42         | 43.77      | 9.21     | 35.83     |
| 3       | 76.14         | 47.53      | 38.79         | 54.00      | 8.10     | 34.78     |


According to the results in TABLE III, this article analyzes the clustering of the School XunTong data set produced by the K-harmonic means clustering algorithm.

First cluster: this cluster includes 258 data items, about 60% of the students. According to the clustering result, the scores of the students in this cluster are better and more stable, with no weak subjects. Regarding the extent of communication between parents and teachers, compared with the other clusters, the teachers send fewer messages about the students' learning and life. In a word, these teachers send few text messages.

Second cluster: this cluster includes 108 data items, about 25% of the total number of students. According to the clustering result, the students have no obvious weak subjects, but the average scores are lower. Regarding the communication between parents and teachers, compared with the other clusters, the teachers send more messages about the students' learning and life; in this cluster, the messages from teachers are the most numerous.

Third cluster: this cluster includes 62 data items, a small part of the total. According to the analysis of the clustering result, Chinese is very good, math is medium, English is poor, and the average scores are low. Life messages from teachers are far fewer than learning messages.

According to the above analysis, the teachers actively send far more messages, mainly about learning, than the parents reply to. By analyzing the clustering result, the enterprise can target the parents of students with poor academic achievement. Life information about good students is a key point for the enterprise in persuading parents to subscribe to the text-message service. The enterprise can also analyze the characteristics of customer groups and set up new business. The school can make effective use of the analysis: good students can develop comprehensively; students who tend to be unbalanced in one or several subjects should study the weak courses and can rapidly improve their scores; teachers should pay more attention to students whose academic achievements are poor and try to help them in each subject. Such students can then make great progress in a short time.




VI. CONCLUSION

Inspired by the observation that the shortcomings of K-means, its sensitivity to the starting point and its convergence to local minima, can essentially be solved, this work presents an efficient algorithm called the K-harmonic means clustering algorithm with simulated annealing. Applying the new algorithm to the IRIS data set, our experimental results indicate that the proposed algorithm obtains better results than K-means. Efforts are underway to apply the proposed algorithm to "School XunTong" in order to find the potential relationship between students' achievement and the communication between parents and teachers, and to guide students' study.


REFERENCES

[1] M.-C. Chiang et al., "A time-efficient pattern reduction algorithm for k-means clustering," Information Sciences, 181 (2011), pp. 716-731.
[2] Z. Heng, Y. Wan-hai, "A Fuzzy Clustering Algorithm Based on K-harmonic Means" (in Chinese), Journal of Circuits and Systems, vol. 9, no. 5, pp. 114-117.
[3] Z. Güngör, A. Ünler, "K-harmonic means data clustering with simulated annealing heuristic," Applied Mathematics and Computation, 2007, pp. 199-209.
[4] Z. Güngör, A. Ünler, "K-harmonic means data clustering with simulated annealing heuristic," Applied Mathematical Modelling, 32 (2008), pp. 1115-1125.
[5] B. Zhang, M. Hsu, et al., "K-harmonic means - a data clustering algorithm," HP Technical Report HPL-2000-137, Hewlett-Packard Labs, 2000.
[6] L. Wei-min, Z. Ai-yun, L. Sun-jian, Z. Fang-gen, S. Jang-sheng, "Application Research of Simulated Annealing K-means clustering algorithm" (in Chinese), Microcomputer Information, vol. 7, no. 3, pp. 182-184, 2008.
[7] S. Kirkpatrick et al., "Optimization by Simulated Annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.
[8] J. Z. C. Lai, H. Tsung-Jen, L. Yi-Ching, "A fast k-means clustering algorithm using cluster center displacement," Pattern Recognition, 42 (2009), pp. 2551-2556.
[9] T. Jinlan, Z. Lin, Z. Suqin, L. Lu, "Improvement and Parallelism of k-Means Clustering Algorithm," Tsinghua Science and Technology, vol. 10, no. 3, pp. 276-281, 2005.
[10] J. Pena, J. Lozano, P. Larranaga, "An empirical comparison of four initialization methods for the k-means algorithm," Pattern Recognition Letters, 1999, 20: 1027-1040.
[11] A. Likas, N. Vlassis, J. Verbeek, "The global k-means clustering algorithm," IAS Technical Report IAS-UVA-01-02, Intelligent Autonomous Systems, 2001.
[12] G. Hamerly, C. Elkan, "Alternatives to the k-means algorithm that find better clusterings," in 11th International Conference on Information and Knowledge Management (CIKM 2002), 2002, pp. 600-607.
[13] R. Ng, J. Han, "Efficient and effective clustering method for spatial data mining," in Proc. Int. Conf. Very Large Data Bases, San Francisco, CA: Morgan Kaufmann, 1994, pp. 144-155.
[14] F. J. McErlean, D. A. Bell, S. I. McClean, "The Use of Simulated Annealing for Clustering Data in Databases," Information Systems, 1990, 15(2).
[15] M. Chen, "An overview from a database perspective," IEEE Trans. on Knowledge and Data Engineering, 1996, 8(6): 866-883.