ORIGINAL PAPER

On mean shift-based clustering for circular data

Shou-Jen Chang-Chien

•

Wen-Liang Hung

•

Miin-Shen Yang

Springer-Verlag 2012

Abstract

Cluster analysis is a useful tool for data anal-

ysis.Clustering methods are used to partition a data set into

clusters such that the data points in the same cluster are the

most similar to each other and the data points in the dif-

ferent clusters are the most dissimilar.The mean shift was

originally used as a kernel-type weighted mean procedure

that had been proposed as a clustering algorithm.However,

most mean shift-based clustering (MSBC) algorithms are

used for numeric data.The circular data that are the

directional data on the plane have been widely used in data

analysis.In this paper,we propose a MSBC algorithm for

circular data.Three types of mean shift implementation

procedures with nonblurring,blurring and general methods

are furthermore compared in which the blurring mean shift

procedure is the best and recommended.The proposed

MSBC for circular data is not necessary to give the number

of cluster.It can automatically ﬁnd a ﬁnal cluster number

with good clustering centers.Several numerical examples

and comparisons with some existing clustering methods are

used to demonstrate its effectiveness and superiority of the

proposed method.

Keywords

Circular data

Clustering algorithms

Mean shift

Kernel functions

1 Introduction

Data measured in the form of angles or two-dimensional

orientations are usually called circular data.For example,

wind directions and departure directions of migrating birds

or animals are common circular data.However,some cir-

cular data may be not angles,like frequency of observed

event in each month of a year period,but they could be

translated to corresponding angles of points on the unit

circle.Since von Mises (

1918

) introduced a distribution on

circular data (i.e.directional data on the plane) and Watson

and Williams (

1956

) ﬁrst investigated inference problems

for the von Mises distribution and gave the construction of

statistical methods on circular data.,the researches on cir-

cular data got increasing concerns.Mardia (

1972

),Fisher

(

1993

) and Mardia and Jupp (

2000

) are good reference

books for the analysis of circular data with applications to

biology,geology,medicine,meteorology,oceanography,

etc.Furthermore,Mooney et al.(

2003

) considered circular

data for a case study involving sudden infant death syn-

drome (SIDS).Gao et al.(

2006

) applied themto investigate

the seasonality of disease onset.Carta et al.(

2008

) studied

statistical modeling of directional wind speeds.Corcoran

et al.(

2009

) used circular statistics to explore the geogra-

phy of the journey to work.Recently,Lee (

2010

) gave a

survey for a range of methods that had been developed over

the last 50 years to handle circular data.

Cluster analysis is a useful tool for data analysis.It is a

method for ﬁnding clusters of a data set with most simi-

larity in the same cluster and largest dissimilarity between

different clusters.From the statistical point of view,clus-

tering methods can be generally divided into two catego-

ries.One is a probability model-based approach,another is

a nonparametric approach.A probability model-based

approach is assumed that the data set follows a probability

S.-J.Chang-Chien

M.-S.Yang (

&

)

Department of Applied Mathematics,Chung Yuan Christian

University,Chung-Li 32023,Taiwan

e-mail:msyang@math.cycu.edu.tw

W.-L.Hung

Department of Applied Mathematics,National Hsinchu

University of Education,Hsin-Chu,Taiwan

123

Soft Comput

DOI 10.1007/s00500-012-0802-z

distribution,such as a mixture of densities,so that a mix-

ture likelihood approach to clustering can be used

(McLachlan and Basford 1988).For a mixture model,the

expectation and maximization (EM) algorithm (Dempster

et al.1977) is a most-used estimation algorithm.For a

nonparametric approach,clustering methods may be based

on an objective function of similarity or dissimilarity

measures so that partitional clustering is best used.Most

partitional methods suppose that the data set can be rep-

resented by ﬁnite cluster prototypes with their own objec-

tive functions.For a probability model-based approach to

clustering of circular data,the EMalgorithmwas well used

in the analysis of grouped circular data in the literature

(Mardia and Sutton 1975;Bartels 1984;Spurr and

Koutbeiy 1991).Since clustering for directional data is

important,there have many researches in a probability

model-based approach been considered,such as Mooney

et al.(2003),Banerjee et al.(2005),McGraw et al.(2006),

Dortet-Bernadet and Wicker (2008) and Thang et al.

(2008).For a nonparametric approach,Yang and Pan

(1997) proposed a fuzzy clustering algorithm for circular

data.However,these clustering algorithms are sensitive to

initials and outliers.

Kernel-based methods are widely used in many appli-

cations (Vapnik 1998;Cristianini and Shawe-Taylor 2000;

Filippone et al.2008).There are two ways for imple-

menting kernel-based methods along with supervised and

unsupervised learning.One is support vector machine by

transforming the data space into the high-dimensional

feature space with supervised learning (Scho

¨

lkopf et al.

1998;Cristianini and Shawe-Taylor 2000).Another one is

to ﬁnd a kernel density estimate on the data space and then

search the modes of the estimated density (Silverman

1998).The mean shift procedure ﬁrst proposed by

Fukunaga and Hostetler (1975) was used as a nonpara-

metric kernel density gradient estimate using a generalized

kernel approach.Cheng (1995) clariﬁed the relationship

between mean shift and optimization by introducing the

concept of shadows.He also showed some peculiar

behaviors of mean shift in cluster analysis with the most-

used Gaussian kernel.Comaniciu and Meer (2002)

introduced the mean shift-based feature space analysis

technique with the quality of outputs only controlled by the

kernel bandwidth,i.e.,the resolution of the analysis,so that

the technique could be easily integrated into complex

vision systems.Furthermore,Wu and Yang (2007) pro-

posed a mean shift-based clustering (MSBC) method as a

clustering tool for numeric data.Recently,Kobayashi and

Otsu (2010) proposed the so-called von Mises-Fisher

means shift clustering for the data on a unit hypersphere

(i.e.,directional data in s-dimensional Euclidean space).

Kobayashi and Otsu (2010) embedded the von Mises-

Fisher distribution to the mean shift algorithm.According

to the previously discussed categories of clustering,Ko-

bayashi and Otsu’s method should be belonged to the

probability model-based approach.Recently,Chang-Chien

et al.(2010) had also considered Wu and Yang’s (2007)

(blurring) MSBC for circular data by deﬁning the distance

measure between two angles for circular data.In this case,

Chang-Chien’s method should be a nonparametric

approach.Although both of Kobayashi and Otsu (2010)

and Chang-Chien et al.(2010) are based on mean shift

algorithm,they adopt different approaches.In this paper,

we extended Chang-Chien et al.(2010) ﬁrst using a nor-

malized parameter b and then by considering three kinds of

implementation procedures,i.e.nonblurring,blurring and

general mean shifts,for circular data.More extensions and

comparisons with numerical and real examples are also

considered in this paper.The proposed clustering algorithm

gives us a new way in the analysis of grouped directional

data on the plane without giving a priori cluster number.

Those examples and comparisons actually demonstrate its

effectiveness and superiority of our proposed method.

The remainder of this paper is organized as follows.In

Sect.2,we ﬁrst consider circular data with its distribution

and then review the EM(Bartels 1984;Spurr and Koutbeiy

1991) and fuzzy c-directions (FCD) (Yang and Pan 1997)

algorithms for circular data.In Sect.3,we propose the

MSBC algorithm for circular data.The three kinds of

implementation procedures with nonblurring,blurring and

general mean shifts are also compared.In Sect.4,some

examples with numerical and real data sets and compari-

sons with the existing methods are made.Finally,conclu-

sions are stated in Sect.5.

2 EM and FCD algorithms for circular data

In this section,we ﬁrst consider circular data with its dis-

tribution and then review the EM (Bartels 1984;Spurr and

Koutbeiy 1991) and FCD (Yang and Pan 1997) algorithms

for circular data.In a two-dimensional Cartesian coordinate

system,the direction h of a vector can be regarded as a unit

vector or a point with (cos h,sin h) on the unit circle with

0 B h B 2p.Angles can be measured indegree.Statisticians

proposed various distributions on circular data.The von

Mises distribution VM(m,j) is a well-known probability

model in circular statistics,just like the normal distribution

in linear statistics.The two parameters m and j represent the

mean direction and concentration of the distribution,

respectively.If j is larger,then the distribution will

more focus on m.The probability density function f(h;m,j) is

of the form f(h;m,j) = (2pI

0

(j))

-1

exp(j cos(h - m)),

0 B h\2p,0 B j\?,where I

0

(j) = (2p)

-1

R

2p

0

exp

ðj cos(h mÞÞ dh is the modiﬁed Bessel function of order

S.-J.Chang-Chien et al.

123

zero.Tocluster groupedcircular data,a mixture of vonMises

distributions is generally used as a probability model.

The EM algorithm (Dempster et al.1977;McLachlan

and Basford 1988) is a useful tool to cluster the grouped

data with a mixture model.It can also be applied to circular

data.Suppose the data set X = {x

1

,x

2

,…,x

n

} is a random

sample from a population consists of c subpopulations.Let

x be an observation and the mixture density f ðx;a;/Þ ¼

P

c

i¼1

a

i

f

i

ðx;/

i

Þ;where a

i

is the mixing proportion of the

subpopulation i and f

i

(x;/

i

) is the corresponding density,

i = 1,2,…,c.Let us consider z

1

,z

2

,…,z

c

as the indi-

cator functions such that z

ij

= z

i

(x

j

) = 1,if x

j

arises from

the subpopulation i and z

ij

= 0 if x

j

arises from other

subpopulations,for i = 1,2,…,c and j = 1,2,…,

n.Then,the log-likelihood function base on the data set

{x

1

,…,x

n

,z

1

,…,z

c

} is L

EM

ða;/;x

1

;...;x

n

;z

1

;...;z

c

Þ ¼

P

c

i¼1

P

n

j¼1

z

ij

ðln a

i

f

i

ðx

j

;/

i

ÞÞ:The EMalgorithmconsists of

‘‘Expectation’’ step and ‘‘Maximum’’ step.In the expec-

tation step,the expectation Eðz

ij

jx

j

Þ ¼ a

i

f

i

ðx

j

j/

i

Þ=

P

c

k¼1

a

k

f

k

ðx

j

j/

k

Þ is used to substitute for missing data z

ij

.

In the maximum step,we seek estimates of the parameters

by maximizing E L

EM

ð Þ ¼

P

c

i¼1

P

n

j¼1

Eðz

ij

jx

j

Þðln a

i

f

i

ðx

j

;

/

i

ÞÞ with the constraint

P

c

i¼1

a

i

¼ 1:

We next use above steps for circular data.Suppose that the

data set X = {h

1

,h

2

,…,h

n

} is a random sample from a

mixture of von Mises distributions where the probability

density of an observation h

j

from subpopulation i is

f

i

ðh

j

;m

i

;j

i

Þ ¼ ð2pI

0

ðj

i

ÞÞ

1

expðj

i

cosðh

j

m

i

ÞÞ;0h

j

2

p;0 j

i

\1:Therefore,the Lagrangian of E(L

EM

)

is Lða;m;j;kÞ ¼

P

c

i¼1

P

n

j¼1

Eðz

ij

jh

j

Þ lnða

i

f

i

ðh

j

;m

i

;j

i

ÞÞ k

ð

P

c

i¼1

a

i

1Þ:We take the ﬁrst derivatives of L(a,m,j,k)

with respect to all parameters and set them to be equal to

zero.Then,the update equations for estimates of parameters

are obtained as follows:

a

i

¼

P

n

j¼1

z

ij

n

;i ¼ 1;2;...;c ð1Þ

m

i

¼ tan

1

P

n

j¼1

z

ij

sinðh

j

Þ

P

n

j¼1

z

ij

cosðh

j

Þ

!

;i ¼ 1;2;...;c ð2Þ

j

i

¼ A

1

P

n

j¼1

z

ij

cosðh

j

m

i

Þ

P

n

j¼1

z

ij

!

;i ¼ 1;2;...;c;ð3Þ

where A

1

ðxÞ a function that can be computed from

Batschelet’s table (see Fisher 1993).On the other hand,the

missing data z

ij

are estimated with posterior probabilities

z

ij

¼

a

i

f

i

ðh

j

;m

i

;j

i

Þ

P

c

k¼1

a

k

f

k

ðh

j

;m

k

;j

k

Þ

;

i ¼ 1;2;...;c;j ¼ 1;2;...;n:

ð4Þ

Thus,the EM algorithm for directional data can be

summarized as follows (see also Bartels 1984;Spurr and

Koutbeiy 1991):

EM algorithm

Step 1.Fix 2 B c B n and ﬁx e [0.Give initials

z

(0)

= (z

1

(0)

,…,z

c

(0)

) and let s = 1.

Step 2.Compute a

(s)

with z

(s-1)

using (1).

Step 3.Compute m

(s)

with z

(s-1)

using (2).

Step 4.Compute j

(s)

with z

(s-1)

,m

(s)

,and {h

1

,h

2

,…,h

n

}

using (3).

Step 5.Update to z

(s)

with a

(s)

,m

(s)

,j

(s)

,and {h

1

,

h

2

,…,h

n

} using (4).

Step 6.Compute d(z

(s)

,z

(s-1)

) in a convenient metric

norm d.

IF d z

ðsÞ

;z

ðs1Þ

\e;STOP

ELSE s ¼ s þ1 and return to Step 2:

Yang and Pan (1997) proposed another useful method

called fuzzy c-directions (FCD) clustering algorithm.

Suppose that the data set X = {h

1

,h

2

,…,h

n

} is a random

sample froma mixture of von Mises distributions.Yang and

Pan (1997) proposed the following objective function

B

m;w

ðl;a;m;jÞ ¼

X

n

j¼1

X

c

i¼1

l

m

i

ðh

j

Þðln2pI

0

ðj

i

Þ

þj

i

cosðh

j

m

i

ÞÞ þw

X

n

j¼1

X

c

i¼1

l

m

i

ðh

j

Þ lna

i

;

where the constraints are

P

c

i¼1

a

i

¼1 and

P

c

i¼1

l

i

ðh

j

Þ ¼1

for j = 1,2,…,n in which l = {l

1

,…,l

c

} is a fuzzy

c-partition that u

ij

= l

i

(h

j

) are in the interval [0,1] with the

value l

i

(h

j

) as the degree of membership of h

j

belonged to

the subpopulation i.The Lagrangian L(l,a,m,j,k,c

1

) of

B

m,w

(l,a,m,j) is Lðl;a;m;j;k;c

1

Þ ¼B

m;w

ðl;a;m;jÞ

P

n

j¼1

k

j

ð

P

c

i¼1

l

i

ðh

j

Þ 1Þ c

1

ð

P

c

i¼1

a

i

1Þ.We take the

ﬁrst derivatives of L(a,m,j,k) with respect to all parameters

and set them equal to zero.The update equations for

estimates of parameters can be obtained as follows:

a

i

¼

P

n

j¼1

l

m

ij

P

c

i¼1

P

n

j¼1

l

m

ij

;i ¼ 1;2;...;c ð5Þ

m

i

¼ tan

1

P

n

j¼1

l

m

ij

sinðh

j

Þ

P

n

j¼1

l

m

ij

cosðh

j

Þ

0

B

B

B

@

1

C

C

C

A

;i ¼ 1;2;...;c ð6Þ

j

i

¼ A

1

P

n

j¼1

l

m

ij

cosðh

j

m

i

Þ

P

n

j¼1

l

m

ij

!

;i ¼ 1;2;...;c ð7Þ

On mean shift-based clustering for circular data

123

l

ij

¼

X

c

t¼1

ðln2pI

0

ðj

i

Þ j

i

cosðh

j

m

i

Þ wlna

i

Þ

1=ðm1Þ

ðln2pI

0

ðj

t

Þ j

t

cosðh

j

m

t

Þ wlna

t

Þ

1=ðm1Þ

!

1

;

i ¼1;2;...;c;j ¼1;2;...;n

ð8Þ

in which A

-1

is a function that can be computed from

Batschelet’s table.Thus,the FCD algorithm(Yang and Pan

1997) can be summarized as follows:

FCD algorithm

Step 1.Fix m C 1,2 B c B n and w[0.Fix any e [0

and give an initial l

(0)

.

Step 2.Compute a

(k)

with a

(k-1)

using (5).

Step 3.Compute m

(k)

with m

(k-1)

using (6).

Step 4.Compute j

(k)

with l

(k-1)

,m

(k)

,and {h

1

,h

2

,…,h

n

}

using (7).

Step 5.Update to l

(k)

with a

(k)

,m

(k)

,j

(k)

,and {h

1

,

h

2

,…,h

n

} using (8).

Step 6.Compute d(l

(k)

,l

(k-1)

) in a convenient metric

norm d.

IF dðl

ðkÞ

;l

ðk1Þ

Þ\e;STOP

ELSE k ¼ k þ1 and return to Step 2:

3 The proposed clustering method for circular data

A statistical kernel-based method to clustering is to ﬁnd a

kernel density estimate and then search the modes of the

estimated density.Mean shift procedures are methods for

ﬁnding the modes of a kernel density estimate.Fukunaga

and Hostetler (1975) ﬁrst proposed the mean shift proce-

dure based on a nonparametric density function gradient

estimate using a generalized kernel approach.Cheng

(1995) showed the behaviors of mean shift in cluster

analysis with the most-used Gaussian kernel.Fashing and

Tomasi (2005) claimed that mean shift is a bound opti-

mization and equivalent to Newton’s method in the case of

piecewise constant kernels.Wu and Yang (2007) proposed

a MSBC method in which the procedure consists of four

steps:(1) select a kernel,(2) estimate the stabilization

parameter,(3) use the mean shift procedures,and (4)

identify the clusters.The MSBC procedure could be used

with different kernels.In this paper,we will modify the

MSBC for circular data.

Let X = {x

1

,x

2

,…,x

n

} be a data set in an

s-dimensional Euclidean space R

s

.The generalized

Epanechnikov kernel K

E

p

:X?R is deﬁned as

K

p

E

ðxÞ ¼

ð1 jjx x

j

jj

2

=bÞ

p

if jjx x

j

jj

2

b;

0 if jjx x

j

jj

2

[b:

The normalization parameter b is generally set to be the

sample variance.The parameter p is called the stabilization

parameter.The performance of the density estimates

depends on the estimate for p.Bandwidth selection may

affect performance of the density estimate.Wu and Yang

(2007) proposed a technique by assigning a ﬁxed sample

variance to bandwidth which is different to bandwidth

selection.Reference to the generalized Epanechnikov

kernel,we deﬁne a kernel and apply MSBC procedure on

the circular data.Let X = {h

1

,h

2

,…,h

n

} be the circular

data.The kernel function K

p

:X?[0,1] is deﬁned as

where p is the stabilization parameter and b is the sample

circular standard deviation on X deﬁned by b ¼

ð2ln

RÞ

1=2

;where

R ¼ jj

P

n

j¼1

ðcos h

j

;sin h

j

Þ=njj is the

mean resultant length.The distance measure between

the two angles h and h

j

is 1 - cos (h - h

j

).We normalize

the distance measure by dividing the parameter b.We

mention that,in the conference paper,Chang-Chien et al.

(2010) considered the kernel function K

p

ðhÞ ¼

ð1

1cosðhh

j

Þ

2

Þ

p

¼ ð

1þcosðhh

j

Þ

2

Þ

p

by dividing 2.In this

paper,we give a more general way by choosing b as a

normalized parameter.Wu and Yang (2007) use the

graphical method of correlation comparisons for

estimating p.We also use the same method to ﬁnd

suitable estimate for p.The suitable p value provides a

good approximate density shape for the data set.In general,

a good operating range for p falls between 1 and 50.The

kernel density estimate for circular data is given by

^

f

K

p

ðhÞ ¼

X

n

j¼1

K

p

ðhÞwðh

j

Þ;

where K

p

(h) is the previously described kernel function and

w(h

j

) is a weight function on the data set.Throughout the

paper,we use the weight w(h

j

) = 1/n,j = 1,2,…,n,i.e.,

each data point has equal weight.These f

^

f

K

p

ðh

i

Þ;i ¼

1;2;...;ng represent the values of the estimated density

shape on data points.Compute the correlation values of

K

p

ðhÞ ¼

1

1cosðhh

j

Þ

b

p

¼

bð1cosðhh

j

ÞÞ

b

p

;if 1 cosðh h

j

Þ b

0;if 1 cosðh h

j

Þ [b

(

;

S.-J.Chang-Chien et al.

123

f

^

f

K

p

ðh

i

Þ;i ¼ 1;2;...;ng with pairs (p = 1,p = 2),

(p = 2,p = 3),…,(p = 50,p = 51),etc.The increased

shift of p depends on the data set.For the circular data,the

value 1 is an appropriate value for the increased shift of

p.The graph of [j,the correlation value with (p = j,

p = j?1)] with j = 1,2,…,50 is plotted for ﬁnding a

suitable estimate of p.There are variations for the

approximate density shape with the increase of p.When the

correlation value is closed to 1,the approximate density

shape remains steady.If the correlation value for the ith

point is closed to 1 in the graph,then the value i is the

suitable estimate for p.

The MSBC procedure could be used to ﬁnd the modes of

a kernel density estimate of circular data.We take the ﬁrst

derivatives of the kernel density estimate

^

f

K

p

ðhÞ and set it

to be zero.Then,we obtain a mode (cluster center) esti-

mate as follows:

m ¼ tan

1

P

n

j¼1

maxðb ð1 cosðm h

j

ÞÞ;0Þ

p1

sinðh

j

Þwðh

j

Þ

P

n

j¼1

maxðb ð1 cosðm h

j

ÞÞ;0Þ

p1

cosðh

j

Þwðh

j

Þ

!

:

The MSBC has three kinds of implementation

procedures.They are nonblurring,blurring and general

mean shifts.These are described as follows.

(A) The nonblurring mean shift uses all data points as the

initial values of cluster centers (i.e.m

i

(0)

= h

i

,

i = 1,2,…,n) such that all cluster centers agglom-

erate to optimal locations when increasing the

iterations.Hence,its form for mode estimates is as

follows:

v

tþ1ð Þ

i

¼ tan

1

P

n

j¼1

max b 1 cos v

tð Þ

i

h

j

;0

p1

sin h

j

w h

j

P

n

j¼1

max b 1 cos v

tð Þ

i

h

j

;0

p1

cos h

j

w h

j

0

B

@

1

C

A

;

i ¼ 1;2;...;n;

where t is the iteration number.

(B) The blurring mean shift also use all data points as the

initial values of cluster centers,but update data point

h

j

with m

j

(i.e.h

j

/m

j

),j = 1,2,…,n so that the

density estimate

^

f

K

p

ðhÞ is changed at each iterative.

Hence,its form for mode estimates is as follows:

v

tþ1ð Þ

i

¼ tan

1

P

n

j¼1

max b 1 cos v

tð Þ

i

v

tð Þ

j

;0

p1

sin v

tð Þ

j

w v

tð Þ

j

P

n

j¼1

max b 1 cos v

tð Þ

i

v

tð Þ

j

;0

p1

cos v

tð Þ

j

w v

tð Þ

j

0

B

@

1

C

A

;

i ¼ 1;2;...;n;

where t is the iteration number.

(C) The general mean shift is to give c initial values

randomly,where c is smaller than n and could be

chosen as

ﬃﬃﬃ

n

p

:In general,these c initial values are

randomly picked from the n data points.Hence,its

form for mode estimates is as follows:

v

tþ1ð Þ

i

¼ tan

1

P

n

j¼1

max b 1 cos v

tð Þ

i

h

j

;0

p1

sin h

j

w h

j

P

n

j¼1

max b 1 cos v

tð Þ

i

h

j

;0

p1

cos h

j

w h

j

0

B

@

1

C

A

;

i ¼ 1;2;...;c;

where t is the iteration number.

Furthermore,to complete the MSBC procedure,

agglomerating cluster centers lead us a way to determine

the number of cluster with well-identiﬁed cluster centers

where the Agglomerative Hierarchical Clustering (AHC)

algorithm is used to identify clusters.There are different

linkages in the AHC algorithm.In this paper,we simply

use the single linkage.Thus,the MSBC algorithm for

circular data can be summarized as follows:

MSBC algorithm for circular data

Step 1.Choose the proposed K

p

kernel.

Step 2.Use the graphical method of correlation compar-

isons for estimating p.

Step 3.Choose one of three procedures fromnonblurring,

blurring and general mean shifts.

Step 4.Use AHC algorithm to identify clusters.

We mention that the proposed MSBC algorithm for

circular data is a nonparametric approach.Recently,

Kobayashi and Otsu (2010) proposed the von Mises-Fisher

mean shift for clustering on a unit hypersphere that is a

probability model-based approach.We know that the von

Mises-Fisher distribution has a probability density function

M(x;m,j) = C

M

(j) exp (jx

0

m),where x is an s-dimen-

sional unit vector and m is the mean direction and j is the

concentration parameter and C

M

(j) is the normalization

constant.Since the von Mises-Fisher distribution is based

on a monotonically increasing convex function and an

inner product x

0

m,Kobayashi and Otsu (2010) generalized

the von Mises-Fisher distribution to be F(x;m,j) =

C

F

(j)f(x

0

m;j),where f is a monotonically increasing

convex function with g being a derivative of f.Then,

the kernel density estimate p(x) is constructed as pðxÞ ¼

C

F

ðjÞ

n

P

n

j¼1

f ðx

0

x

j

;jÞ:To maximize p(x) with the constraint

jjxjj ¼ 1,Kobayashi and Otsu (2010) derived the mode

estimate as x ¼

P

n

j¼1

x

j

gðx

0

x

j

;jÞ

jj

P

n

j¼1

x

j

gðx

0

x

j

;jÞjj

that was used in their

Algorithm 1.Suppose that {x

1

,x

2

,…,x

n

} is an

s-dimensional data set fromthe vonMises-Fisher distribution.

On mean shift-based clustering for circular data

123

Intheir experimental results,Kobayashi and Otsu(2010) used

the kernel function f(x;j) and its derivative g as follows:

f ðx;jÞ ¼

0 0x j

1

2

ðx jÞ jx 1

and

gðx;jÞ ¼

0 0 x j

x j jx 1

:

We will use these speciﬁed functions to make the

comparisons of Kobayashi and Otsu (2010) and our MSBC

algorithm in the next section.According to the above

statement,we can ﬁnd differences between Kobayashi and

Otsu (2010) and our methods.The method proposed by

Kobayashi and Otsu (2010) used the von Mises-Fisher

distribution.Therefore,the kernel function f(x;j) and the

kernel density estimate p(x) and the mode estimates in

Kobayashi and Otsu (2010) are all dependent on the

parameter j.However,in our method,we deﬁne the kernel

function K

p

(h) as

where the distance measures between two angles are used

to deﬁne K

p

(h).Obviously,our proposed kernel function

K

p

(h) for the MSBC algorithm is independent of the

parameter j.On the other hand,our method can be used for

different distributions,such as a mixture of von Mises

distributions,a mixture of wrapped normal distributions

and a mixture of wrapped Cauchy distributions,etc.We

will demonstrate these in Example 1.Moreover,we also

use the stabilization parameter p to control the number of

peaks (modes) in the data set and then use a correlation

comparison procedure to give a good estimate of p.The

concept of the stabilization parameter p is not used in

Kobayashi and Otsu’s (2010) method.

Since the proposed MSBC algorithm is independent to

the parameter j,if we want to handle those data from a

mixture of von Mises distributions,we need to consider an

estimate for the concentration parameter j.The von Mises

probability density function f(h;m,j) is with the form

f ðh;m;jÞ ¼ ð2pI

0

ðjÞÞ

1

expðjcosðh mÞÞ;

0h\2p;0 j\1;

where I

0

ðjÞ ¼ ð2pÞ

1

R

2p

0

exp(jcos(h mÞÞ dh is the modi-

ﬁed Bessel function of order zero.In this paper,we use

the function A

-1

to estimate j from the Batschelet’s

table (see Fisher 1993).Suppose X

i

is the ith cluster,h

j

is a

data point in X

i

,and m

j

is its corresponding cluster center.

We set

x ¼

P

8h

j

2X

i

cosðh

j

m

j

Þ

the number of data points in X

i

:

Then A

-1

(x) is the estimate of concentration for the ith

cluster in which A(j) is the ﬁrst derivative of ln I

0

(j) with

respect to j and can be computed from the Batschelet’s

table.

We now compare these three kinds of implementation

procedures using a mixture of von Mises distributions,a

mixture of wrapped normal distributions and a mixture of

wrapped Cauchy distributions.We calculate the mean

squared error (MSE) to compare their accuracy for the

three kinds of implementation procedures.Recall that a

wrapped distribution on a circle is derived from the dis-

tribution on a line.Suppose X is a random variable on a

line with its probability density function (pdf) g(x) and

cumulative distribution function (cdf) G(x).The random

variable H of the wrapped distribution could be obtained

by the transformation H X½mod2p and its corresponding

cdf F(h) is FðhÞ ¼

P

1

k¼1

½Gðh þ2pkÞ Gð2pkÞ.The

pdf f(h) of H become to be f ðhÞ ¼

P

1

k¼1

gðh þ2pkÞ.

There are many commonly wrapped distributions on a

circle such as the wrapped normal distribution WN(m q)

and the wrapped Cauchy distribution WC(m,q).The

parameters m and q are the mean direction and the mean

resultant length,respectively.The distribution WN(m,q) is

a symmetric unimodal distribution which is obtained by

wrapping the normal distribution on a line,with the pdf

f ðhÞ ¼

1

2p

ð1 þ2

P

1

p¼1

q

p

2

cos pðh mÞÞ;0 h2p;0

q1.The distribution WC(m,q) is a symmetric uni-

modal distribution which is obtained by wrapping the

Cauchy distribution on the line,with the pdf f ðhÞ ¼

1q

2

2pð1þq

2

2qcosðhmÞÞ

;0h2p;0 q1:

Example 1 For comparing the three kinds of implemen-

tation procedures,nonblurring,blurring and general mean

shifts,we consider the simulation data from the mixture of

von Mises distributions by generating 120 data points from

0.3VM(0.5p,10.3)?0.7VM(p,12.8).The result with

p = 9 for estimating stabilization parameter p using the

graphical method of correlation comparisons to the data set

is shown in Fig.1.We use the estimated p = 9 for these

three kinds of implementation procedures.Figures 2,3 and

4 show the agglomerated conditions of cluster centers

under the different implementation procedures.When we

K

p

ðhÞ ¼

1

1cosðhh

j

Þ

b

p

¼

bð1cosðhh

j

ÞÞ

b

p

;if 1 cosðh h

j

Þ b;

0;if 1 cosðh h

j

Þ [b

(

S.-J.Chang-Chien et al.

123

compare the part (a) in Figs.2,3 and 4,we ﬁnd that there

are the same initial cluster centers for the blurring and

nonblurring mean shifts,but only 11 initial cluster centers

for the general mean shift.To observe the part (d) in

Figs.2,3 and 4,the cluster centers have been agglomerated

to two points for all three procedures.For the general mean

shift,initial cluster centers are randomly assigned from all

data points.It is possible that all initial cluster centers come

from the same subpopulation.In this situation,the cluster

centers may be agglomerated to one point.It means that

there is only one cluster in this data set.Therefore,we may

have a bad cluster with the general mean shift.Further-

more,the hierarchical trees and identiﬁed clusters for dif-

ferent mean shift procedures are shown in Figs.5,6,and 7.

From the part (a) of Figs.5,6 and 7,we ﬁnd that three

hierarchical trees are different,but they all indicate that the

-1

0

1

-1

0

1

iteration = 0

ν

-1

0

1

-1

0

1

iteration = 2

ν

-1

0

1

-1

0

1

iteration = 3

ν

-1

0

1

-1

0

1

iteration = 4

ν

Fig.2 Agglomerated condition of cluster centers with blurring mean

shift

-1

0

1

-1

0

1

iteration = 0

ν

-1

0

1

-1

0

1

iteration = 3

ν

-1

0

1

-1

0

1

iteration = 6

ν

-1

0

1

-1

0

1

iteration = 10

ν

Fig.3 Agglomerated condition of cluster centers with nonblurring

mean shift

-1

0

1

-1

0

1

iteration = 0

ν

-1

0

1

-1

0

1

iteration = 2

ν

-1

0

1

-1

0

1

iteration = 4

ν

-1

0

1

-1

0

1

iteration = 6

ν

Fig.4 Agglomerated condition of cluster centers with general mean

shift

0

5

10

15

20

25

30

35

40

45

50

0.98

0.985

0.99

0.995

1

1.005

suitable estimate

Fig.1 Graph of correlation comparisons with the estimate p = 9

119

120

118

117

34

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

35

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

0

0.2

0.4

0.6

0.8

1

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

1

1

111

1

1

11

1

1

1

1

1

1

1

1

1

1

1

111

1

11

1

1

1

1

1

1

1

(b)

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

Fig.5 a Hierarchical tree with blurring mean shift,b identiﬁed

clusters

On mean shift-based clustering for circular data

123

cluster number is 2.From part (b) of Figs.5,6 and 7,the

same clustering results are obtained using the blurring and

general mean shift procedures.

In order to calculate the MSE for the mean direction

m = (m

1

,m

2

).we generate 100 data sets from 0.3VM(0.5p,

10.3)?0.7VM(p,12.8).The MSE is deﬁned as the

average of squared errors (SE) between the true and esti-

mated values for these generated data sets,i.e.,

MSE =

sum of SE of each data set for m

100

:These MSEs of three dif-

ferent procedures are shown in Table 1.We also consider

the mean values and standard deviations for m under these

100 estimates as shown in Table 1.From Table 1,the

blurring mean shift has the smallest MSE among three.

Thus,the blurring mean shift has the best accuracy when a

data set comes from a mixture of von Mises distributions.

Moreover,the means for m under these 100 estimates from

three kinds of implementation procedures are also closed to

the true values of m.However,the standard deviations for m

under these 100 estimates from the blurring mean shift has

the smallest value among them.It means that the blurring

mean shift is more stable than the other two.Overall,the

MSE and standard deviation for m do emphasize that the

blurring mean shift should be better than the nonblurring

and general mean shift procedures.

Similar to the previous comparisons,we now replace the

mixture of von Mises distributions 0.3VM(0.5p,

10.3)?0.7VM(p,12.8) with a mixture of wrapped normal

distributions 0.3WN(0.5p,0.95)?0.7WN(p,0.96) and

also a mixture of wrapped Cauchy distributions

0.3WC(0.5p,0.91)?0.7WC(p,0.925).To generate data

from a mixture of wrapped normal distributions and a

mixture of wrapped Cauchy distributions can refer the

book of Fisher (1993).The MSEs for three different pro-

cedures from the two distributions are shown in Table 2a

and b,respectively.From Table 2a,b,we ﬁnd that the

blurring mean shift has the smallest MSEs and standard

deviations among three.In general,von Mises distribution,

wrapped normal distribution and wrapped Cauchy distri-

bution are the most-used probability models for circular

data.In our simulation,we ﬁnd that the blurring mean shift

has the smallest MSEs among the three procedures for data

sets from a mixture of von Mises distribution,a mixture of

wrapped normal distribution and a mixture of wrapped

Cauchy distribution.

As a whole,we could conclude that the blurring mean

shift is recommended as an implementation procedure for

the proposed MSBC algorithm.Thus,we shall choose the

blurring mean shift procedure for analyzing the grouped

circular data.We mention that the graphical method of

correlation comparisons is used for estimating p.For

example,we consider p = 9 as the estimate of the stabil-

ization parameter p using the graphical method of corre-

lation comparisons to the data set as shown in Fig.1.

However,we may be interested to see the impact on

Table 2 MSE,mean and standard deviation for m when the model is

(a) 0.3WN(0.5p,0.95)?0.7WN(p,0.96) and (b) 0.3WC(0.5p,

0.91?0.7WC(p,0.925)

Implementation

procedure

Blurring Nonblurring General

(a)

MSE for m 2.12 2.21 4.50

Mean for m (89.67,180.25) (90.01,180.08) (88.86,180.11)

Standard

deviation

for m

(1.60,1.24) (1.80,1.10) (2.41,1.40)

(b)

MSE for m 2.88 4.48 4.50

Mean for m (88.89,180.17) (88.88,180.10) (88.86,180.11)

Standard

deviation

for m

(1.84,1.08) (2.40,1.41) (2.41,1.40)

Table 1 MSE,mean and standard deviation for m when the model is

0.3VM(0.5p,10.3)?0.7VM(p,12.8)

Implementation

procedure

Blurring Nonblurring General

MSE for m 3.11 3.84 4.14

Mean for m (89.70,179.71) (89.87,179.77) (89.93,179.77)

Standard

deviation

for m

(1.94,1.53) (2.09,1.82) (2.17,1.90)

3

4

6

7

5

8

9

10

2

1

11

0

0.2

0.4

0.6

0.8

1

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

1

1

1111

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

(b)

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

Fig.7 a Hierarchical tree with general mean shift,b identiﬁed

clusters

9

10

1

5

27

3

16

6

23

4

8

20

33

11

19

14

22

21

7

15

25

31

2

32

13

17

12

28

26

18

29

30

24

34

71

89

45

40

53

68

104

88

119

95

81

96

36

84

63

118

72

74

114

54

110

112

103

37

82

43

55

65

77

44

90

105

41

93

79

47

91

85

86

99

115

61

52

60

51

94

98

78

92

102

106

70

108

113

67

73

75

57

62

80

59

42

46

50

64

76

83

38

101

39

58

97

120

49

100

66

56

117

116

35

107

109

111

48

69

87

0

0.2

0.4

0.6

0.8

1

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

11 1111

1

11

11

1

1

1

1

1

1

1

11

11

1

11

1

1

1

1

1

1

1

(b)

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

Fig.6 a Hierarchical tree with nonblurring mean shift,b identiﬁed

clusters

S.-J.Chang-Chien et al.

123

clustering results when different values of p are selected.

We next continue Example 1 to have more demonstrations

about impacts of different values of the stabilization

parameter p.

Example 2 In order to show the impact on clustering

results using the graphical method of correlation compar-

isons,we consider different values of the stabilization

parameter p to the generating 120 data points from the

mixture of von Mises distributions 0.3VM(0.5p,

10.3)?0.7VM(p,12.8) used in Example 1.We then

analyze their clustering results by implementing the MSBC

algorithm (only blurring).The clustering results of MSBC

with p = 1 are shown in Fig.8.Figure 8a presents that

cluster centers are agglomerated to one point.Thus,MSBC

with p = 1 cannot cluster the data set into two clusters,but

only one cluster as shown in Fig.8b.The clustering results

of MSBC with p = 3 are shown in Fig.9.Figure 9a pre-

sents that cluster centers are agglomerated to two points.

Thus,MSBC with p = 3 actually cluster the data set into

two clusters,but these clustering results are obviously not

good as shown in Fig.9b.The clustering results of MSBC

with p = 5 as shown in Fig.10 are better than those with

p = 3,but these clustering results are still not good enough

as shown in Fig.10b.We ﬁnd that the clustering results of

MSBC with p = 7 as shown in Fig.11 have exactly the

same as those with p = 9 as shown in Fig.5b even though

for p = 11 and p = 13 as shown in Figs.12 and 13,

respectively.That signiﬁes,most values of p around p = 9

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

1

1

111

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

1

(b)

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Fig.12 a Final agglomerated condition of cluster centers with

p = 11,b identiﬁed clusters with p = 11

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

11 1111

1

11

11

1

1

1

1

1

1

1

11

11

1

11

1

1

1

1

1

1

1

(b)

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Fig.10 a Final agglomerated condition of cluster centers with p = 5,

b identiﬁed clusters with p = 5

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

1

(b)

22 2222

2

22

22

2

2

2

2

2

2

2

22

22

2

2

22

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Fig.8 a Final agglomerated condition of cluster centers with p = 1,

b identiﬁed clusters with p = 1

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

1

1 1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

1

(b)

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Fig.13 a Final agglomerated condition of cluster centers with

p = 13,b identiﬁed clusters with p = 13

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

11 1111

1

11

11

1

1

1

1

1

1

11

11

1

11

1

1

1

1

1

(b)

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Fig.9 a Final agglomerated condition of cluster centers with p = 3,

b identiﬁed clusters with p = 3

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

11 1111

1

11

11

1

1

1

1

1

1

1

11

11

1

1

11

1

1

1

1

1

1

1

(b)

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Fig.11 a Final agglomerated condition of cluster centers with p = 7,

b identiﬁed clusters with p = 7

On mean shift-based clustering for circular data

123

could get good clustering results.However,if the value of

p becomes too large,the clustering results will become

worse.For example,when p = 41,the ﬁnal agglomerated

cluster centers become four so that the clustering results

become very worse as shown in Fig.14.

We had mentioned that,in this paper,we extend Chang-

Chien et al.(2010),in which they considered the kernel

function K

p

ðhÞ ¼ ð1

1cosðhh

j

Þ

2

Þ

p

¼ ð

1þcosðhh

j

Þ

2

Þ

p

by

dividing 2,to a more general way by choosing b as a

normalized parameter with b ¼ ð2ln

RÞ

1=2

and

R ¼

jj

P

n

j¼1

ðcos h

j

;sin h

j

Þ=njj:We next give an example to

show that using b can get better accuracy than using 2.

Example 3 We generate a data set of the size 150 from a

mixture of two von Mises distributions 0.4VM(p/3,

12.8)?0.6VM(5p/6,10.3).We perform the two versions

of MSBC with K

p

ðhÞ ¼ ð1

1cosðhh

j

Þ

2

Þ

p

and

for the data set.The estimating results for the stabilization

parameter p are shown in Fig.15 using the graphical

method of correlation comparisons to the data set with the

normalized parameter b and 2,respectively.We ﬁnd that

they obtain different estimated values of p.One is p = 8

and another one is p = 17.For comparing the squared error

(SE) and MSE for the parameter m = (m

1

,m

2

),we generate

100 sample sets of the size 150 from the mixture of two

von Mises distributions 0.4VM(p/3,12.8)?0.6VM(5p/6,

10.3).We implement the MSBC with the normalized

parameters b and 2 for these 20 sample sets.We consider

the SE for m with each sample set,deﬁned as

SE =

ðm

1

p=3Þ

2

þðm

2

5p=6Þ

2

2

:We also consider MSE for m with

the generated 100 sample sets,deﬁned as

0

1

2

3

4

5

6

0

2

4

6

8

10

12

14

16

SE

frequency

β

from data

β

= 2

Fig.16 Histogramof SEs for different normalized parameters b and 2

0

20

40

60

0.93

0.94

0.95

0.96

0.97

0.98

0.99

1

correlation

normalized parameter =

β

0

20

40

60

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

1

correlation

normalized parameter = 2

P = 8

P = 17

Fig.15 Graph of correlation comparisons with different normalized

parameter values

Table 3 MSE,mean,and standard deviation for different normalized

parameters

Normalized parameter b 2

MSE 2.32 2.45

Mean of 100 estimates for m (60.38,150.07) (60.56,149.97)

Standard deviation (1.51,1.50) (1.56,1.52)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

11 1111

1

11

11

1

1

111

11

1

11

1

1

1

1

(b)

2

2

2

2

2

2

3

3

4

4

4

4

44

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

4

44

4

4

4

Fig.14 a Final agglomerated condition of cluster centers with

p = 41,b identiﬁed clusters with p = 41

K

p

ðhÞ ¼

ð1

1cosðhh

j

Þ

b

Þ

p

¼ ð

bð1cosðhh

j

ÞÞ

b

Þ

p

;if 1 cosðh h

j

Þ b;

0;if 1 cosðh h

j

Þ [b

(

S.-J.Chang-Chien et al.

123

MSE =

sum of SE of each data set for m

100

:Figure 16 shows the

histogram of 100 SEs with different normalized parameter

b and 2.The blue (or left) bars represent SEs for different

normalized parameters b and the red (or right) bars rep-

resent SEs for the normalized parameter b = 2.From

Fig.16,we can see that the distributions of blue and red are

similar but red has more frequencies on larger SE values.It

means that the normalized parameter with b is better than

the normalized parameter with 2.Table 3 shows the MSEs,

and mean values and standard deviations for different

normalized parameters b and 2.From Table 3,we see that

the MSBC with the normalized parameters b actually has

lower MSE and standard deviation than that with the nor-

malized parameters 2.

In next section,we are going to compare the MSBC

(blurring) algorithm to the EM and FCD clustering algo-

rithms for circular data.

4 Examples and comparisons

We make comparisons of the proposed MSBC (blurring)

algorithm with EM (Spurr and Koutbeiy 1991) and FCD

(Yang and Pan 1997) clustering algorithms for circular

data.We ﬁrst use the data sets from the mixture of von

Mises distributions and then a real circular data set of 76

turtles after laying eggs given by Stephens (1969).We need

to mention that the EM and FCD for circular data are

necessary to assign a cluster number,but our proposed

MSBC algorithm is not necessary to give a priori cluster

number.

Example 4 Consider the simulation data from the mixture

of von Mises distributions 0.4VM(2p/3,8.6)?0.6VM(4p/3,

10.3).We generate a data set of 200 data points from this

mixture distribution.The graphical plot of the correlation

comparisons is shown in Fig.17.From Fig.17,p = 8 is

a good estimate for the stabilization parameter p.We

are interested in the parameter m = (m

1

,m

2

).We con-

sider the squared error (SE) for m.It is deﬁned as

SE =

ðm

1

2p=3Þ

2

þðm

2

4p=3Þ

2

2

:We apply the three algorithms of

MSBC,EMand FCD to the data set and then calculate the

SE for m.The results are shown in Table 4.We ﬁnd that

the SE for m from the MSBC has the smallest values in the

three algorithms,i.e.the MSBC has better accuracy than

the other two.Figure 18 shows the agglomerated cluster

centers of the MSBC.In Fig.18a,the initial values of the

cluster centers are all data points.In Fig.18b–d,we can see

Table 4 Estimates of parameters from algorithms and SE for m

Algorithms MSBC EM FCD

m

1

117.86 118.28 118.25

m

2

241.05 242.09 242.11

j

1

8.61 8.61 8.61

j

2

10.27 10.27 10.27

a

1

0.39 0.39 0.39

a

2

0.61 0.61 0.61

SE for m 2.83 3.66 3.77

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

iteration = 0

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(b)

iteration = 1

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(c)

iteration = 2

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(d)

iteration = 3

ν

Fig.18 Agglomerated cluster centers for iteration = 0,1,2,3

0

5

10

15

20

25

30

35

40

45

50

0.955

0.96

0.965

0.97

0.975

0.98

0.985

0.99

0.995

1

suitable estimate

Fig.17 Graph of correlation comparisons with the estimate p = 8

199

200

198

197

196

195

194

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

1

3

11

12

19

23

2

4

5

7

13

8

9

15

16

17

20

21

14

25

26

27

28

30

31

32

6

29

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

33

64

65

66

10

18

68

70

71

72

73

74

75

76

77

78

69

24

22

63

67

0

0.5

1

1.5

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

(b)

2

2

2

2

2

22

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

22

22

2

2

2

2

2

2

2

222

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2 2

22

2

2

2

2

2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

22

2

2

Fig.19 a Hierarchical tree for MSBC data set,b identiﬁed clusters

On mean shift-based clustering for circular data

123

that the cluster centers are agglomerated to two points

when the iterations are increasing.The data set is then well

divided into two clusters.It is the advantage of the MSBC

to identify cluster number with c = 2 without giving

a priori cluster number,but it is necessary to give a priori

c = 2 for the EM and FCD.We use the AHC to ﬁnd the

appropriate clusters with identiﬁed cluster points.The

results are shown in Fig.19 where the X-coordinate rep-

resents individual cluster centers and the Y-coordinate

represents distances between clusters.According to the

hierarchical tree,we can actually identify those two

clusters as shown in Fig.19.To evaluate the above

three algorithms according to the criterion of mean

square error (MSE) for m,we generate 100 data sets from

the mixture of von Mises distribution 0.4VM(2p/3,8.6)?

0.6VM(4p/3,10.3).The MSE is deﬁned as MSE =

sum of SE of each data set for m

100

:Table 5 shows the MSEs,and

mean values and standard deviations for the three algo-

rithms.From Table 5,we ﬁnd that the MSE values for the

EM and the FCD are almost the same and the MSE values

from the MSBC are the smallest among them.Similar

situation also occurs in standard deviation,i.e.the MSBC

has better accuracy than the EM and the FCD.

Example 5 In this example,we consider the inﬂuence of

outliers on the algorithms MSBC,EM and FCD.A better

algorithm should have smaller inﬂuence by outliers.The

data set of Example 4,called the original data set,is used in

this example.We now add the outlier with 0 to the ori-

ginal data set and call it as the outlier data set.The

graphical plot of the correlation comparisons for the outlier

data set is shown in Fig.20.To compare Fig.20 with

Fig.17,the shape of curve is similar.But we ﬁnd that

p = 9 becomes to be a good estimate for the stabilization

parameter p in Fig.20.The agglomerated cluster centers

from the MSBC are shown in Fig.21 where the cluster

centers are agglomerated to three points.The clustering

results from the EM and FCD are shown in Fig.22a and

the clustering results from the MSBC are shown in

Fig.22b.We ﬁnd that the EM and FCD group the outlier

-1

0

1

-1

0

1

(a)

iteration = 0

ν

-1

0

1

-1

0

1

(b)

iteration = 1

ν

-1

0

1

-1

0

1

(c)

iteration = 2

ν

-1

0

1

-1

0

1

(d)

iteration = 3

ν

Fig.21 Agglomerated cluster centers for iteration = 0,1,2,3

-1

0

1

-1

0

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

(a)

2

2

2

2

2

22

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

22222

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2 2

22

2

2

2

2

2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

22

2

2

199

200

198

197

79

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

80

201

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

0

0.5

1

1.5

(c)

-1

0

1

-1

0

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

(b)

2

2

2

2

2

22

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2222 2

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2 2

22

2

2

2

2

2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

22

2

2

3

Fig.22 a Classiﬁed results of EM and FCD for the outlier data set,

b classiﬁed results of MSBC for the outlier data set,c hierarchical

cluster tree for MSBC data set

0

5

10

15

20

25

30

35

40

45

50

0.96

0.965

0.97

0.975

0.98

0.985

0.99

0.995

1

1.005

suitable estimate

Fig.20 Graph of correlation comparisons with the estimated p = 9

Table 5 MSE,mean,and

standard deviation of 100 data

sets for m

Algorithms MSBC EM FCD

MSE 1.845 2.239 2.219

Mean for m (119.882,240.262) (119.797,240.360) (119.773,240.370)

Standard deviation for m (1.406,1.291) (1.546,1.401) (1.535,1.392)

S.-J.Chang-Chien et al.

123

data set into two clusters.But,the MSBC groups the outlier

data set into three clusters with the outlying point simply in

one cluster and the other two clusters still keep the same as

shown in Fig.19b in Example 4.That is,the clustering

results from the EM and FCD are inﬂuenced by the out-

lying point,but the MSBC is not.

We also consider comparing the MSEs of m and mean

values and standard deviations for the outlier data set.The

results are shown in Table 6.We ﬁnd that the MSBC has

the smallest MSE and standard deviation among the three

algorithms.Furthermore,we consider the increasing rate of

MSEs and the removed degrees of mean values and also

the increasing rate of standard deviations due to this out-

lier.We add the outlier of 0 to each generated data set of

Example 4 and compute MSEs of 100 data sets (each has

201 data points) for m.The increasing rate of MSEs,the

removed degrees of mean values and the increasing rate of

standard deviations due to this outlier are deﬁned as

follows:

Increasing rate of MSE for m

¼

MSE

ðthe outlier data setÞ

MSE

ðthe original data setÞ

MSE

ðthe original data setÞ

100%:

Removed degrees of mean for m = jmean

ðthe outlier data setÞ

mean

ðthe original data setÞ

j

Increasing rate of standard deviation

¼

standard deviation

ðthe outlier data setÞ

standard deviation

ðthe original data setÞ

standard deviation

ðthe original data setÞ

100%:

These results are shown in Table 7.From Table 7,we

ﬁnd that the increasing rates and removed degrees of the

MSBC are the smallest among the three algorithms.It

means the MSBC algorithm is robust to the outlying point.

But the EM and FCD are affected by outlier.

Example 6 In this example,we simulate 300 data points

from a mixture of three von Mises distribution 0.25VM

(p/3,10.3)?0.35VM(p,8.6)?0.4VM(5p/3,12.8).We

analyze this data set by the MSBC.According to the

graphical method of correlation comparisons as shown in

Fig.23,p = 15 is a suitable estimate.The agglomerating

state of cluster centers fromthe MSBC is shown in Fig.24.

The initial cluster centers are all data points as shown in

Fig.24b.From Fig.24a–e,the cluster centers are

agglomerated to the three points when the iteration is

increasing.The hierarchical tree and identiﬁed clusters are

shown in Fig.25.It shows that the optimal cluster number

is 3.We also compare the MSEs for the MSBC,EM and

FCD.We generate 100 data sets from the mixture distri-

bution 0.25VM(p/3,10.3)?0.35VM(p,8.6)?0.4VM

(5p/3,12.8) and then compute the MSEs for m,and mean

values and standard deviations of 100 estimates for m.The

results are shown in Table 8.FromTable 8,we see that the

MSBC performs the best.

Example 7 In this example,we consider a real data set.

Table 9 shows the directions of 76 turtles after laying eggs

given by Stephens (1969).In Fig.26,circular histogram

shows two modes for the turtle data (one is about 60,

another is about 240).Spurr and Koutbeiy (1991) con-

sidered this data set generated from a mixture of two von

Table 7 Increasing rate of MSE,removed degree of mean and

increasing rate of standard deviation

Algorithms MSBC EM FCD

Increasing rate of MSE 0.054% 10.005% 12.032%

Removed degree of mean for m

1

0.003 0.358 0.26

Removed degree of mean for m

2

0.002 0.191 0.24

Increasing rate of standard deviation

for m

1

0% 2.264% 2.215%

Increasing rate of standard deviation

for m

2

0.078% 1.784% 2.216%

0

5

10

15

20

25

30

35

40

45

50

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

1

suitable estimate

Fig.23 Graph of correlation comparisons with the estimated p = 9

Table 6 MSE,mean,and

standard deviation of 100 data

sets for m

Algorithms MSBC EM FCD

MSE 1.846 2.553 2.486

Mean for m (119.885,240.260) (119.439,240.551) (119.513,240.610)

Standard deviation for m (1.406,1.292) (1.581,1.426) (1.569,1.395)

On mean shift-based clustering for circular data

123

Mises distributions.They used six moment equations to

solve the estimate of ﬁve parameters m

1

,m

2

,j

1

,j

2

and a

1

.

We use the MSBC to the turtle data without giving a priori

cluster number and then compare it with the EM and FCD

with giving the cluster number c = 2 and also the method

proposed by Spurr and Koutbeiy (1991).The graphical plot

of the correlation comparisons is shown in Fig.27.We ﬁnd

p = 9 is a suitable estimate.Figure 28 shows the MSBC

procedure for the turtle data.We can observe all cluster

centers ﬁnally agglomerated to two points after the MSBC

procedure.Figure 29a shows the hierarchical tree of the

turtle data.Clearly,the number of clusters for the turtle

data should be 2 by the MSBC without a priori cluster

number.We ﬁnd that the clustering results from MSBC,

EM,FCD and Spurr and Koutbeiy’s methods are almost

identical as shown in Fig.29b where the large cluster

moves around 63 and the smaller cluster moves around

241.However,our MSBC is free to a cluster number,but

EM,FCD and Spurr and Koutbeiy’s methods are necessary

to give a priori cluster number c = 2.Furthermore,

Table 10 shows the estimate of all parameters for the

MSBC,EM and FCD algorithms.From Table 10,we ﬁnd

that the estimates of m and a from the three algorithms are

closed to the results analyzed by Spurr and Koutbeiy

(1991),but the estimates of concentration parameter j for

the three algorithms are different.

Example 8 In this example,we consider the SIDS data

given by Mooney et al.(2003).They collected the SIDS

cases in the UK by month of the death for the years

1983–1998.According to their analysis,the SIDS data for

the year 1998 could be ﬁtted by a mixture of two von

Misses distributions so that we may apply the MSBC to the

SIDS data for the year 1998.In Mooney et al.(2003),they

had discussed the property of the SIDS data and then

considered to modify all of 12 months to 31 days,i.e.

before models were ﬁtted,the SIDS data were month-

corrected to 31 days by multiplying February (i.e.,

28 days) by 1.097 and the 30-day months by 1.033.Thus,

the corrected numbers of SIDS cases for the year 1998 are

shown in Table 11 (also see Table 1 in Mooney et al.

2003).In general,the number of cases in a year can be

converted to angles for ﬁtting von Misses distribution (see

Mardia and Jupp 2000).Hence,we convert the corrected

numbers of SIDS cases for the year 1998 in Table 11 to

circular data by dividing 360 into 12,i.e.the 12 months of

1 year could divide the angular range of a circle (0,360)

into 12 intervals.January corresponds to the interval (0,

30),February corresponds to the interval (30,60),and

so on.According to Table 11,we generate 40 random

numbers from the uniform distribution on the interval (0,

30),31 random numbers from the uniform distribution on

the interval (30,60),and so on.Thus,we have a circular

data of 402 data points.We apply the MSBC to this data

set.The graphical plot of the correlation comparisons is

-1

0

1

-1

0

1

(a)

data set

θ

-1

0

1

-1

0

1

(b)

iteration = 0

ν

-1

0

1

-1

0

1

(c)

iteration = 1

ν

-1

0

1

-1

0

1

(d)

iteration = 2

ν

-1

0

1

-1

0

1

(e)

iteration = 3

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(f)

data set and cluster centers(final)

θ

ν

Fig.24 a Three cluster data set,b–e states of cluster centers for

iteration = 0,1,2,3,f data set and cluster centers after convergence

Table 8 MSE,mean,and

standard deviations of 100 data

sets for m

Algorithms MSBC EM FCD

MSE 1.79 1.96 1.97

Mean for m (59.72,180.09,300.05) (59.45,180.04,300.16) (59.42,180.06,300.17)

Standard deviation for m (1.53,1.31,1.12) (1.56,1.39,1.11) (1.57,1.39,1.10)

299

300

298

297

296

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

112

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

113

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

114

0

0.5

1

1.5

(a)

-1

0

1

-1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

(b)

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

33

3

3

3

3

3

3

33

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

33

3

33

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

Fig.25 a Hierarchical tree of data set,b identiﬁed clusters

S.-J.Chang-Chien et al.

123

shown in Fig.30,where p = 9 is a suitable estimate of

p.Figure 31 shows the MSBC procedure for the data set.

We can see that all cluster centers agglomerated to two

points.Figure 32 shows the hierarchical tree and identiﬁed

clusters.From Fig.32a,the optimal cluster number is 2.It

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

data set

θ

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(b)

iteration = 0

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(c)

iteration = 1

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(d)

iteration = 3

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(e)

iteration = 5

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(f)

data set and cluster centers (final)

θ

ν

Fig.28 a Data set for the turtle data,b–e states of cluster centers for

iteration = 0,1,3,5,f data set and cluster centers after convergence

Table 9 Directions of 76 turtles after laying eggs

8 9 13 13 14 18 22 27 30 34

38 38 40 44 45 47 48 48 48 48

50 53 56 57 58 58 61 63 64 64

64 65 65 68 70 73 78 78 78 83

83 88 88 88 90 92 92 93 95 96

98 100 103 106 113 118 138 153 153 155

204 215 223 226 237 238 243 244 250 251

257 268 285 319 343 350

5

10

15

30

210

60

240

90

270

120

300

150

330

180

0

Fig.26 Circular histogram for turtle data

0

5

10

15

20

25

30

35

40

45

50

0.98

0.985

0.99

0.995

1

1.005

suitable estimate

Fig.27 Graph of correlation comparisons with the estimated p = 9

75

76

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

74

61

73

62

64

65

66

67

68

69

70

71

72

63

0

0.5

1

1.5

2

(a)

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

1

1

11

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

11

1

1

1

11

11

1

1

1

1

11

11111

1

1

1

1

11

1

1

11

111

1111

1

1

1

(b)

2

2

2

2

22

2

2

22

2

2

2

Fig.29 a Hierarchical tree for the turtle data,b identiﬁed clusters for

the turtle data

On mean shift-based clustering for circular data

123

has the same indications to the analysis of Mooney et al.

(2003).According to the hierarchical tree,we obtain the

clustering results as shown in Fig.32b.The estimate of two

cluster centers m

1

and m

2

is obtained with 151.44 and

337.50.Hence,we have a conclusion that there are two

peaks of SIDS cases for the year 1998.One is a border

between May and June.Another is a border between

November and December.

Finally,we compare the MSBC with Kobayashi and

Otsu’s (2010) method.Kobayashi and Otsu (2010) con-

structed their von Mises-Fisher mean shift by generalizing

the von Mises-Fisher distribution to be F(x;m,j) =

C

F

(j)f(x

0

m;j),where f is a monotonically increasing con-

vex function and g is a derivative of f.However,in prac-

tical applications,we need to specify the functions f and g.

In their experiments,Kobayashi and Otsu (2010) used the

following kernel function f(x;j) with its derivative g:

f ðx;jÞ ¼

0 0x j

1

2

ðx jÞ jx 1

and

gðx;jÞ ¼

0 0x j

x j jx1

:

In the next example,we use these speciﬁed functions to

make the comparisons of the MSBC with Kobayashi and

Otsu’s (2010) method.

Example 9 For comparing the MSBC and Kobayashi and

Otsu’s (2010) method,we evaluate the MSEs to see the

performance for the two algorithms.Each MSE is evalu-

ated using 100 data sets.We consider the following mix-

ture distributions:

A:0:3WNð0:5p;0:95Þ þ0:7WNðp;0:96Þ in Example 1:

B:0:3WCð0:5p;0:91Þ þ0:7WCðp;0:925Þ in Example 1:

C:0:4VMð2p=3;8:6Þ þ0:6VMð4p=3;10:3Þ in Example 4:

D:0:4VMð2p=3;8:6Þ þ0:6VMð4p=3;10:3Þ by adding

an outlier with 0

in Example 5:

E:0:25VMðp=3;10:3Þ þ0:35VMðp;8:6Þ þ0:4VM

ð5p=3;12:8Þ in Example 6:

The MSEs for the two algorithms on these ﬁve various

mixture distributions are shown in Table 12.From

Table 12,we can see that the MSEs for the two

algorithms are quite closed for the mixtures of von Mises

distributions as shown for the mixture distributions C,D

and E.However,the MSBC algorithm has quite smaller

MSEs than Kobayashi and Otsu’s (2010) method on the

mixture of wrapped normal distributions and the mixture of

wrapped Cauchy distributions as shown for the mixture

distributions A and B.The chosen function g by Kobayashi

and Otsu (2010) in their experiments is actually

Table 10 Estimate of

parameters for turtle data

Algorithms MSBC EM FCD Spurr and Koutbeiy

m

1

63.15 63.47 63.50 63.2

m

2

241.13 241.20 241.25 240.2

j

1

2.75 2.65 2.65 2.91

j

2

7.43 8.61 7.43 4.81

a

1

0.83 0.84 0.84 0.82

a

2

0.17 0.16 0.16 0.18

0

5

10

15

20

25

30

35

40

45

50

0.98

0.985

0.99

0.995

1

1.005

suitable estimate

Fig.30 Graph of correlation comparisons with the estimated p = 9

Table 11 Corrected number of SIDS cases for the year 1998

Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Total

1998 40 31 25 26 29 33 25 20 27 40 43 63 402

S.-J.Chang-Chien et al.

123

appropriated for the data from the mixtures of von Mises-

Fisher distributions.In general,for a different mixture of

distributions,a different function g is necessary to be

chosen.However,it is difﬁcult to ﬁnd an appropriate g for

Kobayashi and Otsu’s (2010) method in real applications.

Since the MSBC is a nonparametric approach that does not

depend on mixture distributions,it should be more

applicable for real data sets.This becomes an advantage

for the MSBC.Furthermore,we can also evaluate the

increasing rate of MSEs to see the inﬂuence of outliers with

increasing rate ¼

MSEðDÞMSEðCÞ

MSEðCÞ

100%:We ﬁnd that the

increasing rate for the MSBC is 0.054%,but the increasing

rate for Kobayashi and Otsu is 0.64%.It is obvious that an

outlier has smaller inﬂuence on the MSBC than Kobayashi

and Otsu’s (2010) method.

5 Conclusions

In this paper,we proposed a MSBC algorithm for circular

data which is a cluster number free,i.e.the MSBC algo-

rithm can automatically ﬁnd a cluster number with good

cluster centers without giving a priori cluster number.

There are three implementation procedures for the MSBC

are compared.The MSBC with the blurring mean shift is

recommended.The proposed clustering algorithm gives us

a new way in the analysis of grouped circular data in which

it presents the robustness to outlying points.It can also give

us an optimal cluster number estimate.For comparisons,

we used the data sets from mixtures of von Mises distri-

butions,wrapped normal distributions and wrapped

Cauchy distributions,a real circular data set of 76 turtles

after laying eggs and also the SIDS data set to compare the

proposed method with two previous methods,EM and

FCD.The results show the accuracy and superiority of our

proposed algorithmso that it is recommended as a new tool

for the analysis of grouped circular data.We also compared

the proposed MSBC algorithm with Kobayashi and Otsu’s

(2010) method.In this paper,we had deﬁned a good dis-

tance measure between two circular data.However,to

deﬁne a good distance measure between directional data on

the hypersphere may become another challenge.To pro-

pose a nonparametric clustering approach for directional

data on the hypersphere will be our further research topic.

Acknowledgments The authors would like to thank the anonymous

referees for their helpful comments in improving the presentation of

this paper.This work was supported in part by the National Science

Council of Taiwan,under Grant NSC-99-2118-M-033-004-MY2.

References

Banerjee A,Dhillon IS,Ghosh J,Sra S (2005) Clustering on the unit

hypersphere using von Mises-Fisher distributions.J Mach Learn

Res 6:1345–1382

Bartels R (1984) Estimation in a bidirectional mixture of von Mises

distributions.Biometrics 40:777–784

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(a)

data set

θ

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(b)

iteration = 0

ν

-1

0

1

-1

0

1

(c)

iteration = 3

ν

-1

0

1

-1

0

1

(d)

iteration = 6

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(e)

iteration = 9

ν

-1

0

1

-1.5

-1

-0.5

0

0.5

1

1.5

(f)

data set and cluster centers (final)

θ

ν

Fig.31 a Data set in Table 12,b–e states of cluster centers for

iteration = 0,3,6,9,f data set and cluster centers after convergence

Table 12 MSEs for the MSBC and Kobayashi and Otsu’s method

Distribution MSBC Kobayashi and Otsu

A (Ex.1) 2.12 3.11

B (Ex.1) 2.88 4.46

C (Ex.4) 1.845 1.867

D (Ex.5) 1.846 1.879

E (Ex.6) 1.79 1.84

401

402

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

400

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

211

213

214

216

217

218

220

221

222

223

224

225

399

227

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

228

76

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

212

215

219

226

0

0.5

1

1.5

2

(a)

-1

0

1

-1

0

1

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

(b)

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2222

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

22

2

2

2

2

2

2

22

2

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

222

2

2

22

2

2

2

2

2

2

2

22

2

2

22

222

2

2

2

22

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

Fig.32 a Hierarchical tree of data set in Table 11,b identiﬁed

clusters

On mean shift-based clustering for circular data

123

Carta C,Bueno JA,Ramirez P (2008) Statistical modeling of

directional wind speeds using mixtures of von Mises distribu-

tions.Energy Conserv Manage 49:897–907

Chang-Chien SJ,Yang MS,Hung WL (2010) Mean shift-based

clustering for directional data.In:Proceedings of third interna-

tional workshop on advanced computational intelligence,

pp 367–372

Cheng Y (1995) Mean shift,mode seeking,and clustering.IEEE

Trans Pattern Anal Mach Intell 17:790–799

Comaniciu D,Meer P (2002) Mean shift:a robust approach toward

feature space analysis.IEEE Trans Pattern Anal Mach Intell 24:

603–619

Corcoran J,Chhetri P,Stimson R (2009) Using circular statistics to

explore the geography of the journey to work.Papers Reg Sci

88:119–132

Cristianini N,Shawe-Taylor J (2000) An introduction to support

vector machines.Cambridge University Press,Cambridge

Dempster AP,Laird NM,Rubin DB (1977) Maximum likelihood

from incomplete data via the EM algorithm (with discussion).

J Roy Stat Soc Ser B 39:1–38

Dortet-Bernadet JL,Wicker N (2008) Model-based clustering on the

unit sphere with an illustration using gene expression proﬁles.

Biostatistics 9(1):66–80

Fashing M,Tomasi C (2005) Mean shift is a bound optimization.

IEEE Trans Pattern Anal Mach Intell 27:471–474

Filippone M,Camastra F,Masulli F,Rovetta S (2008) A survey of

kernel and spectral methods for clustering.Pattern Recogn

41:176–190

Fisher NI (1993) Statistical analysis of circular data.Cambridge

University Press,Cambridge

Fukunaga K,Hostetler LD (1975) The estimation of the gradient of a

density function with applications in pattern recognition.IEEE

Trans Inf Theory 21:32–40

Gao KS,Chia F,Krantz I,Nordin P,Machin D (2006) On the

application of the von Mises distribution and angular regression

methods to investigate the seasonality of disease onset.Stat Med

25:1593–1618

Kobayashi T,Otsu N (2010) Von Mises-Fisher mean shift for

clustering on a hypersphere.In:Proceedings of the 20th

international conference on pattern recognition,ICPR 2010,

pp 2130–2133

Lee A (2010) Circular data,WIREs.Comput Stat 2:477–486

Mardia KV (1972) Statistics of directional data.Academic Press,

London

Mardia KV,Jupp PE (2000) Directional statistics.Wiley,New York

Mardia KV,Sutton TW(1975) On the modes of a mixture of two von

Mises distributions.Biometrika 62:699–701

McGraw T,Vemuri BC,Yezierski B,Mareci T (2006) Von Mises-

ﬁsher mixture model of the diffusion ODF.In:Proceedings of

the 3rd IEEE international symposium on biomedical imaging:

from Nano to Macro,pp 65–68

McLachlan GJ,Basford KE (1988) Mixture models:inference and

applications to clustering.Marcel Dekker,NY

Mooney JA,Helms PJ,Jolliffe IT (2003) Fitting mixtures of von

Mises distributions:a case study involving sudden infant death

syndrome.Comput Stat Data Anal 41:505–513

Scho

¨

lkopf B,Smola A,Mu

¨

ller K (1998) Nonlinear component

analysis as a kernel eigenvalue problem.Neural Comput 10:

1299–1319

Silverman BW (1998) Density estimation for statistics and data

analysis.Chapman and Hall,New York

Spurr BD,Koutbeiy MA (1991) A comparison of various methods for

estimating the parameters in mixtures of von Mises distributions.

Commun Stat Simul Comput 20:725–741

Stephens MA (1969) Techniques for directional data.Tech.Report

#150,Department of Statistics,Stanford University,Stanford

Thang ND,Chen L,Chan CK (2008) Feature reduction using mixture

model of directional distributions.In:Proceedings of the 10th

international conference on control,automation,robotics and

vision,ICARCV,pp 2208–2212

Vapnik VN (1998) Statistical learning theory.Wiley,New York

Von Mises R (1918) Uber die die ‘‘Ganzzahligkeit’’ der Atomgewicht

und verwandte Fragen.Physikal Z 19:490–500

Watson GS,Williams EJ (1956) On the construction of signiﬁcance

tests on the circle and the sphere.Biometrika 43:344–352

Wu KL,Yang MS (2007) Mean shift-based clustering.Pattern

Recogn 40:3035–3052

Yang MS,Pan JA (1997) On fuzzy clustering of directional data.

Fuzzy Sets Syst 91:319–326

S.-J.Chang-Chien et al.

123

## Comments 0

Log in to post a comment