ORIGINAL PAPER
On mean shift-based clustering for circular data
Shou-Jen Chang-Chien · Wen-Liang Hung · Miin-Shen Yang

© Springer-Verlag 2012
Abstract Cluster analysis is a useful tool for data analysis. Clustering methods are used to partition a data set into clusters such that the data points in the same cluster are the most similar to each other and the data points in different clusters are the most dissimilar. The mean shift was originally used as a kernel-type weighted mean procedure that had been proposed as a clustering algorithm. However, most mean shift-based clustering (MSBC) algorithms are used for numeric data. Circular data, i.e., directional data on the plane, are widely used in data analysis. In this paper, we propose an MSBC algorithm for circular data. Three types of mean shift implementation procedures, with nonblurring, blurring and general methods, are furthermore compared, in which the blurring mean shift procedure is the best and is recommended. The proposed MSBC for circular data does not need the number of clusters to be given: it can automatically find a final cluster number with good cluster centers. Several numerical examples and comparisons with some existing clustering methods are used to demonstrate the effectiveness and superiority of the proposed method.
Keywords Circular data · Clustering algorithms · Mean shift · Kernel functions
1 Introduction

Data measured in the form of angles or two-dimensional orientations are usually called circular data. For example, wind directions and the departure directions of migrating birds or animals are common circular data. However, some circular data may not be angles, like the frequency of an observed event in each month of a year, but they can be translated to the corresponding angles of points on the unit circle. Since von Mises (1918) introduced a distribution on circular data (i.e., directional data on the plane), and Watson and Williams (1956) first investigated inference problems for the von Mises distribution and gave the construction of statistical methods for circular data, research on circular data has attracted increasing attention. Mardia (1972), Fisher (1993) and Mardia and Jupp (2000) are good reference books for the analysis of circular data, with applications to biology, geology, medicine, meteorology, oceanography, etc. Furthermore, Mooney et al. (2003) considered circular data for a case study involving sudden infant death syndrome (SIDS). Gao et al. (2006) applied them to investigate the seasonality of disease onset. Carta et al. (2008) studied statistical modeling of directional wind speeds. Corcoran et al. (2009) used circular statistics to explore the geography of the journey to work. Recently, Lee (2010) gave a survey of the range of methods developed over the last 50 years to handle circular data.

Cluster analysis is a useful tool for data analysis. It is a method for finding clusters of a data set with the most similarity within the same cluster and the largest dissimilarity between different clusters. From the statistical point of view, clustering methods can be generally divided into two categories: one is the probability model-based approach, the other is the nonparametric approach. A probability model-based approach assumes that the data set follows a probability
S.-J. Chang-Chien · M.-S. Yang (corresponding author)
Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan
e-mail: msyang@math.cycu.edu.tw

W.-L. Hung
Department of Applied Mathematics, National Hsinchu University of Education, Hsin-Chu, Taiwan

Soft Computing, DOI 10.1007/s00500-012-0802-z
distribution, such as a mixture of densities, so that a mixture likelihood approach to clustering can be used (McLachlan and Basford 1988). For a mixture model, the expectation and maximization (EM) algorithm (Dempster et al. 1977) is the most-used estimation algorithm. For a nonparametric approach, clustering methods may be based on an objective function of similarity or dissimilarity measures, so that partitional clustering is best used. Most partitional methods suppose that the data set can be represented by finite cluster prototypes with their own objective functions. For a probability model-based approach to clustering circular data, the EM algorithm has been well used in the analysis of grouped circular data in the literature (Mardia and Sutton 1975; Bartels 1984; Spurr and Koutbeiy 1991). Since clustering for directional data is important, much research within the probability model-based approach has been considered, such as Mooney et al. (2003), Banerjee et al. (2005), McGraw et al. (2006), Dortet-Bernadet and Wicker (2008) and Thang et al. (2008). For a nonparametric approach, Yang and Pan (1997) proposed a fuzzy clustering algorithm for circular data. However, these clustering algorithms are sensitive to initializations and outliers.

Kernel-based methods are widely used in many applications (Vapnik 1998; Cristianini and Shawe-Taylor 2000; Filippone et al. 2008). There are two ways of implementing kernel-based methods, along with supervised and unsupervised learning. One is the support vector machine, which transforms the data space into a high-dimensional feature space with supervised learning (Schölkopf et al. 1998; Cristianini and Shawe-Taylor 2000). The other is to find a kernel density estimate on the data space and then search for the modes of the estimated density (Silverman 1998). The mean shift procedure, first proposed by Fukunaga and Hostetler (1975), was used as a nonparametric kernel density gradient estimate using a generalized kernel approach. Cheng (1995) clarified the relationship between mean shift and optimization by introducing the concept of shadows. He also showed some peculiar behaviors of mean shift in cluster analysis with the most-used Gaussian kernel. Comaniciu and Meer (2002) introduced the mean shift-based feature space analysis technique, with the quality of the outputs controlled only by the kernel bandwidth, i.e., the resolution of the analysis, so that the technique could be easily integrated into complex vision systems. Furthermore, Wu and Yang (2007) proposed a mean shift-based clustering (MSBC) method as a clustering tool for numeric data. Recently, Kobayashi and Otsu (2010) proposed the so-called von Mises-Fisher mean shift clustering for data on a unit hypersphere (i.e., directional data in s-dimensional Euclidean space), embedding the von Mises-Fisher distribution into the mean shift algorithm. According to the previously discussed categories of clustering, Kobayashi and Otsu's method belongs to the probability model-based approach. Chang-Chien et al. (2010) also considered Wu and Yang's (2007) (blurring) MSBC for circular data by defining a distance measure between two angles; in this case, Chang-Chien's method is a nonparametric approach. Although both Kobayashi and Otsu (2010) and Chang-Chien et al. (2010) are based on the mean shift algorithm, they adopt different approaches. In this paper, we extend Chang-Chien et al. (2010), first by using a normalization parameter β and then by considering three kinds of implementation procedures, i.e., nonblurring, blurring and general mean shifts, for circular data. More extensions and comparisons with numerical and real examples are also considered in this paper. The proposed clustering algorithm gives us a new way to analyze grouped directional data on the plane without giving a priori cluster number. The examples and comparisons actually demonstrate the effectiveness and superiority of our proposed method.

The remainder of this paper is organized as follows. In Sect. 2, we first consider circular data with their distributions and then review the EM (Bartels 1984; Spurr and Koutbeiy 1991) and fuzzy c-directions (FCD) (Yang and Pan 1997) algorithms for circular data. In Sect. 3, we propose the MSBC algorithm for circular data. The three kinds of implementation procedures, with nonblurring, blurring and general mean shifts, are also compared. In Sect. 4, some examples with numerical and real data sets and comparisons with the existing methods are made. Finally, conclusions are stated in Sect. 5.
2 EM and FCD algorithms for circular data

In this section, we first consider circular data with their distributions and then review the EM (Bartels 1984; Spurr and Koutbeiy 1991) and FCD (Yang and Pan 1997) algorithms for circular data. In a two-dimensional Cartesian coordinate system, the direction θ of a vector can be regarded as a unit vector, or a point (cos θ, sin θ) on the unit circle, with 0 ≤ θ ≤ 2π. Angles can also be measured in degrees. Statisticians have proposed various distributions for circular data. The von Mises distribution VM(ν, κ) is a well-known probability model in circular statistics, playing the role that the normal distribution plays in linear statistics. The two parameters ν and κ represent the mean direction and the concentration of the distribution, respectively: the larger κ is, the more the distribution concentrates around ν. The probability density function $f(\theta;\nu,\kappa)$ is of the form

$$f(\theta;\nu,\kappa) = (2\pi I_0(\kappa))^{-1}\exp(\kappa\cos(\theta-\nu)), \quad 0 \le \theta < 2\pi,\ 0 \le \kappa < \infty,$$

where $I_0(\kappa) = (2\pi)^{-1}\int_0^{2\pi}\exp(\kappa\cos(\theta-\nu))\,d\theta$ is the modified Bessel function of order
zero. To cluster grouped circular data, a mixture of von Mises distributions is generally used as a probability model.

The EM algorithm (Dempster et al. 1977; McLachlan and Basford 1988) is a useful tool for clustering grouped data with a mixture model, and it can also be applied to circular data. Suppose the data set $X = \{x_1, x_2, \ldots, x_n\}$ is a random sample from a population consisting of $c$ subpopulations. Let $x$ be an observation and let the mixture density be $f(x;\alpha,\phi) = \sum_{i=1}^{c}\alpha_i f_i(x;\phi_i)$, where $\alpha_i$ is the mixing proportion of subpopulation $i$ and $f_i(x;\phi_i)$ is the corresponding density, $i = 1, 2, \ldots, c$. Let us consider $z_1, z_2, \ldots, z_c$ as the indicator functions such that $z_{ij} = z_i(x_j) = 1$ if $x_j$ arises from subpopulation $i$, and $z_{ij} = 0$ if $x_j$ arises from another subpopulation, for $i = 1, 2, \ldots, c$ and $j = 1, 2, \ldots, n$. Then the log-likelihood function based on the data set $\{x_1, \ldots, x_n, z_1, \ldots, z_c\}$ is

$$L_{EM}(\alpha,\phi; x_1,\ldots,x_n,z_1,\ldots,z_c) = \sum_{i=1}^{c}\sum_{j=1}^{n} z_{ij}\,\ln(\alpha_i f_i(x_j;\phi_i)).$$

The EM algorithm consists of an "Expectation" step and a "Maximization" step. In the expectation step, the expectation $E(z_{ij}|x_j) = \alpha_i f_i(x_j|\phi_i)\big/\sum_{k=1}^{c}\alpha_k f_k(x_j|\phi_k)$ is used to substitute for the missing data $z_{ij}$. In the maximization step, we seek estimates of the parameters by maximizing $E(L_{EM}) = \sum_{i=1}^{c}\sum_{j=1}^{n} E(z_{ij}|x_j)\,\ln(\alpha_i f_i(x_j;\phi_i))$ subject to the constraint $\sum_{i=1}^{c}\alpha_i = 1$.
We next apply the above steps to circular data. Suppose that the data set $X = \{\theta_1, \theta_2, \ldots, \theta_n\}$ is a random sample from a mixture of von Mises distributions, where the probability density of an observation $\theta_j$ from subpopulation $i$ is $f_i(\theta_j;\nu_i,\kappa_i) = (2\pi I_0(\kappa_i))^{-1}\exp(\kappa_i\cos(\theta_j-\nu_i))$, $0 \le \theta_j < 2\pi$, $0 \le \kappa_i < \infty$. Therefore, the Lagrangian of $E(L_{EM})$ is

$$L(\alpha,\nu,\kappa,\lambda) = \sum_{i=1}^{c}\sum_{j=1}^{n} E(z_{ij}|\theta_j)\,\ln(\alpha_i f_i(\theta_j;\nu_i,\kappa_i)) - \lambda\left(\sum_{i=1}^{c}\alpha_i - 1\right).$$

We take the first derivatives of $L(\alpha,\nu,\kappa,\lambda)$ with respect to all parameters and set them equal to zero. Then the update equations for the parameter estimates are obtained as follows:

$$\alpha_i = \frac{\sum_{j=1}^{n} z_{ij}}{n}, \quad i = 1, 2, \ldots, c \tag{1}$$

$$\nu_i = \tan^{-1}\left(\frac{\sum_{j=1}^{n} z_{ij}\sin(\theta_j)}{\sum_{j=1}^{n} z_{ij}\cos(\theta_j)}\right), \quad i = 1, 2, \ldots, c \tag{2}$$

$$\kappa_i = A^{-1}\left(\frac{\sum_{j=1}^{n} z_{ij}\cos(\theta_j-\nu_i)}{\sum_{j=1}^{n} z_{ij}}\right), \quad i = 1, 2, \ldots, c, \tag{3}$$

where $A^{-1}(x)$ is a function that can be computed from Batschelet's table (see Fisher 1993). On the other hand, the missing data $z_{ij}$ are estimated with the posterior probabilities

$$z_{ij} = \frac{\alpha_i f_i(\theta_j;\nu_i,\kappa_i)}{\sum_{k=1}^{c}\alpha_k f_k(\theta_j;\nu_k,\kappa_k)}, \quad i = 1, 2, \ldots, c,\ j = 1, 2, \ldots, n. \tag{4}$$
Thus, the EM algorithm for directional data can be summarized as follows (see also Bartels 1984; Spurr and Koutbeiy 1991):

EM algorithm
Step 1. Fix $2 \le c \le n$ and fix $\varepsilon > 0$. Give initials $z^{(0)} = (z_1^{(0)}, \ldots, z_c^{(0)})$ and let $s = 1$.
Step 2. Compute $\alpha^{(s)}$ with $z^{(s-1)}$ using (1).
Step 3. Compute $\nu^{(s)}$ with $z^{(s-1)}$ using (2).
Step 4. Compute $\kappa^{(s)}$ with $z^{(s-1)}$, $\nu^{(s)}$, and $\{\theta_1, \theta_2, \ldots, \theta_n\}$ using (3).
Step 5. Update to $z^{(s)}$ with $\alpha^{(s)}$, $\nu^{(s)}$, $\kappa^{(s)}$, and $\{\theta_1, \theta_2, \ldots, \theta_n\}$ using (4).
Step 6. Compute $d(z^{(s)}, z^{(s-1)})$ in a convenient metric norm $d$.
IF $d(z^{(s)}, z^{(s-1)}) < \varepsilon$, STOP;
ELSE let $s = s + 1$ and return to Step 2.
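As an illustration, a minimal NumPy/SciPy sketch of this EM loop for a mixture of von Mises distributions is given below. It is not the authors' code: the function names are ours, and $A^{-1}$ is inverted numerically with a root finder instead of being read from Batschelet's table.

```python
# A minimal sketch of the EM algorithm above for a von Mises mixture.
import numpy as np
from scipy.special import i0, i1
from scipy.optimize import brentq

def A_inv(r):
    # Solve A(kappa) = I1(kappa)/I0(kappa) = r for kappa; a numerical
    # stand-in for Batschelet's table.
    r = min(max(r, 1e-6), 0.999)
    return brentq(lambda k: i1(k) / i0(k) - r, 1e-6, 1e3)

def vm_pdf(theta, nu, kappa):
    return np.exp(kappa * np.cos(theta - nu)) / (2 * np.pi * i0(kappa))

def em_von_mises(theta, c, eps=1e-6, max_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(theta)
    z = rng.dirichlet(np.ones(c), size=n)          # initial memberships z_ij
    for _ in range(max_iter):
        alpha = z.sum(axis=0) / n                  # update (1)
        S = z.T @ np.sin(theta); C = z.T @ np.cos(theta)
        nu = np.arctan2(S, C) % (2 * np.pi)        # update (2)
        R = (z * np.cos(theta[:, None] - nu)).sum(axis=0) / z.sum(axis=0)
        kappa = np.array([A_inv(r) for r in R])    # update (3)
        f = np.stack([alpha[i] * vm_pdf(theta, nu[i], kappa[i])
                      for i in range(c)], axis=1)
        z_new = f / f.sum(axis=1, keepdims=True)   # E-step, update (4)
        if np.abs(z_new - z).max() < eps:          # Step 6 stopping rule
            return alpha, nu, kappa, z_new
        z = z_new
    return alpha, nu, kappa, z
```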
Yang and Pan (1997) proposed another useful method, called the fuzzy c-directions (FCD) clustering algorithm. Suppose that the data set $X = \{\theta_1, \theta_2, \ldots, \theta_n\}$ is a random sample from a mixture of von Mises distributions. Yang and Pan (1997) proposed the following objective function:

$$B_{m,w}(\mu,\alpha,\nu,\kappa) = \sum_{j=1}^{n}\sum_{i=1}^{c}\mu_i^m(\theta_j)\left(-\ln 2\pi I_0(\kappa_i) + \kappa_i\cos(\theta_j-\nu_i)\right) + w\sum_{j=1}^{n}\sum_{i=1}^{c}\mu_i^m(\theta_j)\ln\alpha_i,$$

where the constraints are $\sum_{i=1}^{c}\alpha_i = 1$ and $\sum_{i=1}^{c}\mu_i(\theta_j) = 1$ for $j = 1, 2, \ldots, n$, in which $\mu = \{\mu_1, \ldots, \mu_c\}$ is a fuzzy c-partition: the values $\mu_{ij} = \mu_i(\theta_j)$ lie in the interval $[0, 1]$, with $\mu_i(\theta_j)$ the degree of membership of $\theta_j$ in subpopulation $i$. The Lagrangian $L(\mu,\alpha,\nu,\kappa,\lambda,\gamma_1)$ of $B_{m,w}(\mu,\alpha,\nu,\kappa)$ is

$$L(\mu,\alpha,\nu,\kappa,\lambda,\gamma_1) = B_{m,w}(\mu,\alpha,\nu,\kappa) - \sum_{j=1}^{n}\lambda_j\left(\sum_{i=1}^{c}\mu_i(\theta_j)-1\right) - \gamma_1\left(\sum_{i=1}^{c}\alpha_i-1\right).$$

We take the first derivatives of $L$ with respect to all parameters and set them equal to zero. The update equations for the parameter estimates can be obtained as follows:
$$\alpha_i = \frac{\sum_{j=1}^{n}\mu_{ij}^{m}}{\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}}, \quad i = 1, 2, \ldots, c \tag{5}$$

$$\nu_i = \tan^{-1}\left(\frac{\sum_{j=1}^{n}\mu_{ij}^{m}\sin(\theta_j)}{\sum_{j=1}^{n}\mu_{ij}^{m}\cos(\theta_j)}\right), \quad i = 1, 2, \ldots, c \tag{6}$$

$$\kappa_i = A^{-1}\left(\frac{\sum_{j=1}^{n}\mu_{ij}^{m}\cos(\theta_j-\nu_i)}{\sum_{j=1}^{n}\mu_{ij}^{m}}\right), \quad i = 1, 2, \ldots, c \tag{7}$$
$$\mu_{ij} = \left[\sum_{t=1}^{c}\frac{\left(\ln 2\pi I_0(\kappa_i) - \kappa_i\cos(\theta_j-\nu_i) - w\ln\alpha_i\right)^{1/(m-1)}}{\left(\ln 2\pi I_0(\kappa_t) - \kappa_t\cos(\theta_j-\nu_t) - w\ln\alpha_t\right)^{1/(m-1)}}\right]^{-1}, \quad i = 1, 2, \ldots, c,\ j = 1, 2, \ldots, n, \tag{8}$$

in which $A^{-1}$ is a function that can be computed from Batschelet's table. Thus, the FCD algorithm (Yang and Pan 1997) can be summarized as follows:
FCD algorithm
Step 1. Fix $m > 1$, $2 \le c \le n$ and $w > 0$. Fix any $\varepsilon > 0$ and give an initial $\mu^{(0)}$.
Step 2. Compute $\alpha^{(k)}$ with $\mu^{(k-1)}$ using (5).
Step 3. Compute $\nu^{(k)}$ with $\mu^{(k-1)}$ using (6).
Step 4. Compute $\kappa^{(k)}$ with $\mu^{(k-1)}$, $\nu^{(k)}$, and $\{\theta_1, \theta_2, \ldots, \theta_n\}$ using (7).
Step 5. Update to $\mu^{(k)}$ with $\alpha^{(k)}$, $\nu^{(k)}$, $\kappa^{(k)}$, and $\{\theta_1, \theta_2, \ldots, \theta_n\}$ using (8).
Step 6. Compute $d(\mu^{(k)}, \mu^{(k-1)})$ in a convenient metric norm $d$.
IF $d(\mu^{(k)}, \mu^{(k-1)}) < \varepsilon$, STOP;
ELSE let $k = k + 1$ and return to Step 2.
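Similarly, a minimal sketch of one FCD iteration implementing updates (5)-(8) is given below. The helpers i0 and A_inv are those of the EM sketch above, the names are ours, and the clipping of the cost is our guard, since (8) assumes positive costs.

```python
# A minimal sketch of one FCD iteration using updates (5)-(8).
import numpy as np
from scipy.special import i0

def fcd_step(theta, mu, m=2.0, w=1.0):
    um = mu ** m                                       # mu_ij^m, shape (n, c)
    alpha = um.sum(axis=0) / um.sum()                  # update (5)
    S = um.T @ np.sin(theta); C = um.T @ np.cos(theta)
    nu = np.arctan2(S, C) % (2 * np.pi)                # update (6)
    R = (um * np.cos(theta[:, None] - nu)).sum(axis=0) / um.sum(axis=0)
    kappa = np.array([A_inv(r) for r in R])            # update (7)
    # cost_ij = ln(2 pi I0(kappa_i)) - kappa_i cos(theta_j - nu_i) - w ln(alpha_i)
    cost = (np.log(2 * np.pi * i0(kappa))
            - kappa * np.cos(theta[:, None] - nu) - w * np.log(alpha))
    cost = np.clip(cost, 1e-12, None)                  # guard: (8) assumes cost > 0
    inv = cost ** (-1.0 / (m - 1))                     # update (8), then normalize
    mu_new = inv / inv.sum(axis=1, keepdims=True)
    return mu_new, alpha, nu, kappa
```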
3 The proposed clustering method for circular data

A statistical kernel-based approach to clustering is to find a kernel density estimate and then search for the modes of the estimated density. Mean shift procedures are methods for finding the modes of a kernel density estimate. Fukunaga and Hostetler (1975) first proposed the mean shift procedure based on a nonparametric density gradient estimate using a generalized kernel approach. Cheng (1995) showed the behaviors of mean shift in cluster analysis with the most-used Gaussian kernel. Fashing and Tomasi (2005) claimed that mean shift is a bound optimization and is equivalent to Newton's method in the case of piecewise constant kernels. Wu and Yang (2007) proposed an MSBC method in which the procedure consists of four steps: (1) select a kernel, (2) estimate the stabilization parameter, (3) use the mean shift procedures, and (4) identify the clusters. The MSBC procedure can be used with different kernels. In this paper, we modify the MSBC for circular data.
Let $X = \{x_1, x_2, \ldots, x_n\}$ be a data set in an $s$-dimensional Euclidean space $R^s$. The generalized Epanechnikov kernel $K_E^p : X \to R$ is defined as

$$K_E^p(x) = \begin{cases}\left(1 - \|x - x_j\|^2/\beta\right)^p & \text{if } \|x - x_j\|^2 \le \beta,\\ 0 & \text{if } \|x - x_j\|^2 > \beta.\end{cases}$$
The normalization parameter β is generally set to the sample variance. The parameter p is called the stabilization parameter, and the performance of the density estimate depends on the estimate of p. Bandwidth selection may affect the performance of the density estimate; Wu and Yang (2007) proposed a technique that assigns a fixed sample variance to the bandwidth, which differs from bandwidth selection. With reference to the generalized Epanechnikov kernel, we define a kernel and apply the MSBC procedure to circular data. Let $X = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be the circular data. The kernel function $K_p : X \to [0, 1]$ is defined as

$$K_p(\theta) = \begin{cases}\left(1-\dfrac{1-\cos(\theta-\theta_j)}{\beta}\right)^p = \left(\dfrac{\beta-(1-\cos(\theta-\theta_j))}{\beta}\right)^p, & \text{if } 1-\cos(\theta-\theta_j)\le\beta,\\[4pt] 0, & \text{if } 1-\cos(\theta-\theta_j)>\beta,\end{cases}$$

where p is the stabilization parameter and β is the sample circular standard deviation on X, defined by $\beta = (-2\ln\bar R)^{1/2}$, where $\bar R = \left\|\sum_{j=1}^{n}(\cos\theta_j,\sin\theta_j)/n\right\|$ is the mean resultant length. The distance measure between two angles θ and $\theta_j$ is $1-\cos(\theta-\theta_j)$, and we normalize this distance measure by dividing by the parameter β. We mention that, in the conference paper, Chang-Chien et al. (2010) considered the kernel function $K_p(\theta) = \left(1-\frac{1-\cos(\theta-\theta_j)}{2}\right)^p = \left(\frac{1+\cos(\theta-\theta_j)}{2}\right)^p$, dividing by 2; in this paper, we give a more general form by choosing β as the normalization parameter. Wu and Yang (2007) used the graphical method of correlation comparisons for estimating p, and we use the same method to find a suitable estimate of p. A suitable p value provides a good approximate density shape for the data set; in general, a good operating range for p falls between 1 and 50. The kernel density estimate for circular data is given by
$$\hat f_{K_p}(\theta) = \sum_{j=1}^{n} K_p(\theta)\,w(\theta_j),$$

where $K_p(\theta)$ is the kernel function described above and $w(\theta_j)$ is a weight function on the data set. Throughout the paper, we use the weight $w(\theta_j) = 1/n$, $j = 1, 2, \ldots, n$, i.e., each data point has equal weight. The values $\{\hat f_{K_p}(\theta_i),\ i = 1, 2, \ldots, n\}$ represent the estimated density shape at the data points. Compute the correlation values of
$\{\hat f_{K_p}(\theta_i),\ i = 1, 2, \ldots, n\}$ with the pairs $(p = 1, p = 2)$, $(p = 2, p = 3)$, …, $(p = 50, p = 51)$, etc. The increment of p depends on the data set; for circular data, 1 is an appropriate increment. The graph of $[j,$ the correlation value for $(p = j, p = j+1)]$, $j = 1, 2, \ldots, 50$, is plotted for finding a suitable estimate of p. The approximate density shape varies as p increases; when the correlation value is close to 1, the approximate density shape remains steady. If the correlation value at the ith point of the graph is close to 1, then the value i is a suitable estimate for p.
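To make the procedure concrete, a small sketch in NumPy is given below; the correlation threshold used to decide that the density shape "remains steady" is our own choice, not a value fixed by the paper.

```python
# A minimal sketch of the density shape and the correlation-comparison
# choice of the stabilization parameter p.
import numpy as np

def density_shape(theta, p):
    # hat f_{K_p}(theta_i) at every data point, with w(theta_j) = 1/n.
    R_bar = np.hypot(np.cos(theta).sum(), np.sin(theta).sum()) / len(theta)
    beta = np.sqrt(-2 * np.log(R_bar))                # circular std deviation
    d = 1 - np.cos(theta[:, None] - theta[None, :])   # distances 1 - cos(.)
    K = np.clip(1 - d / beta, 0, None) ** p           # truncated kernel values
    return K.mean(axis=1)

def estimate_p(theta, p_max=50, tol=0.999):
    # Pick the first p whose density shape correlates ~1 with that of p + 1.
    for p in range(1, p_max + 1):
        r = np.corrcoef(density_shape(theta, p), density_shape(theta, p + 1))[0, 1]
        if r > tol:
            return p
    return p_max
```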
The MSBC procedure can be used to find the modes of a kernel density estimate of circular data. We take the first derivative of the kernel density estimate $\hat f_{K_p}(\theta)$ and set it to zero. Then we obtain a mode (cluster center) estimate as follows:

$$\nu = \tan^{-1}\left(\frac{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu-\theta_j)),\,0)\right)^{p-1}\sin(\theta_j)\,w(\theta_j)}{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu-\theta_j)),\,0)\right)^{p-1}\cos(\theta_j)\,w(\theta_j)}\right).$$
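To see where this fixed-point form comes from, a short sketch of the differentiation (with the constant factors $p$ and $1/\beta^{p}$ dropped, since they cancel) is:

$$0 = \frac{d}{d\theta}\hat f_{K_p}(\theta) \;\propto\; \sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\theta-\theta_j)),\,0)\right)^{p-1}\sin(\theta-\theta_j)\,w(\theta_j).$$

Expanding $\sin(\theta-\theta_j) = \sin\theta\cos\theta_j - \cos\theta\sin\theta_j$ and solving for $\tan\theta$ gives

$$\tan\theta = \frac{\sum_{j}\left(\max(\beta-(1-\cos(\theta-\theta_j)),\,0)\right)^{p-1}\sin\theta_j\,w(\theta_j)}{\sum_{j}\left(\max(\beta-(1-\cos(\theta-\theta_j)),\,0)\right)^{p-1}\cos\theta_j\,w(\theta_j)},$$

which is exactly the mode estimate above, iterated as a fixed point.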
The MSBC has three kinds of implementation procedures: the nonblurring, blurring and general mean shifts. They are described as follows.

(A) The nonblurring mean shift uses all data points as the initial values of the cluster centers (i.e., $\nu_i^{(0)} = \theta_i$, $i = 1, 2, \ldots, n$), so that all cluster centers agglomerate toward optimal locations as the iterations increase. Its form for the mode estimates is as follows:

$$\nu_i^{(t+1)} = \tan^{-1}\left(\frac{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu_i^{(t)}-\theta_j)),\,0)\right)^{p-1}\sin(\theta_j)\,w(\theta_j)}{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu_i^{(t)}-\theta_j)),\,0)\right)^{p-1}\cos(\theta_j)\,w(\theta_j)}\right), \quad i = 1, 2, \ldots, n,$$

where $t$ is the iteration number.
(B) The blurring mean shift also uses all data points as the initial values of the cluster centers, but it updates each data point $\theta_j$ with $\nu_j$ (i.e., $\theta_j \leftarrow \nu_j$), $j = 1, 2, \ldots, n$, so that the density estimate $\hat f_{K_p}(\theta)$ changes at each iteration (a small code sketch of this update is given after (C) below). Its form for the mode estimates is as follows:

$$\nu_i^{(t+1)} = \tan^{-1}\left(\frac{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu_i^{(t)}-\nu_j^{(t)})),\,0)\right)^{p-1}\sin(\nu_j^{(t)})\,w(\nu_j^{(t)})}{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu_i^{(t)}-\nu_j^{(t)})),\,0)\right)^{p-1}\cos(\nu_j^{(t)})\,w(\nu_j^{(t)})}\right), \quad i = 1, 2, \ldots, n,$$

where $t$ is the iteration number.
(C) The general mean shift gives $c$ initial values randomly, where $c$ is smaller than $n$ and could be chosen as $\sqrt{n}$. In general, these $c$ initial values are randomly picked from the $n$ data points. Its form for the mode estimates is as follows:

$$\nu_i^{(t+1)} = \tan^{-1}\left(\frac{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu_i^{(t)}-\theta_j)),\,0)\right)^{p-1}\sin(\theta_j)\,w(\theta_j)}{\sum_{j=1}^{n}\left(\max(\beta-(1-\cos(\nu_i^{(t)}-\theta_j)),\,0)\right)^{p-1}\cos(\theta_j)\,w(\theta_j)}\right), \quad i = 1, 2, \ldots, c,$$

where $t$ is the iteration number.
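As an illustration, a minimal NumPy sketch of the blurring update (B) is given below. The names, the convergence tolerance and the iteration cap are our own choices, and the common weight $w = 1/n$ is omitted because it cancels in the ratio.

```python
# A minimal sketch of the blurring mean shift update (B).
import numpy as np

def blurring_mean_shift(theta, p, max_iter=100, tol=1e-8):
    R_bar = np.hypot(np.cos(theta).sum(), np.sin(theta).sum()) / len(theta)
    beta = np.sqrt(-2 * np.log(R_bar))              # circular std deviation
    v = theta.copy()                                # all points start as centers
    for _ in range(max_iter):
        d = 1 - np.cos(v[:, None] - v[None, :])     # distances between centers
        K = np.clip(beta - d, 0, None) ** (p - 1)   # (max(beta - d, 0))^{p-1}
        s = K @ np.sin(v)                           # weighted sine sums
        c = K @ np.cos(v)                           # weighted cosine sums
        v_new = np.arctan2(s, c) % (2 * np.pi)      # quadrant-aware tan^{-1}
        if np.max(1 - np.cos(v_new - v)) < tol:     # circular convergence check
            return v_new
        v = v_new                                   # blurring: data replaced by centers
    return v
```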
Furthermore, to complete the MSBC procedure, the agglomeration of the cluster centers gives us a way to determine the number of clusters with well-identified cluster centers, where the agglomerative hierarchical clustering (AHC) algorithm is used to identify the clusters. There are different linkages for the AHC algorithm; in this paper, we simply use single linkage. Thus, the MSBC algorithm for circular data can be summarized as follows:

MSBC algorithm for circular data
Step 1. Choose the proposed kernel $K_p$.
Step 2. Use the graphical method of correlation comparisons to estimate p.
Step 3. Choose one of the three procedures: the nonblurring, blurring or general mean shift.
Step 4. Use the AHC algorithm to identify the clusters.
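A minimal sketch of Step 4 with SciPy's single-linkage tools is shown below. The circular distance $1-\cos(\nu_i-\nu_j)$ between converged centers and the cut height are our assumptions; the paper reads the cluster number off the hierarchical tree rather than fixing a cut.

```python
# A sketch of Step 4: single-linkage AHC on the converged centers.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def identify_clusters(v, cut=0.01):
    D = 1 - np.cos(v[:, None] - v[None, :])          # pairwise circular distances
    Z = linkage(squareform(D, checks=False), method='single')
    return fcluster(Z, t=cut, criterion='distance')  # cluster labels per center
```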
We mention that the proposed MSBC algorithm for circular data is a nonparametric approach. Recently, Kobayashi and Otsu (2010) proposed the von Mises-Fisher mean shift for clustering on a unit hypersphere, which is a probability model-based approach. The von Mises-Fisher distribution has the probability density function $M(x;\nu,\kappa) = C_M(\kappa)\exp(\kappa x'\nu)$, where $x$ is an $s$-dimensional unit vector, $\nu$ is the mean direction, $\kappa$ is the concentration parameter, and $C_M(\kappa)$ is the normalization constant. Since the von Mises-Fisher distribution is based on a monotonically increasing convex function and an inner product $x'\nu$, Kobayashi and Otsu (2010) generalized the von Mises-Fisher distribution to $F(x;\nu,\kappa) = C_F(\kappa)f(x'\nu;\kappa)$, where $f$ is a monotonically increasing convex function with derivative $g$. Then the kernel density estimate $p(x)$ is constructed as $p(x) = \frac{C_F(\kappa)}{n}\sum_{j=1}^{n}f(x'x_j;\kappa)$. To maximize $p(x)$ subject to the constraint $\|x\| = 1$, Kobayashi and Otsu (2010) derived the mode estimate

$$x = \frac{\sum_{j=1}^{n} x_j\,g(x'x_j;\kappa)}{\left\|\sum_{j=1}^{n} x_j\,g(x'x_j;\kappa)\right\|},$$

which was used in their Algorithm 1. Suppose that $\{x_1, x_2, \ldots, x_n\}$ is an $s$-dimensional data set from the von Mises-Fisher distribution. In their experimental results, Kobayashi and Otsu (2010) used the kernel function $f(x;\kappa)$ and its derivative $g$ as follows:

$$f(x;\kappa) = \begin{cases}0 & 0 \le x \le \kappa\\ \tfrac{1}{2}(x-\kappa)^2 & \kappa \le x \le 1\end{cases} \qquad\text{and}\qquad g(x;\kappa) = \begin{cases}0 & 0 \le x \le \kappa\\ x-\kappa & \kappa \le x \le 1.\end{cases}$$
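For comparison, a rough sketch of Kobayashi and Otsu's mode update with this piecewise $g$ is given below. The naming is ours; the sketch assumes unit-norm rows and that every seed keeps at least one neighbor with $v'x_j > \kappa$, since otherwise the normalization would divide by zero.

```python
# A sketch of the von Mises-Fisher mean shift mode update.
import numpy as np

def vmf_mean_shift(x, kappa, n_iter=50):
    # x: array of shape (n, s) with unit-norm rows; kappa in [0, 1).
    v = x.copy()                                   # every point seeds a mode
    for _ in range(n_iter):
        g = np.clip(v @ x.T - kappa, 0.0, None)    # g(v'x_j; kappa) = max(v'x_j - kappa, 0)
        v = g @ x                                  # sum_j x_j g(v'x_j; kappa)
        v /= np.linalg.norm(v, axis=1, keepdims=True)  # project back onto the sphere
    return v
```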
We will use these specified functions in the comparisons between Kobayashi and Otsu (2010) and our MSBC algorithm in the next section. From the above statements, we can identify the differences between Kobayashi and Otsu (2010) and our method. The method proposed by Kobayashi and Otsu (2010) uses the von Mises-Fisher distribution; therefore, the kernel function $f(x;\kappa)$, the kernel density estimate $p(x)$ and the mode estimates in Kobayashi and Otsu (2010) all depend on the parameter $\kappa$. In our method, however, we define the kernel function $K_p(\theta)$ as

$$K_p(\theta) = \begin{cases}\left(1-\dfrac{1-\cos(\theta-\theta_j)}{\beta}\right)^p = \left(\dfrac{\beta-(1-\cos(\theta-\theta_j))}{\beta}\right)^p, & \text{if } 1-\cos(\theta-\theta_j)\le\beta,\\[4pt] 0, & \text{if } 1-\cos(\theta-\theta_j)>\beta,\end{cases}$$

where distance measures between two angles are used to define $K_p(\theta)$. Obviously, our proposed kernel function $K_p(\theta)$ for the MSBC algorithm is independent of the parameter $\kappa$. On the other hand, our method can be used for different distributions, such as a mixture of von Mises distributions, a mixture of wrapped normal distributions or a mixture of wrapped Cauchy distributions; we will demonstrate this in Example 1. Moreover, we also use the stabilization parameter p to control the number of peaks (modes) in the data set and then use a correlation comparison procedure to give a good estimate of p. The concept of the stabilization parameter p is not used in Kobayashi and Otsu's (2010) method.

Since the proposed MSBC algorithm is independent of the parameter $\kappa$, if we want to handle data from a mixture of von Mises distributions, we need to consider an estimate of the concentration parameter $\kappa$. The von Mises probability density function has the form

$$f(\theta;\nu,\kappa) = (2\pi I_0(\kappa))^{-1}\exp(\kappa\cos(\theta-\nu)), \quad 0\le\theta<2\pi,\ 0\le\kappa<\infty,$$

where $I_0(\kappa) = (2\pi)^{-1}\int_0^{2\pi}\exp(\kappa\cos(\theta-\nu))\,d\theta$ is the modified Bessel function of order zero. In this paper, we use the function $A^{-1}$ to estimate $\kappa$ from Batschelet's table (see Fisher 1993). Suppose $X_i$ is the ith cluster, $\theta_j$ is a data point in $X_i$, and $\nu_j$ is its corresponding cluster center. We set

$$x = \frac{\sum_{\theta_j\in X_i}\cos(\theta_j-\nu_j)}{\text{the number of data points in } X_i}.$$

Then $A^{-1}(x)$ is the estimate of the concentration of the ith cluster, in which $A(\kappa)$ is the first derivative of $\ln I_0(\kappa)$ with respect to $\kappa$ and can be computed from Batschelet's table.
We now compare the three kinds of implementation procedures using a mixture of von Mises distributions, a mixture of wrapped normal distributions and a mixture of wrapped Cauchy distributions, calculating the mean squared error (MSE) to compare the accuracy of the three kinds of implementation procedures. Recall that a wrapped distribution on the circle is derived from a distribution on the line. Suppose $X$ is a random variable on the line with probability density function (pdf) $g(x)$ and cumulative distribution function (cdf) $G(x)$. The random variable $\Theta$ of the wrapped distribution is obtained by the transformation $\Theta \equiv X\ (\mathrm{mod}\ 2\pi)$, and its corresponding cdf is $F(\theta) = \sum_{k=-\infty}^{\infty}\left[G(\theta+2\pi k)-G(2\pi k)\right]$; the pdf of $\Theta$ becomes $f(\theta) = \sum_{k=-\infty}^{\infty} g(\theta+2\pi k)$. Commonly used wrapped distributions on the circle include the wrapped normal distribution WN(ν, ρ) and the wrapped Cauchy distribution WC(ν, ρ), where the parameters ν and ρ are the mean direction and the mean resultant length, respectively. The distribution WN(ν, ρ) is a symmetric unimodal distribution obtained by wrapping the normal distribution on the line, with pdf

$$f(\theta) = \frac{1}{2\pi}\left(1 + 2\sum_{p=1}^{\infty}\rho^{p^2}\cos p(\theta-\nu)\right), \quad 0\le\theta<2\pi,\ 0\le\rho\le 1.$$

The distribution WC(ν, ρ) is a symmetric unimodal distribution obtained by wrapping the Cauchy distribution on the line, with pdf

$$f(\theta) = \frac{1-\rho^2}{2\pi(1+\rho^2-2\rho\cos(\theta-\nu))}, \quad 0\le\theta<2\pi,\ 0\le\rho\le 1.$$
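Before Example 1, a small sketch of how such simulation data could be generated in NumPy is given below. The conversions $\rho = e^{-\sigma^2/2}$ for the wrapped normal and $\rho = e^{-\gamma}$ for the wrapped Cauchy are our reading of Fisher (1993); the mixture at the end matches the model used in Example 1.

```python
# A sketch of samplers for the three circular models used in the examples.
import numpy as np

rng = np.random.default_rng(1)

def wrapped_normal(nu, rho, size):
    sigma = np.sqrt(-2 * np.log(rho))       # from rho = exp(-sigma^2 / 2)
    return rng.normal(nu, sigma, size) % (2 * np.pi)

def wrapped_cauchy(nu, rho, size):
    gamma = -np.log(rho)                    # from rho = exp(-gamma)
    return (nu + gamma * rng.standard_cauchy(size)) % (2 * np.pi)

def von_mises(nu, kappa, size):
    return rng.vonmises(nu, kappa, size) % (2 * np.pi)   # shift to [0, 2pi)

# e.g. 120 points from the mixture 0.3 VM(0.5 pi, 10.3) + 0.7 VM(pi, 12.8):
n1 = rng.binomial(120, 0.3)
theta = np.concatenate([von_mises(0.5 * np.pi, 10.3, n1),
                        von_mises(np.pi, 12.8, 120 - n1)])
```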
Example 1 To compare the three kinds of implementation procedures (the nonblurring, blurring and general mean shifts), we consider simulation data from a mixture of von Mises distributions, generating 120 data points from 0.3VM(0.5π, 10.3) + 0.7VM(π, 12.8). The result p = 9 for the stabilization parameter p, estimated using the graphical method of correlation comparisons on the data set, is shown in Fig. 1. We use the estimate p = 9 for all three implementation procedures. Figures 2, 3 and 4 show the agglomerated conditions of the cluster centers under the different implementation procedures. When we compare part (a) of Figs. 2, 3 and 4, we find the same initial cluster centers for the blurring and nonblurring mean shifts, but only 11 initial cluster centers for the general mean shift. Observing part (d) of Figs. 2, 3 and 4, the cluster centers have been agglomerated to two points for all three procedures. For the general mean shift, the initial cluster centers are randomly assigned from all data points, so it is possible that all initial cluster centers come from the same subpopulation; in this situation, the cluster centers may be agglomerated to one point, meaning that there is only one cluster in the data set. Therefore, we may obtain a bad clustering with the general mean shift. Furthermore, the hierarchical trees and identified clusters for the different mean shift procedures are shown in Figs. 5, 6 and 7. From part (a) of Figs. 5, 6 and 7, we find that the three hierarchical trees are different, but they all indicate that the cluster number is 2.
[Fig. 1 Graph of correlation comparisons with the estimate p = 9]
[Fig. 2 Agglomerated condition of cluster centers with the blurring mean shift (iterations 0, 2, 3, 4)]
[Fig. 3 Agglomerated condition of cluster centers with the nonblurring mean shift (iterations 0, 3, 6, 10)]
[Fig. 4 Agglomerated condition of cluster centers with the general mean shift (iterations 0, 2, 4, 6)]
[Fig. 5 (a) Hierarchical tree with the blurring mean shift, (b) identified clusters]
From part (b) of Figs. 5, 6 and 7, the same clustering results are obtained using the blurring and general mean shift procedures.

In order to calculate the MSE for the mean direction ν = (ν₁, ν₂), we generate 100 data sets from 0.3VM(0.5π, 10.3) + 0.7VM(π, 12.8). The MSE is defined as the average of the squared errors (SE) between the true and estimated values over these generated data sets, i.e.,

$$\mathrm{MSE} = \frac{\text{sum of the SE of each data set for } \nu}{100}.$$

The MSEs of the three different procedures are shown in Table 1, along with the mean values and standard deviations of the 100 estimates of ν. From Table 1, the blurring mean shift has the smallest MSE of the three; thus, the blurring mean shift has the best accuracy when a data set comes from a mixture of von Mises distributions. Moreover, the means of the 100 estimates of ν from all three kinds of implementation procedures are close to the true values of ν. However, the standard deviations of the 100 estimates from the blurring mean shift are the smallest among them, which means that the blurring mean shift is more stable than the other two. Overall, the MSE and standard deviation for ν both indicate that the blurring mean shift is better than the nonblurring and general mean shift procedures.
Similar to the previous comparisons, we now replace the mixture of von Mises distributions 0.3VM(0.5π, 10.3) + 0.7VM(π, 12.8) with a mixture of wrapped normal distributions 0.3WN(0.5π, 0.95) + 0.7WN(π, 0.96) and also with a mixture of wrapped Cauchy distributions 0.3WC(0.5π, 0.91) + 0.7WC(π, 0.925). Generating data from a mixture of wrapped normal distributions or a mixture of wrapped Cauchy distributions can follow the book of Fisher (1993). The MSEs of the three different procedures for the two distributions are shown in Table 2a and b, respectively. From Table 2a, b, we find that the blurring mean shift has the smallest MSEs and standard deviations of the three. In general, the von Mises distribution, the wrapped normal distribution and the wrapped Cauchy distribution are the most-used probability models for circular data, and in our simulations the blurring mean shift has the smallest MSEs among the three procedures for data sets from mixtures of all three models.

As a whole, we conclude that the blurring mean shift is recommended as the implementation procedure for the proposed MSBC algorithm; thus, we choose the blurring mean shift procedure for analyzing grouped circular data. We mention that the graphical method of correlation comparisons is used for estimating p; for example, we considered p = 9 as the estimate of the stabilization parameter p for the data set, as shown in Fig. 1.
Table 2 MSE, mean and standard deviation for ν when the model is (a) 0.3WN(0.5π, 0.95) + 0.7WN(π, 0.96) and (b) 0.3WC(0.5π, 0.91) + 0.7WC(π, 0.925)

  Implementation procedure   Blurring          Nonblurring       General
  (a)
  MSE for ν                  2.12              2.21              4.50
  Mean for ν                 (89.67, 180.25)   (90.01, 180.08)   (88.86, 180.11)
  Standard deviation for ν   (1.60, 1.24)      (1.80, 1.10)      (2.41, 1.40)
  (b)
  MSE for ν                  2.88              4.48              4.50
  Mean for ν                 (88.89, 180.17)   (88.88, 180.10)   (88.86, 180.11)
  Standard deviation for ν   (1.84, 1.08)      (2.40, 1.41)      (2.41, 1.40)
Table 1 MSE, mean and standard deviation for ν when the model is 0.3VM(0.5π, 10.3) + 0.7VM(π, 12.8)

  Implementation procedure   Blurring          Nonblurring       General
  MSE for ν                  3.11              3.84              4.14
  Mean for ν                 (89.70, 179.71)   (89.87, 179.77)   (89.93, 179.77)
  Standard deviation for ν   (1.94, 1.53)      (2.09, 1.82)      (2.17, 1.90)
[Fig. 6 (a) Hierarchical tree with the nonblurring mean shift, (b) identified clusters]
[Fig. 7 (a) Hierarchical tree with the general mean shift, (b) identified clusters]
However, we may be interested in the impact on the clustering results when different values of p are selected. We next continue Example 1 with more demonstrations of the impact of different values of the stabilization parameter p.

Example 2 In order to show the impact of the estimate from the graphical method of correlation comparisons on the clustering results, we consider different values of the stabilization parameter p for the 120 data points generated from the mixture of von Mises distributions 0.3VM(0.5π, 10.3) + 0.7VM(π, 12.8) used in Example 1, and we analyze the clustering results of the MSBC algorithm (blurring only). The clustering results of the MSBC with p = 1 are shown in Fig. 8: Fig. 8a shows that the cluster centers are agglomerated to one point, so the MSBC with p = 1 cannot cluster the data set into two clusters, but only into one cluster, as shown in Fig. 8b. The clustering results of the MSBC with p = 3 are shown in Fig. 9: Fig. 9a shows that the cluster centers are agglomerated to two points, so the MSBC with p = 3 actually clusters the data set into two clusters, but the clustering results are obviously not good, as shown in Fig. 9b. The clustering results of the MSBC with p = 5, shown in Fig. 10, are better than those with p = 3, but they are still not good enough, as shown in Fig. 10b. We find that the clustering results of the MSBC with p = 7, shown in Fig. 11, are exactly the same as those with p = 9 shown in Fig. 5b, and so are those for p = 11 and p = 13, shown in Figs. 12 and 13, respectively. This signifies that most values of p around p = 9 could get good clustering results.
[Fig. 8 (a) Final agglomerated condition of cluster centers with p = 1, (b) identified clusters with p = 1]
[Fig. 9 (a) Final agglomerated condition of cluster centers with p = 3, (b) identified clusters with p = 3]
[Fig. 10 (a) Final agglomerated condition of cluster centers with p = 5, (b) identified clusters with p = 5]
[Fig. 11 (a) Final agglomerated condition of cluster centers with p = 7, (b) identified clusters with p = 7]
[Fig. 12 (a) Final agglomerated condition of cluster centers with p = 11, (b) identified clusters with p = 11]
[Fig. 13 (a) Final agglomerated condition of cluster centers with p = 13, (b) identified clusters with p = 13]
However, if the value of p becomes too large, the clustering results become worse. For example, when p = 41 the final agglomerated cluster centers become four points, so that the clustering results become very poor, as shown in Fig. 14.

We mentioned that, in this paper, we extend Chang-Chien et al. (2010), who considered the kernel function $K_p(\theta) = \left(1-\frac{1-\cos(\theta-\theta_j)}{2}\right)^p = \left(\frac{1+\cos(\theta-\theta_j)}{2}\right)^p$, dividing by 2, to a more general form that chooses β as the normalization parameter, with $\beta = (-2\ln\bar R)^{1/2}$ and $\bar R = \left\|\sum_{j=1}^{n}(\cos\theta_j,\sin\theta_j)/n\right\|$. We next give an example showing that using β yields better accuracy than using 2.
Example 3 We generate a data set of size 150 from a mixture of two von Mises distributions 0.4VM(π/3, 12.8) + 0.6VM(5π/6, 10.3). We perform the two versions of the MSBC, one with $K_p(\theta) = \left(\frac{1+\cos(\theta-\theta_j)}{2}\right)^p$ and one with the β-normalized kernel $K_p(\theta)$ defined in Sect. 3, on the data set. The estimation results for the stabilization parameter p, using the graphical method of correlation comparisons with the normalization parameters β and 2, respectively, are shown in Fig. 15. We find that the two versions obtain different estimates of p: one is p = 8 and the other is p = 17. For comparing the squared error (SE) and MSE for the parameter ν = (ν₁, ν₂), we generate 100 sample sets of size 150 from the mixture of two von Mises distributions 0.4VM(π/3, 12.8) + 0.6VM(5π/6, 10.3), and we implement the MSBC with the normalization parameters β and 2 for these sample sets. The SE for ν on each sample set is defined as

$$\mathrm{SE} = \frac{(\nu_1 - \pi/3)^2 + (\nu_2 - 5\pi/6)^2}{2}.$$

We also consider the MSE for ν over the generated 100 sample sets, defined as $\mathrm{MSE} = \frac{\text{sum of the SE of each sample set for }\nu}{100}$.
[Fig. 15 Graphs of correlation comparisons for the two normalization parameter values: β (p = 8) and 2 (p = 17)]
[Fig. 16 Histogram of the SEs for the normalization parameters β and 2]
Table 3 MSE, mean and standard deviation for the different normalization parameters

  Normalization parameter        β                 2
  MSE                            2.32              2.45
  Mean of 100 estimates for ν    (60.38, 150.07)   (60.56, 149.97)
  Standard deviation             (1.51, 1.50)      (1.56, 1.52)
[Fig. 14 (a) Final agglomerated condition of cluster centers with p = 41, (b) identified clusters with p = 41]
Figure 16 shows the histogram of the 100 SEs for the two normalization parameters β and 2. The blue (left) bars represent the SEs for the normalization parameter β, and the red (right) bars represent the SEs for the normalization parameter 2. From Fig. 16, we can see that the blue and red distributions are similar, but red has higher frequencies at larger SE values. This means that normalizing with β is better than normalizing with 2. Table 3 shows the MSEs, mean values and standard deviations for the two normalization parameters β and 2. From Table 3, we see that the MSBC with the normalization parameter β indeed has a lower MSE and standard deviation than that with the normalization parameter 2.

In the next section, we compare the MSBC (blurring) algorithm with the EM and FCD clustering algorithms for circular data.
4 Examples and comparisons

We make comparisons of the proposed MSBC (blurring) algorithm with the EM (Spurr and Koutbeiy 1991) and FCD (Yang and Pan 1997) clustering algorithms for circular data. We first use data sets from mixtures of von Mises distributions and then a real circular data set of the directions of 76 turtles after laying eggs, given by Stephens (1969). We need to mention that the EM and FCD algorithms for circular data require the cluster number to be assigned, whereas our proposed MSBC algorithm does not need a priori cluster number.

Example 4 Consider simulation data from the mixture of von Mises distributions 0.4VM(2π/3, 8.6) + 0.6VM(4π/3, 10.3). We generate a data set of 200 points from this mixture distribution. The graphical plot of the correlation comparisons is shown in Fig. 17; from Fig. 17, p = 8 is a good estimate of the stabilization parameter p. We are interested in the parameter ν = (ν₁, ν₂) and consider the squared error (SE) for ν, defined as

$$\mathrm{SE} = \frac{(\nu_1 - 2\pi/3)^2 + (\nu_2 - 4\pi/3)^2}{2}.$$

We apply the three algorithms MSBC, EM and FCD to the data set and then calculate the SE for ν. The results are shown in Table 4. We find that the SE for ν from the MSBC is the smallest of the three algorithms, i.e., the MSBC has better accuracy than the other two. Figure 18 shows the agglomerated cluster centers of the MSBC. In Fig. 18a, the initial values of the cluster centers are all data points.
Table 4 Estimates of parameters from the algorithms and SE for ν

  Algorithms   MSBC     EM       FCD
  ν₁           117.86   118.28   118.25
  ν₂           241.05   242.09   242.11
  κ₁           8.61     8.61     8.61
  κ₂           10.27    10.27    10.27
  α₁           0.39     0.39     0.39
  α₂           0.61     0.61     0.61
  SE for ν     2.83     3.66     3.77
[Fig. 17 Graph of correlation comparisons with the estimate p = 8]
[Fig. 18 Agglomerated cluster centers for iterations 0, 1, 2, 3]
[Fig. 19 (a) Hierarchical tree for the MSBC on the data set, (b) identified clusters]
In Fig. 18b-d, we can see that the cluster centers are agglomerated to two points as the iterations increase. The data set is then well divided into two clusters. It is an advantage of the MSBC that it identifies the cluster number c = 2 without a priori cluster number, whereas c = 2 must be given a priori for the EM and FCD. We use the AHC to find the appropriate clusters from the identified cluster points. The results are shown in Fig. 19, where the X-coordinate represents the individual cluster centers and the Y-coordinate represents the distances between clusters. According to the hierarchical tree, we can actually identify the two clusters, as shown in Fig. 19. To evaluate the above three algorithms under the criterion of the mean squared error (MSE) for ν, we generate 100 data sets from the mixture of von Mises distributions 0.4VM(2π/3, 8.6) + 0.6VM(4π/3, 10.3); the MSE is again defined as the sum of the SE of each data set for ν divided by 100. Table 5 shows the MSEs, mean values and standard deviations for the three algorithms. From Table 5, we find that the MSE values for the EM and the FCD are almost the same, and that the MSE value for the MSBC is the smallest among them. A similar situation occurs for the standard deviation; i.e., the MSBC has better accuracy than the EM and the FCD.

Example 5 In this example, we consider the influence of outliers on the MSBC, EM and FCD algorithms. A better algorithm should be influenced less by outliers. The data set of Example 4, called the original data set, is used in this example. We now add an outlier at 0° to the original data set and call the result the outlier data set. The graphical plot of the correlation comparisons for the outlier data set is shown in Fig. 20. Comparing Fig. 20 with Fig. 17, the shape of the curve is similar, but we find that p = 9 becomes a good estimate of the stabilization parameter p in Fig. 20. The agglomerated cluster centers from the MSBC are shown in Fig. 21, where the cluster centers are agglomerated to three points. The clustering results from the EM and FCD are shown in Fig. 22a, and the clustering results from the MSBC are shown in Fig. 22b.
[Fig. 20 Graph of correlation comparisons with the estimated p = 9]
[Fig. 21 Agglomerated cluster centers for iterations 0, 1, 2, 3]
[Fig. 22 (a) Classified results of the EM and FCD for the outlier data set, (b) classified results of the MSBC for the outlier data set, (c) hierarchical cluster tree for the MSBC]
Table 5 MSE, mean and standard deviation of 100 data sets for ν

  Algorithms                 MSBC                 EM                   FCD
  MSE                        1.845                2.239                2.219
  Mean for ν                 (119.882, 240.262)   (119.797, 240.360)   (119.773, 240.370)
  Standard deviation for ν   (1.406, 1.291)       (1.546, 1.401)       (1.535, 1.392)
We find that the EM and FCD group the outlier data set into two clusters, but the MSBC groups the outlier data set into three clusters, with the outlying point alone in one cluster and the other two clusters remaining the same as in Fig. 19b of Example 4. That is, the clustering results of the EM and FCD are influenced by the outlying point, but those of the MSBC are not.

We also compare the MSEs for ν and the mean values and standard deviations for the outlier data set. The results are shown in Table 6. We find that the MSBC has the smallest MSE and standard deviation among the three algorithms. Furthermore, we consider the increasing rate of the MSEs, the removed degrees of the mean values, and the increasing rate of the standard deviations due to this outlier. We add the outlier at 0° to each generated data set of Example 4 and compute the MSEs of the 100 data sets (each with 201 data points) for ν. The increasing rate of the MSEs, the removed degrees of the mean values and the increasing rate of the standard deviations due to this outlier are defined as follows:

$$\text{Increasing rate of MSE for }\nu = \frac{\mathrm{MSE}_{\text{outlier data set}} - \mathrm{MSE}_{\text{original data set}}}{\mathrm{MSE}_{\text{original data set}}}\times 100\%,$$

$$\text{Removed degrees of mean for }\nu = \left|\,\text{mean}_{\text{outlier data set}} - \text{mean}_{\text{original data set}}\,\right|,$$

$$\text{Increasing rate of standard deviation} = \frac{\mathrm{SD}_{\text{outlier data set}} - \mathrm{SD}_{\text{original data set}}}{\mathrm{SD}_{\text{original data set}}}\times 100\%.$$

These results are shown in Table 7. From Table 7, we find that the increasing rates and removed degrees of the MSBC are the smallest among the three algorithms. This means the MSBC algorithm is robust to the outlying point, whereas the EM and FCD are affected by the outlier.
Example 6 In this example, we simulate 300 data points from a mixture of three von Mises distributions, 0.25VM(π/3, 10.3) + 0.35VM(π, 8.6) + 0.4VM(5π/3, 12.8), and analyze this data set with the MSBC. According to the graphical method of correlation comparisons, as shown in Fig. 23, p = 15 is a suitable estimate. The agglomerating states of the cluster centers from the MSBC are shown in Fig. 24. The initial cluster centers are all data points, as shown in Fig. 24b. From Fig. 24a-e, the cluster centers are agglomerated to three points as the iterations increase. The hierarchical tree and identified clusters are shown in Fig. 25, which shows that the optimal cluster number is 3. We also compare the MSEs for the MSBC, EM and FCD: we generate 100 data sets from the mixture distribution 0.25VM(π/3, 10.3) + 0.35VM(π, 8.6) + 0.4VM(5π/3, 12.8) and then compute the MSEs for ν and the mean values and standard deviations of the 100 estimates of ν. The results are shown in Table 8. From Table 8, we see that the MSBC performs the best.

Example 7 In this example, we consider a real data set. Table 9 shows the directions of 76 turtles after laying eggs, given by Stephens (1969). In Fig. 26, the circular histogram shows two modes for the turtle data (one at about 60°, the other at about 240°). Spurr and Koutbeiy (1991) considered this data set as generated from a mixture of two von Mises distributions.
Table 7 Increasing rate of MSE, removed degree of mean and increasing rate of standard deviation

  Algorithms                                      MSBC     EM        FCD
  Increasing rate of MSE                          0.054%   10.005%   12.032%
  Removed degree of mean for ν₁                   0.003    0.358     0.26
  Removed degree of mean for ν₂                   0.002    0.191     0.24
  Increasing rate of standard deviation for ν₁    0%       2.264%    2.215%
  Increasing rate of standard deviation for ν₂    0.078%   1.784%    2.216%
[Fig. 23 Graph of correlation comparisons]
Table 6 MSE, mean and standard deviation of 100 data sets for ν

  Algorithms                 MSBC                 EM                   FCD
  MSE                        1.846                2.553                2.486
  Mean for ν                 (119.885, 240.260)   (119.439, 240.551)   (119.513, 240.610)
  Standard deviation for ν   (1.406, 1.292)       (1.581, 1.426)       (1.569, 1.395)
They used six moment equations to solve for the estimates of the five parameters ν₁, ν₂, κ₁, κ₂ and α₁. We apply the MSBC to the turtle data without giving a priori cluster number and then compare it with the EM and FCD, given the cluster number c = 2, and with the method proposed by Spurr and Koutbeiy (1991). The graphical plot of the correlation comparisons is shown in Fig. 27; we find that p = 9 is a suitable estimate. Figure 28 shows the MSBC procedure for the turtle data: we can observe that all cluster centers finally agglomerate to two points. Figure 29a shows the hierarchical tree of the turtle data. Clearly, the number of clusters for the turtle data should be 2 by the MSBC, without a priori cluster number. We find that the clustering results of the MSBC, EM, FCD and Spurr and Koutbeiy's method are almost identical, as shown in Fig. 29b, where the larger cluster centers around 63° and the smaller cluster around 241°. However, our MSBC is free of a cluster number, whereas the EM, FCD and Spurr and Koutbeiy's method need a priori cluster number c = 2. Furthermore, Table 10 shows the estimates of all parameters for the MSBC, EM and FCD algorithms. From Table 10, we find that the estimates of ν and α from the three algorithms are close to the results analyzed by Spurr and Koutbeiy (1991), but the estimates of the concentration parameter κ from the three algorithms are different.

Example 8 In this example, we consider the SIDS data given by Mooney et al. (2003), who collected the SIDS cases in the UK by month of death for the years 1983-1998. According to their analysis, the SIDS data for the year 1998 can be fitted by a mixture of two von Mises distributions, so we may apply the MSBC to the SIDS data for the year 1998. Mooney et al. (2003) discussed the properties of the SIDS data and corrected all 12 months to 31 days; i.e., before the models were fitted, the SIDS data were month-corrected to 31 days by multiplying February (28 days) by 1.097 and the 30-day months by 1.033. The corrected numbers of SIDS cases for the year 1998 are shown in Table 11 (see also Table 1 in Mooney et al. 2003). In general, the numbers of cases over a year can be converted to angles for fitting von Mises distributions (see Mardia and Jupp 2000). Hence, we convert the corrected numbers of SIDS cases for the year 1998 in Table 11 to circular data by dividing 360° into 12 parts; i.e., the 12 months of one year divide the angular range of the circle (0°, 360°) into 12 intervals: January corresponds to the interval (0°, 30°), February to (30°, 60°), and so on. According to Table 11, we generate 40 random numbers from the uniform distribution on the interval (0°, 30°), 31 random numbers from the uniform distribution on the interval (30°, 60°), and so on. Thus, we have circular data of 402 points.
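As a sketch, this month-to-angle conversion could be implemented as below. The counts list is a placeholder: only the first two month-corrected counts (40 and 31) are stated in the text, and Table 11 is not reproduced here.

```python
# A sketch of converting monthly counts to circular data (Example 8).
import numpy as np

rng = np.random.default_rng(0)
counts = [40, 31]  # + the remaining ten month-corrected counts from Table 11

theta = np.concatenate([
    rng.uniform(30 * m, 30 * (m + 1), size=c)   # month m covers (30m, 30(m+1)) degrees
    for m, c in enumerate(counts)
])
theta = np.deg2rad(theta)                       # angles in radians for the MSBC
```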
We apply the MSBC to this data set.
[Fig. 24 (a) Three-cluster data set, (b)-(e) states of cluster centers for iterations 0, 1, 2, 3, (f) data set and cluster centers after convergence]
Table 8 MSE, mean and standard deviations of 100 data sets for ν

  Algorithms                 MSBC                      EM                        FCD
  MSE                        1.79                      1.96                      1.97
  Mean for ν                 (59.72, 180.09, 300.05)   (59.45, 180.04, 300.16)   (59.42, 180.06, 300.17)
  Standard deviation for ν   (1.53, 1.31, 1.12)        (1.56, 1.39, 1.11)        (1.57, 1.39, 1.10)
Fig. 25 a Hierarchical tree of the data set, b identified clusters
The graphical plot of the correlation comparisons is shown in Fig. 30, where p = 9 is a suitable estimate of p. Figure 31 shows the MSBC procedure for the data set. We can see that all cluster centers agglomerated to two points. Figure 32 shows the hierarchical tree and the identified clusters. From Fig. 32a, the optimal cluster number is 2.
Fig. 28 a Data set for the turtle data, b–e states of cluster centers for iteration = 0, 1, 3, 5, f data set and cluster centers after convergence
Table 9 Directions of 76 turtles after laying eggs
8 9 13 13 14 18 22 27 30 34
38 38 40 44 45 47 48 48 48 48
50 53 56 57 58 58 61 63 64 64
64 65 65 68 70 73 78 78 78 83
83 88 88 88 90 92 92 93 95 96
98 100 103 106 113 118 138 153 153 155
204 215 223 226 237 238 243 244 250 251
257 268 285 319 343 350
Fig. 26 Circular histogram for the turtle data
Fig. 27 Graph of correlation comparisons with the estimated p = 9
Fig. 29 a Hierarchical tree for the turtle data, b identified clusters for the turtle data
It gives the same indication as the analysis of Mooney et al. (2003). According to the hierarchical tree, we obtain the clustering results shown in Fig. 32b. The estimates of the two cluster centers $m_1$ and $m_2$ are 151.44° and 337.50°. Hence, we conclude that there are two peaks of SIDS cases for the year 1998: one at the border between May and June, and the other at the border between November and December.
Finally, we compare the MSBC with Kobayashi and Otsu's (2010) method. Kobayashi and Otsu (2010) constructed their von Mises-Fisher mean shift by generalizing the von Mises-Fisher distribution to $F(x; m, \kappa) = C_F(\kappa)\, f(x^{\mathrm{T}} m; \kappa)$, where $f$ is a monotonically increasing convex function and $g$ is the derivative of $f$. However, in practical applications, we need to specify the functions $f$ and $g$. In their experiments, Kobayashi and Otsu (2010) used the following kernel function $f(x; \kappa)$ with its derivative $g$:
$$f(x;\kappa)=\begin{cases}0, & 0\le x\le\kappa\\[2pt] \tfrac{1}{2}(x-\kappa)^2, & \kappa\le x\le 1\end{cases}
\qquad\text{and}\qquad
g(x;\kappa)=\begin{cases}0, & 0\le x\le\kappa\\[2pt] x-\kappa, & \kappa\le x\le 1.\end{cases}$$
In the next example, we use these specified functions to compare the MSBC with Kobayashi and Otsu's (2010) method.
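For concreteness, these two piecewise functions can be implemented directly; the following is a small vectorized sketch (names are ours):

import numpy as np

def f(x, kappa):
    """Piecewise kernel: 0 on [0, kappa], (1/2)(x - kappa)^2 on [kappa, 1]."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= kappa, 0.0, 0.5 * (x - kappa) ** 2)

def g(x, kappa):
    """Derivative of f: 0 on [0, kappa], x - kappa on [kappa, 1]."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= kappa, 0.0, x - kappa)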
Example 9 To compare the MSBC with Kobayashi and Otsu's (2010) method, we evaluate the MSEs to see the performance of the two algorithms. Each MSE is evaluated using 100 data sets. We consider the following mixture distributions (a sampling sketch for such mixtures is given after the list):

A: $0.3\,\mathrm{WN}(0.5\pi, 0.95) + 0.7\,\mathrm{WN}(\pi, 0.96)$ in Example 1.
B: $0.3\,\mathrm{WC}(0.5\pi, 0.91) + 0.7\,\mathrm{WC}(\pi, 0.925)$ in Example 1.
C: $0.4\,\mathrm{VM}(2\pi/3, 8.6) + 0.6\,\mathrm{VM}(4\pi/3, 10.3)$ in Example 4.
D: $0.4\,\mathrm{VM}(2\pi/3, 8.6) + 0.6\,\mathrm{VM}(4\pi/3, 10.3)$ with an added outlier at $0^{\circ}$ in Example 5.
E: $0.25\,\mathrm{VM}(\pi/3, 10.3) + 0.35\,\mathrm{VM}(\pi, 8.6) + 0.4\,\mathrm{VM}(5\pi/3, 12.8)$ in Example 6.
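The following sketch shows one way such mixture samples can be drawn, taking mixture A as an example. We assume here that the second argument of WN is the mean resultant length $\rho$, so that the underlying normal has $\sigma = \sqrt{-2\ln\rho}$; this parameterization is our assumption, not a detail stated here.

import numpy as np

rng = np.random.default_rng(1)

def wrapped_normal(mu, rho):
    """Sampler for WN(mu, rho) with rho the mean resultant length, so the
    underlying normal has sigma = sqrt(-2 ln rho) (our assumed convention)."""
    sigma = np.sqrt(-2.0 * np.log(rho))
    return lambda: rng.normal(mu, sigma) % (2 * np.pi)

def sample_mixture(n, weights, samplers):
    """Draw n angles: pick a component by its weight, then sample from it."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.array([samplers[c]() for c in comps])

# Mixture A: 0.3 WN(0.5*pi, 0.95) + 0.7 WN(pi, 0.96).
theta = sample_mixture(500, [0.3, 0.7],
                       [wrapped_normal(0.5 * np.pi, 0.95),
                        wrapped_normal(np.pi, 0.96)])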
The MSEs of the two algorithms on these five mixture distributions are shown in Table 12. From Table 12, we can see that the MSEs of the two algorithms are quite close for the mixtures of von Mises distributions, as shown for the mixture distributions C, D and E. However, the MSBC algorithm has considerably smaller MSEs than Kobayashi and Otsu's (2010) method on the mixture of wrapped normal distributions and the mixture of wrapped Cauchy distributions, as shown for the mixture distributions A and B.
Table 10 Estimates of parameters for the turtle data

Parameter   MSBC     EM       FCD      Spurr and Koutbeiy
m_1         63.15    63.47    63.50    63.2
m_2         241.13   241.20   241.25   240.2
κ_1         2.75     2.65     2.65     2.91
κ_2         7.43     8.61     7.43     4.81
α_1         0.83     0.84     0.84     0.82
α_2         0.17     0.16     0.16     0.18
Fig. 30 Graph of correlation comparisons with the estimated p = 9
Table 11 Corrected number of SIDS cases for the year 1998
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Total
1998 40 31 25 26 29 33 25 20 27 40 43 63 402
The function $g$ chosen by Kobayashi and Otsu (2010) in their experiments is actually appropriate for data from mixtures of von Mises-Fisher distributions. In general, a different mixture of distributions requires a different choice of $g$. However, it is difficult to find an appropriate $g$ for Kobayashi and Otsu's (2010) method in real applications. Since the MSBC is a nonparametric approach that does not depend on mixture distributions, it should be more applicable to real data sets. This is an advantage of the MSBC. Furthermore, we can also evaluate the increasing rate of the MSEs to see the influence of outliers, with
$$\text{increasing rate} = \frac{\mathrm{MSE}(D)-\mathrm{MSE}(C)}{\mathrm{MSE}(C)} \times 100\%.$$
We find that the increasing rate for the MSBC is 0.054%, whereas the increasing rate for Kobayashi and Otsu's method is 0.64%. It is obvious that an outlier has a smaller influence on the MSBC than on Kobayashi and Otsu's (2010) method.
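As a quick check, the increasing rates quoted above follow directly from the MSE values in Table 12:

# Increasing rate of the MSE caused by the added outlier (D vs. C, Table 12).
def increasing_rate(mse_c, mse_d):
    return (mse_d - mse_c) / mse_c * 100.0

print(round(increasing_rate(1.845, 1.846), 3))  # MSBC: ~0.054
print(round(increasing_rate(1.867, 1.879), 2))  # Kobayashi and Otsu: ~0.64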
5 Conclusions
In this paper, we proposed a MSBC algorithm for circular data which is cluster-number free; i.e. the MSBC algorithm can automatically find a cluster number with good cluster centers without a priori cluster number being given. Three implementation procedures for the MSBC were compared, and the MSBC with the blurring mean shift is recommended. The proposed clustering algorithm gives a new way to analyze grouped circular data, is robust to outlying points, and can also give an optimal estimate of the cluster number. For comparisons, we used data sets from mixtures of von Mises distributions, wrapped normal distributions and wrapped Cauchy distributions, a real circular data set of the directions of 76 turtles after laying eggs, and also the SIDS data set to compare the proposed method with two previous methods, EM and FCD. The results show the accuracy and superiority of our proposed algorithm, so it is recommended as a new tool for the analysis of grouped circular data. We also compared the proposed MSBC algorithm with Kobayashi and Otsu's (2010) method. In this paper, we defined a good distance measure between two circular data points. However, defining a good distance measure between directional data on the hypersphere may be another challenge. Proposing a nonparametric clustering approach for directional data on the hypersphere will be our future research topic.
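For readers who wish to experiment with the recommended blurring variant, the following is a generic sketch of a blurring mean-shift iteration for angles on the unit circle with a von Mises-type kernel; it illustrates the idea only and is not necessarily the exact MSBC update used in this paper.

import numpy as np

def blurring_mean_shift_circle(theta, kappa=10.0, iters=20):
    """Generic blurring mean shift for angles: each iteration replaces every
    point by the kernel-weighted mean of all (current) points, projected back
    onto the unit circle."""
    x = np.column_stack([np.cos(theta), np.sin(theta)])   # angles -> unit vectors
    for _ in range(iters):
        w = np.exp(kappa * (x @ x.T))                     # von Mises-type weights
        m = w @ x                                         # weighted vector sums
        x = m / np.linalg.norm(m, axis=1, keepdims=True)  # renormalize to circle
    return np.arctan2(x[:, 1], x[:, 0]) % (2 * np.pi)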
Acknowledgments The authors would like to thank the anonymous referees for their helpful comments in improving the presentation of this paper. This work was supported in part by the National Science Council of Taiwan, under Grant NSC-99-2118-M-033-004-MY2.
References
Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J Mach Learn Res 6:1345–1382
Bartels R (1984) Estimation in a bidirectional mixture of von Mises distributions. Biometrics 40:777–784
Fig. 31 a Data set in Table 11, b–e states of cluster centers for iteration = 0, 3, 6, 9, f data set and cluster centers after convergence
Table 12 MSEs for the MSBC and Kobayashi and Otsu’s method
Distribution MSBC Kobayashi and Otsu
A (Ex.1) 2.12 3.11
B (Ex.1) 2.88 4.46
C (Ex.4) 1.845 1.867
D (Ex.5) 1.846 1.879
E (Ex.6) 1.79 1.84
Fig. 32 a Hierarchical tree of the data set in Table 11, b identified clusters
Carta C, Bueno JA, Ramirez P (2008) Statistical modeling of directional wind speeds using mixtures of von Mises distributions. Energy Conserv Manage 49:897–907
Chang-Chien SJ, Yang MS, Hung WL (2010) Mean shift-based clustering for directional data. In: Proceedings of the third international workshop on advanced computational intelligence, pp 367–372
Cheng Y (1995) Mean shift, mode seeking, and clustering. IEEE Trans Pattern Anal Mach Intell 17:790–799
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24:603–619
Corcoran J, Chhetri P, Stimson R (2009) Using circular statistics to explore the geography of the journey to work. Papers Reg Sci 88:119–132
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines. Cambridge University Press, Cambridge
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc Ser B 39:1–38
Dortet-Bernadet JL, Wicker N (2008) Model-based clustering on the unit sphere with an illustration using gene expression profiles. Biostatistics 9(1):66–80
Fashing M, Tomasi C (2005) Mean shift is a bound optimization. IEEE Trans Pattern Anal Mach Intell 27:471–474
Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recogn 41:176–190
Fisher NI (1993) Statistical analysis of circular data. Cambridge University Press, Cambridge
Fukunaga K, Hostetler LD (1975) The estimation of the gradient of a density function with applications in pattern recognition. IEEE Trans Inf Theory 21:32–40
Gao KS, Chia F, Krantz I, Nordin P, Machin D (2006) On the application of the von Mises distribution and angular regression methods to investigate the seasonality of disease onset. Stat Med 25:1593–1618
Kobayashi T, Otsu N (2010) Von Mises-Fisher mean shift for clustering on a hypersphere. In: Proceedings of the 20th international conference on pattern recognition, ICPR 2010, pp 2130–2133
Lee A (2010) Circular data. WIREs Comput Stat 2:477–486
Mardia KV (1972) Statistics of directional data. Academic Press, London
Mardia KV, Jupp PE (2000) Directional statistics. Wiley, New York
Mardia KV, Sutton TW (1975) On the modes of a mixture of two von Mises distributions. Biometrika 62:699–701
McGraw T, Vemuri BC, Yezierski B, Mareci T (2006) Von Mises-Fisher mixture model of the diffusion ODF. In: Proceedings of the 3rd IEEE international symposium on biomedical imaging: from nano to macro, pp 65–68
McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering. Marcel Dekker, New York
Mooney JA, Helms PJ, Jolliffe IT (2003) Fitting mixtures of von Mises distributions: a case study involving sudden infant death syndrome. Comput Stat Data Anal 41:505–513
Schölkopf B, Smola A, Müller K (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10:1299–1319
Silverman BW (1998) Density estimation for statistics and data analysis. Chapman and Hall, New York
Spurr BD, Koutbeiy MA (1991) A comparison of various methods for estimating the parameters in mixtures of von Mises distributions. Commun Stat Simul Comput 20:725–741
Stephens MA (1969) Techniques for directional data. Tech Report #150, Department of Statistics, Stanford University, Stanford
Thang ND, Chen L, Chan CK (2008) Feature reduction using mixture model of directional distributions. In: Proceedings of the 10th international conference on control, automation, robotics and vision, ICARCV, pp 2208–2212
Vapnik VN (1998) Statistical learning theory. Wiley, New York
Von Mises R (1918) Über die "Ganzzahligkeit" der Atomgewichte und verwandte Fragen. Physikal Z 19:490–500
Watson GS, Williams EJ (1956) On the construction of significance tests on the circle and the sphere. Biometrika 43:344–352
Wu KL, Yang MS (2007) Mean shift-based clustering. Pattern Recogn 40:3035–3052
Yang MS, Pan JA (1997) On fuzzy clustering of directional data. Fuzzy Sets Syst 91:319–326