A Unified Metric for Categorical and Numerical Attributes in Data Clustering
Yiuming Cheung and Hong Jia
Department of Computer Science and Institute of Computational and Theoretical Studies
Hong Kong Baptist University, Hong Kong SAR, China
Yiuming Cheung and Hong Jia (HKBU), Unified Metric for Mixed Data Clustering, 2013
Outline
1 Introduction
    Motivation
    Previous Work
    Objective
2 Object-cluster Similarity Metric
    Clustering Task
    Similarity Metric for Mixed Data
3 Iterative Clustering Algorithm
4 Experiments
    Evaluation Criteria
    Performance on Mixed Data Sets
    Performance on Categorical Data Sets
5 Conclusion
6 Acknowledgment
Motivation
Clustering and Attribute
Clustering:
A widely utilized technique in various scientific areas;
The main task is to discover the natural group structure of objects represented by numerical or categorical attributes (Michalski et al., 1998).
Attribute:
An attribute is a property or characteristic of an object;
Each object is described by a collection of attributes;
There exist two different types of attributes:
  - Numerical attributes: can be ordered by numbers;
  - Categorical attributes: cannot be ordered by their values, but can be separated into groups.
An Example: Diagnostic Records of Patients
UCI Heart Disease data set: contains 8 categorical and 5 numerical attributes.
Attribute                      Descriptor                                      Property     Type
Age                            -                                               continuous   numerical
Sex                            {F, M}                                          discrete     categorical
Chest pain type                {typical angina, atypical angina, ...}          discrete     categorical
Resting blood pressure         -                                               continuous   numerical
Serum cholesterol              -                                               continuous   numerical
Fasting blood sugar            {> 120 mg/dl, ≤ 120 mg/dl}                      discrete     categorical
Resting electrocardiographic   {type I, type II, type III}                     discrete     categorical
Maximum heart rate             -                                               continuous   numerical
Exercise induced angina        {yes, no}                                       discrete     categorical
ST depression                  -                                               continuous   numerical
Slope of ST segment            {upsloping, flat, downsloping}                  discrete     categorical
CA                             -                                               continuous   numerical
THAL                           {normal, fixed defect, reversible defect}       discrete     categorical
Problem
Traditional clustering methods often concentrate on purely numerical data.
There exists an awkward gap between the similarity metrics for categorical and numerical data.
Transforming the categorical values into numerical ones ignores the similarity information embedded in the categorical values and cannot faithfully reveal the similarity structure of the data sets (Hsu, TNN’2006).
It is desirable to solve this problem by finding a unified similarity metric for categorical and numerical attributes.
Previous Work
Roughly, the existing approaches to handling categorical attributes in clustering analysis fall into four categories:
Methods based on the perspective of similarity
  - Similarity-Based Agglomerative Clustering (SBAC) algorithm (Li and Biswas, TKDE’02)
Methods based on graph partitioning
  - CLICKS algorithm (Zaki and Peters, ICDE’2005)
Entropy-based methods
  - COOLCAT algorithm (Barbara et al., CIKM’2002)
Approaches that attempt to give a distance metric for categorical values
  - k-prototype algorithm (Huang, PAKDD’97)
Objective
Give a unified similarity metric that can be applied directly to data with categorical, numerical, and mixed attributes;
Design an efficient clustering algorithm that is applicable to all three types of data: numerical, categorical, and mixed.
Clustering Task
Clustering a set of $N$ objects, $\{x_1, x_2, \ldots, x_N\}$, into $k$ different clusters, denoted as $C_1, C_2, \ldots, C_k$, can be formulated as finding the optimal $Q^*$ via
\[
Q^* = \arg\max_{Q} F(Q) = \arg\max_{Q} \left[ \sum_{j=1}^{k} \sum_{i=1}^{N} q_{ij}\, s(x_i, C_j) \right], \tag{1}
\]
where $s(x_i, C_j)$ is the similarity between object $x_i$ and cluster $C_j$, and $Q = (q_{ij})$ is an $N \times k$ partition matrix satisfying
\[
\sum_{j=1}^{k} q_{ij} = 1, \quad 0 < \sum_{i=1}^{N} q_{ij} < N, \quad \text{and} \quad q_{ij} \in [0, 1]. \tag{2}
\]
Evidently, the desired clusters can be obtained as long as the object-cluster similarity metric is determined.
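As a minimal sketch of Eqs. (1) and (2), the following Python snippet checks the partition-matrix constraints and evaluates F(Q) for a given table of similarities. The function names `is_valid_partition` and `objective` and the toy numbers are illustrative assumptions, not part of the original slides.

```python
def is_valid_partition(q, tol=1e-9):
    """Check the constraints of Eq. (2) for an N x k matrix given as nested lists."""
    n, k = len(q), len(q[0])
    # Each row sums to 1 and every entry lies in [0, 1].
    rows_ok = all(
        abs(sum(row) - 1.0) < tol and all(0.0 <= v <= 1.0 for v in row)
        for row in q
    )
    # Each column sum is strictly between 0 and N (no empty or all-absorbing cluster).
    cols_ok = all(0.0 < sum(row[j] for row in q) < n for j in range(k))
    return rows_ok and cols_ok


def objective(q, s):
    """F(Q) of Eq. (1): sum of q_ij * s(x_i, C_j) over all objects and clusters."""
    return sum(q[i][j] * s[i][j] for i in range(len(q)) for j in range(len(q[0])))


# Toy example: three objects, two clusters; s[i][j] stands in for s(x_i, C_j).
q = [[1, 0], [0, 1], [1, 0]]
s = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.4]]
print(is_valid_partition(q), objective(q, s))
```

A matrix that puts every object into one cluster violates the column constraint, so `is_valid_partition([[1, 0], [1, 0]])` is false.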
Similarity Metric for Mixed Data
Representation of Mixed Data
Suppose the mixed data object $x_i$ with $d$ different attributes consists of $d_c$ categorical attributes and $d_u$ numerical attributes ($d_c + d_u = d$).
$x_i$ can be denoted as $[(x_i^c)^T, (x_i^u)^T]^T$ with $x_i^c = (x_{i1}^c, x_{i2}^c, \ldots, x_{id_c}^c)^T$ and $x_i^u = (x_{i1}^u, x_{i2}^u, \ldots, x_{id_u}^u)^T$.
Here, we have:
$x_{ir}^u$ ($r = 1, 2, \ldots, d_u$) belonging to $\mathbb{R}$;
$x_{ir}^c$ ($r = 1, 2, \ldots, d_c$) belonging to $\mathrm{dom}(A_r)$, where $\mathrm{dom}(A_r)$ contains all possible values that can be taken by categorical attribute $A_r$.
Specifically, $\mathrm{dom}(A_r)$ with $m_r$ elements can be represented as $\mathrm{dom}(A_r) = \{a_{r1}, a_{r2}, \ldots, a_{rm_r}\}$.
Definition of $s(x_i, C_j)$ (I)
Observation: In clustering analysis, numerical attributes are usually treated as a whole vector, while categorical attributes are investigated individually.
Definition: Let the object-cluster similarity $s(x_i, C_j)$ be the average of the similarities calculated on each attribute. We then have
\[
\begin{aligned}
s(x_i, C_j) &= \frac{1}{d}\, s(x_{i1}^c, C_j) + \frac{1}{d}\, s(x_{i2}^c, C_j) + \cdots + \frac{1}{d}\, s(x_{id_c}^c, C_j) + \frac{d_u}{d}\, s(x_i^u, C_j) \\
&= \frac{1}{d} \sum_{r=1}^{d_c} s(x_{ir}^c, C_j) + \frac{d_u}{d}\, s(x_i^u, C_j).
\end{aligned} \tag{3}
\]
Here, the similarity between each numerical attribute and the cluster $C_j$ is replaced with the similarity between the cluster and the whole numerical vector $x_i^u$.
Definition of $s(x_i, C_j)$ (II)
If we denote the similarity between $x_i^c$ and $C_j$ as $s(x_i^c, C_j)$, we get
\[
s(x_i^c, C_j) = \frac{1}{d_c} \sum_{r=1}^{d_c} s(x_{ir}^c, C_j) = \sum_{r=1}^{d_c} \frac{1}{d_c}\, s(x_{ir}^c, C_j). \tag{4}
\]
Then, Eq. (3) can be rewritten as
\[
s(x_i, C_j) = \frac{d_c}{d}\, s(x_i^c, C_j) + \frac{d_u}{d}\, s(x_i^u, C_j). \tag{5}
\]
Subsequently, the object-cluster similarity metric can be obtained from the definitions of $s(x_i^c, C_j)$ and $s(x_i^u, C_j)$.
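The step from Eq. (3) to Eq. (5) can be spelled out by factoring $d_c$ out of the categorical sum and substituting Eq. (4). The derivation below only rearranges the formulas already given:

```latex
\begin{aligned}
s(x_i, C_j)
  &= \frac{1}{d}\sum_{r=1}^{d_c} s(x_{ir}^c, C_j) + \frac{d_u}{d}\, s(x_i^u, C_j) \\
  &= \frac{d_c}{d}\left[\frac{1}{d_c}\sum_{r=1}^{d_c} s(x_{ir}^c, C_j)\right] + \frac{d_u}{d}\, s(x_i^u, C_j) \\
  &= \frac{d_c}{d}\, s(x_i^c, C_j) + \frac{d_u}{d}\, s(x_i^u, C_j).
\end{aligned}
```

Because $d_c/d + d_u/d = 1$, the overall similarity is a convex combination of the categorical part and the numerical part.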
Similarity Metric for Categorical Attributes (I)
Taking into account the unequal importance of different categorical attributes for clustering analysis, the computation of $s(x_i^c, C_j)$ is further modified to
\[
s(x_i^c, C_j) = \sum_{r=1}^{d_c} w_r\, s(x_{ir}^c, C_j), \tag{6}
\]
where $w_r$ is the weight of categorical attribute $A_r$, satisfying $0 \le w_r \le 1$ and $\sum_{r=1}^{d_c} w_r = 1$.
That is, the object-cluster similarity for the categorical part is the weighted sum of the similarities between the cluster and each attribute value.
Similarity Metric for Categorical Attributes (II)
Definition 1
The similarity between a categorical attribute value $x_{ir}^c$ and cluster $C_j$ is defined as
\[
s(x_{ir}^c, C_j) = \frac{\sigma_{A_r = x_{ir}^c}(C_j)}{\sigma_{A_r \neq \mathrm{NULL}}(C_j)}, \tag{7}
\]
where $\sigma_{A_r = x_{ir}^c}(C_j)$ counts the number of objects in cluster $C_j$ that have the value $x_{ir}^c$ for attribute $A_r$, and NULL refers to empty.
Therefore, the object-cluster similarity for the categorical part is calculated by
\[
s(x_i^c, C_j) = \sum_{r=1}^{d_c} w_r\, s(x_{ir}^c, C_j) = \sum_{r=1}^{d_c} w_r\, \frac{\sigma_{A_r = x_{ir}^c}(C_j)}{\sigma_{A_r \neq \mathrm{NULL}}(C_j)}. \tag{8}
\]
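The counting in Eqs. (7) and (8) can be illustrated with a short Python sketch. It assumes that NULL is represented by `None` and that an attribute with no non-null values in the cluster contributes zero similarity; the function names and the toy cluster are hypothetical.

```python
from collections import Counter


def categorical_similarity(value, cluster_values):
    """Eq. (7): share of the cluster's non-null values that equal `value`."""
    non_null = [v for v in cluster_values if v is not None]
    if not non_null:
        return 0.0  # assumption: no non-null values means zero similarity
    return Counter(non_null)[value] / len(non_null)


def categorical_part(x_c, cluster_rows, weights):
    """Eq. (8): weighted sum of Eq. (7) over the d_c categorical attributes."""
    total = 0.0
    for r, (value, w) in enumerate(zip(x_c, weights)):
        column = [row[r] for row in cluster_rows]
        total += w * categorical_similarity(value, column)
    return total


# Toy cluster with two categorical attributes; None marks a missing value.
cluster = [("M", "yes"), ("M", "no"), ("F", "no"), ("M", None)]
print(categorical_similarity("M", [row[0] for row in cluster]))  # 3 of 4 values
print(categorical_part(("M", "no"), cluster, [0.5, 0.5]))
```

Only per-value frequencies are needed, so a cluster can be summarized by one counter per categorical attribute.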
Calculation of Categorical Attribute Weights
From the viewpoint of information theory, the importance of any categorical attribute $A_r$ can be estimated by
\[
H_{A_r} = -\frac{1}{m_r} \sum_{t=1}^{m_r} p(a_{rt}) \log p(a_{rt}) \quad \text{with} \quad p(a_{rt}) = \frac{\sigma_{A_r = a_{rt}}(X)}{\sigma_{A_r \neq \mathrm{NULL}}(X)}, \tag{9}
\]
where $a_{rt} \in \mathrm{dom}(A_r)$, $X$ is the whole data set, and $m_r$ is the number of values that can be taken by $A_r$.
The weight of each attribute is then computed as
\[
w_r = \frac{H_{A_r}}{\sum_{t=1}^{d_c} H_{A_t}}. \tag{10}
\]
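A minimal Python sketch of Eqs. (9) and (10) follows. It assumes that `None` stands for NULL, that $m_r$ is taken as the number of distinct values observed in the data, that the natural logarithm is used, and that uniform weights are a reasonable fallback when every attribute has zero importance. None of these choices is stated on the slides.

```python
import math
from collections import Counter


def attribute_importance(column):
    """Eq. (9): entropy of the value distribution divided by m_r."""
    non_null = [v for v in column if v is not None]
    if not non_null:
        return 0.0  # assumption: an all-null attribute carries no information
    counts = Counter(non_null)
    n, m_r = len(non_null), len(counts)  # m_r: number of observed values
    return -sum((c / n) * math.log(c / n) for c in counts.values()) / m_r


def attribute_weights(columns):
    """Eq. (10): normalize the importances so that the weights sum to 1."""
    h = [attribute_importance(col) for col in columns]
    total = sum(h)
    if total == 0:
        return [1.0 / len(h)] * len(h)  # assumption: uniform fallback
    return [v / total for v in h]


balanced = ["a", "b", "a", "b"]   # two equally frequent values
constant = ["x", "x", "x", "x"]   # a single value, so zero entropy
print(attribute_weights([balanced, constant]))
```

In this toy case the constant attribute receives zero weight, which matches the intuition that it cannot help separate clusters.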
Similarity Metric for Numerical Attributes (I)
It is a universal law that the distance and perceived similarity between numerical vectors are related via an exponential function as follows:
\[
s(x_A, x_B) = \exp(-\mathrm{Dis}(x_A, x_B)), \tag{11}
\]
where $\mathrm{Dis}$ stands for a distance measure.
Moreover, to avoid the influence of different magnitudes of distances, we can further use proportional distance instead of absolute distance.
Similarity Metric for Numerical Attributes (II)
Definition 2
The object-cluster similarity between numerical vector $x_i^u$ and cluster $C_j$ is given by
\[
s(x_i^u, C_j) = \exp\left( -\frac{\mathrm{Dis}(x_i^u, c_j)}{\sum_{t=1}^{k} \mathrm{Dis}(x_i^u, c_t)} \right), \tag{12}
\]
where $c_j$ is the center of all numerical vectors in cluster $C_j$.
In practice, different distance metrics can be utilized to calculate $\mathrm{Dis}(x_i^u, c_j)$.
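Eq. (12) can be sketched in a few lines of Python. The Euclidean distance used here is one possible choice of Dis, and returning 1.0 when all distances are zero is an assumption to avoid division by zero; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """One possible choice of Dis; the slides allow other distance metrics."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def numerical_similarity(x_u, centers, j, dist=euclidean):
    """Eq. (12): exp of minus the distance to center j over the sum of distances."""
    distances = [dist(x_u, c) for c in centers]
    total = sum(distances)
    if total == 0.0:
        return 1.0  # assumption: x_u coincides with every center
    return math.exp(-distances[j] / total)


centers = [(0.0, 0.0), (3.0, 4.0)]
x = (0.0, 0.0)
print(numerical_similarity(x, centers, 0))  # distance 0, so similarity 1.0
print(numerical_similarity(x, centers, 1))  # proportional distance 1, so exp(-1)
```

Because the distance is divided by the sum over all centers, the value always lies between exp(-1) and 1, whatever the scale of the attributes.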
Calculation of Object-cluster Similarity
According to the previous descriptions, the object-cluster similarity metric for mixed data is given by
\[
s(x_i, C_j) = \frac{d_c}{d} \sum_{r=1}^{d_c} \left( \frac{H_{A_r}}{\sum_{t=1}^{d_c} H_{A_t}} \cdot \frac{\sigma_{A_r = x_{ir}^c}(C_j)}{\sigma_{A_r \neq \mathrm{NULL}}(C_j)} \right) + \frac{d_u}{d} \exp\left( -\frac{\mathrm{Dis}(x_i^u, c_j)}{\sum_{t=1}^{k} \mathrm{Dis}(x_i^u, c_t)} \right), \tag{13}
\]
where $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, k$.
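The way Eq. (13) blends the two parts can be sketched as follows. The function takes already-computed categorical and numerical similarities; its name and the similarity values 0.6 and 0.8 are made up for illustration.

```python
def mixed_similarity(s_c, s_u, d_c, d_u):
    """Eq. (13): weight each part by its share of the d = d_c + d_u attributes."""
    d = d_c + d_u
    if d == 0:
        raise ValueError("at least one attribute is required")
    return (d_c * s_c + d_u * s_u) / d


# Attribute split of the Heart Disease data (7 categorical + 6 numerical).
print(mixed_similarity(0.6, 0.8, d_c=7, d_u=6))   # (4.2 + 4.8) / 13
# Purely categorical data (d_u = 0): the result reduces to the categorical part.
print(mixed_similarity(0.6, 0.0, d_c=16, d_u=0))
# Purely numerical data (d_c = 0): the result reduces to the numerical part.
print(mixed_similarity(0.0, 0.8, d_c=0, d_u=4))
```

The last two calls show why the same metric applies to all three data types without special cases.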
Clustering Criterion
We concentrate on hard partition only, i.e., $q_{ij} \in \{0, 1\}$.
Given a set of $N$ objects, the optimal $Q^* = \{q_{ij}^*\}$ in Eq. (1) can be given by
\[
q_{ij}^* =
\begin{cases}
1, & \text{if } s(x_i, C_j) \ge s(x_i, C_r),\ 1 \le r \le k, \\
0, & \text{otherwise.}
\end{cases} \tag{14}
\]
Similar to the learning procedure of k-means, an iterative algorithm can be conducted to implement the clustering analysis.
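The hard assignment of Eq. (14) amounts to a row-wise arg max over the similarities. The sketch below assumes that ties go to the lowest cluster index, which the slides do not specify; `hard_partition` is a hypothetical name.

```python
def hard_partition(similarity):
    """Eq. (14): q_ij = 1 for the most similar cluster of each object, else 0.

    `similarity[i][j]` holds s(x_i, C_j).
    """
    q = []
    for row in similarity:
        # max() returns the first maximal index, so ties go to the lowest j.
        best = max(range(len(row)), key=row.__getitem__)
        q.append([1 if j == best else 0 for j in range(len(row))])
    return q


similarities = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]]
print(hard_partition(similarities))
```

Each row of the result contains exactly one 1, so the first constraint of Eq. (2) holds by construction.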
OCIL Algorithm
Iterative clustering learning based on the object-cluster similarity metric:
Require: data set X = {x_1, x_2, ..., x_N}, number of clusters k
Ensure: cluster labels Y = {y_1, y_2, ..., y_N}
Calculate the importance of each categorical attribute, if applicable
Set Y = {0, 0, ..., 0} and randomly select k initial objects, one for each cluster
repeat
    Initialize noChange = true
    for i = 1 to N do
        y_i^(new) = arg max_{j in {1, ..., k}} [s(x_i, C_j)]
        if y_i^(new) != y_i^(old) then
            noChange = false
            Update the information of clusters C_{y_i^(new)} and C_{y_i^(old)}, including the frequency of each categorical value and the centroid of the numerical vectors
        end if
    end for
until noChange is true
return Y
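The procedure above can be sketched in plain Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation. It assumes Euclidean distance, `None` for NULL, 0-based labels with -1 for unassigned objects, a `max_iter` safeguard, and that an emptied cluster keeps its last centroid.

```python
import math
import random
from collections import Counter


def entropy_weights(X_cat, d_c):
    """Attribute weights from Eqs. (9) and (10), with m_r taken as observed values."""
    h = []
    for r in range(d_c):
        col = [row[r] for row in X_cat if row[r] is not None]
        counts = Counter(col)
        if not col:
            h.append(0.0)
            continue
        h.append(-sum(c / len(col) * math.log(c / len(col))
                      for c in counts.values()) / len(counts))
    total = sum(h)
    return [v / total for v in h] if total > 0 else [1.0 / d_c] * d_c


def ocil(X_cat, X_num, k, seed=0, max_iter=100):
    """Cluster rows given as parallel lists of categorical and numerical tuples."""
    n, d_c, d_u = len(X_cat), len(X_cat[0]), len(X_num[0])
    d = d_c + d_u
    w = entropy_weights(X_cat, d_c) if d_c else []
    labels = [-1] * n
    freq = [[Counter() for _ in range(d_c)] for _ in range(k)]  # value counts
    sums = [[0.0] * d_u for _ in range(k)]                      # numerical sums
    size = [0] * k
    centers = [None] * k

    def move(i, old, new):
        """Remove object i from cluster `old` (if any) and add it to `new`."""
        for j, sign in ((old, -1), (new, 1)):
            if j < 0:
                continue
            size[j] += sign
            for r, v in enumerate(X_cat[i]):
                if v is not None:
                    freq[j][r][v] += sign
            for r, v in enumerate(X_num[i]):
                sums[j][r] += sign * v
            if size[j] > 0:  # an emptied cluster keeps its last centroid
                centers[j] = [s / size[j] for s in sums[j]]
        labels[i] = new

    def similarity(i, j):
        """Object-cluster similarity of Eq. (13)."""
        s_c = 0.0
        for r, v in enumerate(X_cat[i]):
            non_null = sum(freq[j][r].values())
            if non_null:
                s_c += w[r] * freq[j][r][v] / non_null
        s_u = 0.0
        if d_u:
            dist = [math.dist(X_num[i], c) for c in centers]
            total = sum(dist)
            s_u = math.exp(-dist[j] / total) if total > 0 else 1.0
        return (d_c * s_c + d_u * s_u) / d

    # Randomly select k initial objects, one for each cluster.
    for j, i in enumerate(random.Random(seed).sample(range(n), k)):
        move(i, -1, j)

    for _ in range(max_iter):
        no_change = True
        for i in range(n):
            new = max(range(k), key=lambda j: similarity(i, j))
            if new != labels[i]:
                no_change = False
                move(i, labels[i], new)  # clusters are updated immediately
        if no_change:
            break
    return labels


X_cat = [("a",), ("a",), ("a",), ("b",), ("b",), ("b",)]
X_num = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
print(ocil(X_cat, X_num, k=2))
```

Keeping per-cluster value counts and running sums means that each reassignment only touches the two affected clusters, as in the update step of the pseudocode.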
Evaluation Criteria
Clustering Accuracy (ACC):
\[
\mathrm{ACC} = \frac{\sum_{i=1}^{N} \delta(c_i, \mathrm{map}(r_i))}{N},
\]
where $\mathrm{map}(r_i)$ maps the obtained cluster label $r_i$ to the equivalent label from the data corpus by using the Kuhn-Munkres algorithm, and $\delta(a, b)$ equals 1 if $a = b$ and 0 otherwise.
Clustering Error Rate:
\[
e = 1 - \mathrm{ACC}.
\]
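The accuracy measure can be illustrated with the Python sketch below. For brevity it replaces the Kuhn-Munkres algorithm with an exhaustive search over label mappings, which gives the same optimal mapping but is practical only for a handful of clusters. It also assumes there are no more clusters than classes; the function name and toy labels are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """ACC under the best one-to-one mapping from cluster labels to class labels."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = 0
    # Exhaustive search stands in for Kuhn-Munkres; assumes len(clusters) <= len(classes).
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for c, r in zip(true_labels, cluster_labels) if mapping[r] == c)
        best = max(best, hits)
    return best / len(true_labels)


truth = [0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 1]   # cluster ids are arbitrary, so 1 maps to 0 and 0 maps to 1
acc = clustering_accuracy(truth, found)
print(acc, 1 - acc)       # accuracy and clustering error rate
```

For larger numbers of clusters, an assignment solver such as SciPy's `linear_sum_assignment` would be the usual replacement for the exhaustive loop.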
Performance on Mixed Data Sets
Mixed Data Sets
Table 1: Statistics of mixed data sets
Data set          Instances   Attributes (d_c + d_u)   Classes
Statlog Heart     270         7 + 6                    2
Heart Disease     303         7 + 6                    2
Credit Approval   653         9 + 6                    2
German Credit     1000        13 + 7                   2
Dermatology       366         33 + 1                   6
Adult             30162       8 + 6                    2
Clustering Errors on Mixed Data Sets
Table 2: Clustering errors of OCIL on mixed data sets in comparison with k-prototype and k-means
Data set      k-means           k-prototype       OCIL
Statlog       0.4047 ± 0.0071   0.2306 ± 0.0821   0.1716 ± 0.0065
Heart         0.4224 ± 0.0131   0.2280 ± 0.0903   0.1644 ± 0.0030
Credit        0.4487 ± 0.0016   0.2619 ± 0.0976   0.2519 ± 0.0966
German        0.3290 ± 0.0014   0.3289 ± 0.0006   0.3057 ± 0.0007
Dermatology   0.7006 ± 0.0216   0.6903 ± 0.0255   0.3051 ± 0.0896
Adult         0.3869 ± 0.0067   0.3855 ± 0.0143   0.3079 ± 0.0305
Comparison of Convergence Rate
Table 3: Comparison of average convergence time and number of iterations between k-prototype and OCIL
Data set      Time (k-prototype)   Time (OCIL)   Iterations (k-prototype)   Iterations (OCIL)
Statlog       0.0519 s             0.0516 s      3.09                       3.07
Heart         0.0639 s             0.0576 s      3.54                       3.02
Credit        0.1323 s             0.1625 s      3.18                       4.26
German        0.2999 s             0.2023 s      5.29                       3.15
Dermatology   0.3674 s             0.1888 s      7.27                       4.32
Adult         15.2795 s            9.6774 s      10.93                      6.78
Performance on Categorical Data Sets
Categorical Data Sets
Table 4: Statistics of categorical data sets
Data set   Instances   Attributes   Classes
Soybean    47          35           4
Breast     699         9            2
Vote       435         16           2
Zoo        101         16           7
Clustering Errors on Categorical Data Sets
Table 5: Comparison of clustering errors obtained by three different methods on categorical data sets
Data set   H’s k-modes       N’s k-modes       OCIL
Soybean    0.1691 ± 0.1521   0.0964 ± 0.1404   0.1017 ± 0.1380
Breast     0.1655 ± 0.1528   0.1356 ± 0.0016   0.0934 ± 0.0009
Vote       0.1387 ± 0.0066   0.1345 ± 0.0031   0.1213 ± 0.0010
Zoo        0.2873 ± 0.1083   0.2730 ± 0.0818   0.2681 ± 0.0906
H’s k-modes: original k-modes algorithm (Huang, SIGMOD’97);
N’s k-modes: k-modes algorithm with Ng’s dissimilarity metric (Ng et al., TPAMI’07).
Conclusion
A general clustering framework based on object-cluster similarity has been proposed.
A unified similarity metric for both categorical and numerical attributes has been presented.
An iterative algorithm which is applicable to clustering analysis on various data types has been introduced.
The advantages of the proposed method have been experimentally demonstrated in comparison with the existing counterparts.
Acknowledgment
Collaborative Graduate Program in Design, Kyoto University;
Department of Computer Science, Hong Kong Baptist University.
References
1. Michalski, R.S., Bratko, I., Kubat, M.: Machine Learning and Data Mining: Methods and Applications. Wiley, New York (1998)
2. Hsu, C.C.: Generalizing self-organizing map for categorical data. IEEE Transactions on Neural Networks 17(2) (March 2006) 294–304
3. Li, C., Biswas, G.: Unsupervised learning with mixed numeric and nominal data. IEEE Transactions on Knowledge and Data Engineering 14(4) (July/August 2002) 673–690
4. Zaki, M.J., Peters, M.: CLICKS: Mining subspace clusters in categorical data via k-partite maximal cliques. In: Proceedings of the 21st International Conference on Data Engineering. (2005) 355–356
5. Barbara, D., Couto, J., Li, Y.: COOLCAT: An entropy-based algorithm for categorical clustering. In: Proceedings of the 11th ACM Conference on Information and Knowledge Management. (2002) 582–589
6. Huang, Z.: Clustering large data sets with mixed numeric and categorical values. In: Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining. (1997) 21–24
7. Huang, Z.: A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. (1997) 1–8
8. Ng, M.K., Li, M.J., Huang, J.Z., He, Z.: On the impact of dissimilarity measure in k-modes clustering algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3) (2007) 503–507
Thank You!