A Unified Metric for Categorical and Numerical Attributes in Data Clustering

Yiu-ming Cheung and Hong Jia

Department of Computer Science and Institute of Computational and Theoretical Studies

Hong Kong Baptist University, Hong Kong SAR, China

Yiu-ming Cheung and Hong Jia (HKBU), Unified Metric for Mixed Data Clustering, 2013

Outline

1. Introduction
   - Motivation
   - Previous Work
   - Objective
2. Object-cluster Similarity Metric
   - Clustering Task
   - Similarity Metric for Mixed Data
3. Iterative Clustering Algorithm
4. Experiments
   - Evaluation Criteria
   - Performance on Mixed Data Sets
   - Performance on Categorical Data Sets
5. Conclusion
6. Acknowledgment


Motivation

Clustering and Attribute

Clustering:
- A widely used technique in various scientific areas;
- The main task is to discover the natural group structure of objects represented by numerical or categorical attributes (Michalski et al., 1998).

Attribute:
- An attribute is a property or characteristic of an object;
- Each object is described by a collection of attributes;
- There exist two different types of attributes:
  - Numerical attributes: take numerical values, which can be ordered;
  - Categorical attributes: cannot be ordered by their values, but can be separated into groups.


An Example: Diagnostic Records of Patients

UCI Heart Disease data set: contains 7 categorical and 6 numerical attributes.

| Attribute | Descriptor | Property | Type |
|---|---|---|---|
| Age | | continuous | numerical |
| Sex | {F, M} | discrete | categorical |
| Chest pain type | {typical angina, atypical angina, ...} | discrete | categorical |
| Resting blood pressure | | continuous | numerical |
| Serum cholesterol | | continuous | numerical |
| Fasting blood sugar | {> 120 mg/dl, ≤ 120 mg/dl} | discrete | categorical |
| Resting electrocardiographic | {type I, type II, type III} | discrete | categorical |
| Maximum heart rate | | continuous | numerical |
| Exercise induced angina | {yes, no} | discrete | categorical |
| ST depression | | continuous | numerical |
| Slope of ST segment | {upsloping, flat, downsloping} | discrete | categorical |
| CA | | continuous | numerical |
| THAL | {normal, fixed defect, reversible defect} | discrete | categorical |


Problem

- Traditional clustering methods often concentrate on purely numerical data.
- There is an awkward gap between the similarity metrics for categorical and numerical data.
- Transforming categorical values into numerical ones ignores the similarity information embedded in the categorical values and cannot faithfully reveal the similarity structure of the data sets (Hsu, TNN'2006).

It is desirable to solve this problem by finding a unified similarity metric for categorical and numerical attributes.


Previous Work

Roughly, the existing approaches to dealing with categorical attributes in clustering analysis can be grouped into four categories:

- Methods based on the perspective of similarity
  - Similarity Based Agglomerative Clustering (SBAC) algorithm (Li and Biswas, TKDE'02)
- Methods based on graph partitioning
  - CLICKS algorithm (Zaki and Peters, ICDE'2005)
- Entropy-based methods
  - COOLCAT algorithm (Barbara et al., CIKM'2002)
- Approaches that attempt to give a distance metric for categorical values
  - K-prototype algorithm (Huang, PAKDD'97)


Objective

- Give a unified similarity metric that can be applied directly to data with categorical, numerical, and mixed attributes;
- Design an efficient clustering algorithm that is applicable to the three types of data: numerical, categorical, and mixed data.


Clustering Task

Clustering a set of N objects, $\{x_1, x_2, \ldots, x_N\}$, into k different clusters, denoted as $C_1, C_2, \ldots, C_k$, can be formulated as finding the optimal $Q^*$ via

$$Q^* = \arg\max_{Q} F(Q) = \arg\max_{Q} \left[ \sum_{j=1}^{k} \sum_{i=1}^{N} q_{ij}\, s(x_i, C_j) \right], \tag{1}$$

where $s(x_i, C_j)$ is the similarity between object $x_i$ and cluster $C_j$, and $Q = (q_{ij})$ is an $N \times k$ partition matrix satisfying

$$\sum_{j=1}^{k} q_{ij} = 1, \quad 0 < \sum_{i=1}^{N} q_{ij} < N, \quad \text{and} \quad q_{ij} \in [0, 1]. \tag{2}$$

Evidently, the desired clusters can be obtained once the object-cluster similarity metric is determined.
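To make Eq. (1) and Eq. (2) concrete, the following minimal sketch evaluates F(Q) for a given partition matrix and checks the constraints on Q. The function names and the toy similarity values are assumptions made for illustration only; they do not come from the slides.

```python
from typing import List

Matrix = List[List[float]]

def is_valid_partition(Q: Matrix) -> bool:
    """Check the constraints of Eq. (2) on an N x k partition matrix."""
    n, k = len(Q), len(Q[0])
    for row in Q:
        # every entry lies in [0, 1] and every row sums to 1
        if any(q < 0.0 or q > 1.0 for q in row):
            return False
        if abs(sum(row) - 1.0) > 1e-9:
            return False
    for j in range(k):
        # every column sum lies strictly between 0 and N
        col = sum(Q[i][j] for i in range(n))
        if not (0.0 < col < n):
            return False
    return True

def objective(Q: Matrix, S: Matrix) -> float:
    """F(Q) of Eq. (1): sum over i and j of q_ij * s(x_i, C_j)."""
    return sum(q * s for q_row, s_row in zip(Q, S) for q, s in zip(q_row, s_row))

# Hypothetical toy values: 3 objects, 2 clusters.
S = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.4]]   # S[i][j] stands for s(x_i, C_j)
Q = [[1, 0], [0, 1], [1, 0]]               # a hard assignment
F = objective(Q, S)
```

With a hard assignment, F(Q) is simply the sum of each object's similarity to the cluster it is assigned to, which is why picking the most similar cluster per object maximizes it.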


Similarity Metric for Mixed Data

Representation of Mixed Data

Suppose the mixed data $x_i$ with d different attributes consists of $d_c$ categorical attributes and $d_u$ numerical attributes ($d_c + d_u = d$).

$x_i$ can be denoted as $[x_i^{c\,T}, x_i^{u\,T}]^T$ with $x_i^c = (x_{i1}^c, x_{i2}^c, \ldots, x_{id_c}^c)^T$ and $x_i^u = (x_{i1}^u, x_{i2}^u, \ldots, x_{id_u}^u)^T$.

Here, we have:
- $x_{ir}^u$ ($r = 1, 2, \ldots, d_u$) belongs to $\mathbb{R}$;
- $x_{ir}^c$ ($r = 1, 2, \ldots, d_c$) belongs to $\mathrm{dom}(A_r)$, where $\mathrm{dom}(A_r)$ contains all possible values that can be taken by categorical attribute $A_r$.

Specifically, $\mathrm{dom}(A_r)$ with $m_r$ elements can be represented as $\mathrm{dom}(A_r) = \{a_{r1}, a_{r2}, \ldots, a_{rm_r}\}$.


Definition of $s(x_i, C_j)$ (I)

Observation: In clustering analysis, numerical attributes are usually treated as a whole vector, while categorical attributes are investigated individually.

Definition: Let the object-cluster similarity $s(x_i, C_j)$ be the average of the similarities calculated on each attribute. We then have

$$\begin{aligned}
s(x_i, C_j) &= \frac{1}{d} s(x_{i1}^c, C_j) + \frac{1}{d} s(x_{i2}^c, C_j) + \cdots + \frac{1}{d} s(x_{id_c}^c, C_j) + \frac{d_u}{d} s(x_i^u, C_j) \\
&= \frac{1}{d} \sum_{r=1}^{d_c} s(x_{ir}^c, C_j) + \frac{d_u}{d} s(x_i^u, C_j).
\end{aligned} \tag{3}$$

Here, the similarity between each numerical attribute and the cluster $C_j$ is replaced with the similarity between the cluster and the whole numerical vector $x_i^u$.


Definition of $s(x_i, C_j)$ (II)

If we denote the similarity between $x_i^c$ and $C_j$ as $s(x_i^c, C_j)$, we get

$$s(x_i^c, C_j) = \frac{1}{d_c} \sum_{r=1}^{d_c} s(x_{ir}^c, C_j) = \sum_{r=1}^{d_c} \frac{1}{d_c} s(x_{ir}^c, C_j). \tag{4}$$

Then, Eq. (3) can be rewritten as

$$s(x_i, C_j) = \frac{d_c}{d} s(x_i^c, C_j) + \frac{d_u}{d} s(x_i^u, C_j). \tag{5}$$

Subsequently, the object-cluster similarity metric can be obtained from the definitions of $s(x_i^c, C_j)$ and $s(x_i^u, C_j)$.
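As a quick numerical check that Eq. (3) and Eq. (5) agree once the categorical part is averaged as in Eq. (4), here is a small sketch. The function names and the per-attribute similarity values are hypothetical and chosen only for illustration.

```python
from typing import Sequence

def similarity_eq3(per_attr_cat: Sequence[float], s_num: float, d_u: int) -> float:
    """Eq. (3): average over all d attributes, the numerical vector counting d_u times."""
    d = len(per_attr_cat) + d_u
    return sum(per_attr_cat) / d + (d_u / d) * s_num

def similarity_eq5(s_cat: float, s_num: float, d_c: int, d_u: int) -> float:
    """Eq. (5): categorical and numerical parts weighted by d_c/d and d_u/d."""
    d = d_c + d_u
    return (d_c / d) * s_cat + (d_u / d) * s_num

# Hypothetical values: d_c = 3 categorical attributes, d_u = 2 numerical ones.
per_attr = [0.5, 1.0, 0.0]               # s(x_ir^c, C_j) for r = 1, 2, 3
s_num = 0.8                              # s(x_i^u, C_j)
s_cat = sum(per_attr) / len(per_attr)    # Eq. (4)
via_eq3 = similarity_eq3(per_attr, s_num, d_u=2)
via_eq5 = similarity_eq5(s_cat, s_num, d_c=3, d_u=2)
```

Both routes give the same value because Eq. (5) only regroups the terms of Eq. (3). For purely categorical data (d_u = 0) the result reduces to the categorical part, and for purely numerical data (d_c = 0) to the numerical part.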


Similarity Metric for Categorical Attributes (I)

Taking into account the unequal importance of different categorical attributes for clustering analysis, the computation of $s(x_i^c, C_j)$ should be further modified to

$$s(x_i^c, C_j) = \sum_{r=1}^{d_c} w_r\, s(x_{ir}^c, C_j), \tag{6}$$

where $w_r$ is the weight of categorical attribute $A_r$, satisfying $0 \le w_r \le 1$ and $\sum_{r=1}^{d_c} w_r = 1$.

That is, the object-cluster similarity for the categorical part is the weighted sum of the similarities between the cluster and each attribute value.


Similarity Metric for Categorical Attributes (II)

Definition 1
The similarity between a categorical attribute value $x_{ir}^c$ and cluster $C_j$ is defined as

$$s(x_{ir}^c, C_j) = \frac{\sigma_{A_r = x_{ir}^c}(C_j)}{\sigma_{A_r \neq \mathrm{NULL}}(C_j)}, \tag{7}$$

where $\sigma_{A_r = x_{ir}^c}(C_j)$ counts the number of objects in cluster $C_j$ that have the value $x_{ir}^c$ for attribute $A_r$, and NULL refers to an empty (missing) value.

Therefore, the object-cluster similarity for the categorical part is calculated by

$$s(x_i^c, C_j) = \sum_{r=1}^{d_c} w_r\, s(x_{ir}^c, C_j) = \sum_{r=1}^{d_c} w_r \frac{\sigma_{A_r = x_{ir}^c}(C_j)}{\sigma_{A_r \neq \mathrm{NULL}}(C_j)}. \tag{8}$$
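The counting in Eq. (7) and the weighted sum in Eq. (8) can be sketched as follows. This is only an illustration: the function names are hypothetical, `None` stands in for NULL, and returning 0 when the object's value is missing or the cluster has no non-missing value for an attribute is an assumption, since the slides do not cover those cases.

```python
from typing import Optional, Sequence

Value = Optional[str]   # None plays the role of NULL (a missing value)

def attr_similarity(value: Value, cluster: Sequence[Sequence[Value]], r: int) -> float:
    """Eq. (7): share of the non-missing values of attribute r in the cluster equal to `value`."""
    non_null = sum(1 for obj in cluster if obj[r] is not None)
    if value is None or non_null == 0:
        return 0.0   # assumption: not specified in the slides
    matches = sum(1 for obj in cluster if obj[r] == value)
    return matches / non_null

def categorical_similarity(x_c: Sequence[Value],
                           cluster: Sequence[Sequence[Value]],
                           weights: Sequence[float]) -> float:
    """Eq. (8): weighted sum of the per-attribute similarities."""
    return sum(w * attr_similarity(v, cluster, r)
               for r, (v, w) in enumerate(zip(x_c, weights)))

# Hypothetical cluster with two categorical attributes; one value is missing.
cluster = [("F", "yes"), ("M", "yes"), ("F", None), ("F", "no")]
x_c = ("F", "yes")
s_sex = attr_similarity("F", cluster, 0)      # 3 of 4 objects have "F"
s_angina = attr_similarity("yes", cluster, 1)  # 2 of the 3 non-missing values are "yes"
s_cat = categorical_similarity(x_c, cluster, weights=[0.5, 0.5])
```

Because the denominator counts only non-missing values, a cluster with many NULL entries for an attribute is not penalized for them.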


Calculation of Categorical Attribute Weights

From the viewpoint of information theory, the importance of any categorical attribute $A_r$ can be estimated by

$$H_{A_r} = -\frac{1}{m_r} \sum_{t=1}^{m_r} p(a_{rt}) \log p(a_{rt}) \quad \text{with} \quad p(a_{rt}) = \frac{\sigma_{A_r = a_{rt}}(X)}{\sigma_{A_r \neq \mathrm{NULL}}(X)}, \tag{9}$$

where $a_{rt} \in \mathrm{dom}(A_r)$, X is the whole data set, and $m_r$ is the number of values that can be chosen by $A_r$.

The weight of each attribute is then computed as

$$w_r = \frac{H_{A_r}}{\sum_{t=1}^{d_c} H_{A_t}}. \tag{10}$$
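A minimal sketch of Eq. (9) and Eq. (10) is given below. Several choices here are assumptions rather than part of the slides: the function names, the natural logarithm, taking $m_r$ as the number of distinct observed values when no domain size is supplied, and falling back to uniform weights when every importance is zero.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

def attribute_importance(column: Sequence[Optional[str]],
                         domain_size: Optional[int] = None) -> float:
    """Eq. (9): entropy of the observed values divided by m_r."""
    values = [v for v in column if v is not None]   # ignore NULL entries
    counts = Counter(values)
    n = len(values)
    # assumption: m_r defaults to the number of distinct observed values
    m_r = domain_size if domain_size is not None else len(counts)
    if n == 0 or m_r == 0:
        return 0.0
    # unobserved domain values have p = 0 and contribute nothing to the sum
    return -sum((c / n) * math.log(c / n) for c in counts.values()) / m_r

def attribute_weights(columns: Sequence[Sequence[Optional[str]]]) -> List[float]:
    """Eq. (10): normalize the importances so that the weights sum to 1."""
    H = [attribute_importance(col) for col in columns]
    total = sum(H)
    if total == 0.0:
        return [1.0 / len(H)] * len(H)   # assumption: uniform fallback
    return [h / total for h in H]

# Hypothetical columns, one per categorical attribute.
col_a = ["a", "a", "b", "b"]   # two equally frequent values
col_b = ["x", "x", "x", "x"]   # constant attribute, carries no information
col_c = ["p", "q", "r", "r"]
weights = attribute_weights([col_a, col_b, col_c])
```

A constant attribute such as `col_b` gets importance 0 and therefore weight 0, so it does not influence the categorical similarity.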


Similarity Metric for Numerical Attributes (I)

- It is a universal law that the distance and perceived similarity between numerical vectors are related via an exponential function as follows:

$$s(x_A, x_B) = \exp(-\mathrm{Dis}(x_A, x_B)), \tag{11}$$

  where Dis stands for a distance measure.
- Moreover, to avoid the influence of different magnitudes of distances, we can use the proportional distance instead of the absolute distance.


Similarity Metric for Numerical Attributes (II)

Definition 2
The object-cluster similarity between numerical vector $x_i^u$ and cluster $C_j$ is given by

$$s(x_i^u, C_j) = \exp\left( -\frac{\mathrm{Dis}(x_i^u, c_j)}{\sum_{t=1}^{k} \mathrm{Dis}(x_i^u, c_t)} \right), \tag{12}$$

where $c_j$ is the center of all numerical vectors in cluster $C_j$.

In practice, different distance metrics can be utilized to calculate $\mathrm{Dis}(x_i^u, c_j)$.
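The proportional-distance similarity of Eq. (12) can be sketched as below. The slides leave the distance measure open, so the Euclidean distance used here is an assumption, as are the function names and the convention of returning 1 when the object coincides with every center.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; one possible choice for Dis (assumption)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def numerical_similarity(x_u: Vector, centers: Sequence[Vector], j: int,
                         dist: Callable[[Vector, Vector], float] = euclidean) -> float:
    """Eq. (12): exp of minus the distance to center j over the sum of distances to all centers."""
    distances = [dist(x_u, c) for c in centers]
    total = sum(distances)
    if total == 0.0:
        return 1.0   # assumption: x_u coincides with every center
    return math.exp(-distances[j] / total)

# Hypothetical example with k = 2 centers.
centers = [(3.0, 4.0), (0.0, 0.0)]
x_u = (0.0, 0.0)
s_far = numerical_similarity(x_u, centers, 0)    # distance 5 out of a total of 5
s_near = numerical_similarity(x_u, centers, 1)   # distance 0
```

Since the ratio in the exponent always lies in [0, 1], the similarity lies in [exp(-1), 1] regardless of the scale of the numerical attributes, which keeps it comparable with the categorical part.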


Calculation of Object-cluster Similarity

According to the previous descriptions, the object-cluster similarity metric for mixed data is given by

$$s(x_i, C_j) = \frac{d_c}{d} \sum_{r=1}^{d_c} \left( \frac{H_{A_r}}{\sum_{t=1}^{d_c} H_{A_t}} \cdot \frac{\sigma_{A_r = x_{ir}^c}(C_j)}{\sigma_{A_r \neq \mathrm{NULL}}(C_j)} \right) + \frac{d_u}{d} \exp\left( -\frac{\mathrm{Dis}(x_i^u, c_j)}{\sum_{t=1}^{k} \mathrm{Dis}(x_i^u, c_t)} \right), \tag{13}$$

where $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, k$.
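Putting the two parts together, Eq. (13) can be sketched for a small mixed data set as follows. The representation of an object as a pair (categorical tuple, numerical tuple), the function names, the Euclidean distance, and the handling of missing values are all assumptions made for this illustration. The weights are passed in rather than computed.

```python
import math
from typing import List, Optional, Sequence, Tuple

# An object is a pair (x^c, x^u); None stands for NULL. Hypothetical representation.
Obj = Tuple[Tuple[Optional[str], ...], Tuple[float, ...]]

def centroid(members: Sequence[Obj]) -> List[float]:
    """Center c_j: mean of the numerical vectors of a non-empty cluster."""
    return [sum(col) / len(members) for col in zip(*(m[1] for m in members))]

def object_cluster_similarity(x: Obj, clusters: Sequence[Sequence[Obj]],
                              j: int, weights: Sequence[float]) -> float:
    """Eq. (13): (d_c/d) * weighted categorical part + (d_u/d) * numerical part."""
    x_c, x_u = x
    d_c, d_u = len(x_c), len(x_u)
    s_cat = 0.0
    for r, value in enumerate(x_c):
        column = [m[0][r] for m in clusters[j] if m[0][r] is not None]
        if value is not None and column:          # assumption: otherwise contributes 0
            s_cat += weights[r] * column.count(value) / len(column)
    s_num = 0.0
    if d_u:
        dists = [math.dist(x_u, centroid(c)) for c in clusters]   # Euclidean (assumed)
        total = sum(dists)
        s_num = math.exp(-dists[j] / total) if total else 1.0
    d = d_c + d_u
    return (d_c / d) * s_cat + (d_u / d) * s_num

# Hypothetical toy data: one categorical and one numerical attribute, k = 2.
clusters = [
    [(("F",), (1.0,)), (("F",), (3.0,))],    # C_1, center 2.0
    [(("M",), (9.0,)), (("F",), (11.0,))],   # C_2, center 10.0
]
x = (("F",), (4.0,))
s1 = object_cluster_similarity(x, clusters, 0, weights=[1.0])
s2 = object_cluster_similarity(x, clusters, 1, weights=[1.0])
```

In this toy case the object is both categorically and numerically closer to the first cluster, so `s1` exceeds `s2`.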


Clustering Criterion

- We concentrate on hard partition only, i.e., $q_{ij} \in \{0, 1\}$.
- Given a set of N objects, the optimal $Q^* = \{q_{ij}^*\}$ in Eq. (1) can be given by

$$q_{ij}^* = \begin{cases} 1, & \text{if } s(x_i, C_j) \ge s(x_i, C_r),\ 1 \le r \le k, \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$

- Similar to the learning procedure of k-means, an iterative algorithm can be used to carry out the clustering analysis.


OCIL Algorithm

Iterative clustering learning based on the object-cluster similarity metric:

Require: data set $X = \{x_1, x_2, \ldots, x_N\}$, number of clusters k
Ensure: cluster labels $Y = \{y_1, y_2, \ldots, y_N\}$

    Calculate the importance of each categorical attribute, if applicable
    Set Y = {0, 0, ..., 0} and randomly select k initial objects, one for each cluster
    repeat
        Initialize noChange = true
        for i = 1 to N do
            $y_i^{(new)} = \arg\max_{j \in \{1, \ldots, k\}} [s(x_i, C_j)]$
            if $y_i^{(new)} \neq y_i^{(old)}$ then
                noChange = false
                Update the information of clusters $C_{y_i^{(new)}}$ and $C_{y_i^{(old)}}$, including the frequency of each categorical value and the centroid of the numerical vectors
            end if
        end for
    until noChange is true
    return Y
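The loop above can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: cluster statistics are recomputed from scratch for every object instead of being updated incrementally, clusters are indexed from 0 with -1 meaning "not assigned yet", the Euclidean distance is assumed, and the `init`, `seed`, and `max_iter` parameters as well as the rule that the last member never leaves a cluster are hypothetical additions.

```python
import math
import random
from collections import Counter

def attribute_weights(data):
    """Eqs. (9)-(10): entropy-based weights of the categorical attributes."""
    d_c = len(data[0][0])
    H = []
    for r in range(d_c):
        counts = Counter(x[0][r] for x in data if x[0][r] is not None)
        n = sum(counts.values())
        # assumption: m_r is the number of distinct observed values
        H.append(-sum(c / n * math.log(c / n) for c in counts.values()) / len(counts) if n else 0.0)
    total = sum(H)
    return [h / total if total else 1.0 / d_c for h in H]   # uniform fallback (assumption)

def centroid(members):
    """Mean of the numerical vectors of the objects in one cluster."""
    return [sum(col) / len(members) for col in zip(*(m[1] for m in members))]

def similarity(x, clusters, centers, j, weights):
    """Eq. (13) for object x = (x_c, x_u) and cluster index j."""
    x_c, x_u = x
    d_c, d_u = len(x_c), len(x_u)
    s_cat = 0.0
    for r, v in enumerate(x_c):
        non_null = [m[0][r] for m in clusters[j] if m[0][r] is not None]
        if v is not None and non_null:
            s_cat += weights[r] * non_null.count(v) / len(non_null)
    s_num = 0.0
    if d_u:
        dists = [math.dist(x_u, c) for c in centers]   # Euclidean distance (assumed)
        total = sum(dists)
        s_num = math.exp(-dists[j] / total) if total else 1.0
    return (d_c * s_cat + d_u * s_num) / (d_c + d_u)

def ocil(data, k, init=None, seed=0, max_iter=100):
    """Return one cluster index (0 .. k-1) per object."""
    rng = random.Random(seed)
    weights = attribute_weights(data)
    labels = [-1] * len(data)                          # -1 marks "not assigned yet"
    seeds = init if init is not None else rng.sample(range(len(data)), k)
    for j, i in enumerate(seeds):
        labels[i] = j                                  # one seed object per cluster
    for _ in range(max_iter):                          # safety cap (assumption)
        no_change = True
        for i, x in enumerate(data):
            clusters = [[data[n] for n, y in enumerate(labels) if y == j] for j in range(k)]
            centers = [centroid(c) for c in clusters]
            new = max(range(k), key=lambda j: similarity(x, clusters, centers, j, weights))
            old = labels[i]
            # assumption: the last member never leaves, so no cluster becomes empty
            if new != old and (old == -1 or len(clusters[old]) > 1):
                labels[i] = new
                no_change = False
        if no_change:
            break
    return labels

# Hypothetical toy data: (categorical tuple, numerical tuple) per object.
data = [
    (("a", "x"), (1.0,)), (("a", "x"), (1.2,)), (("a", "y"), (0.9,)),
    (("b", "z"), (5.0,)), (("b", "z"), (5.1,)), (("b", "w"), (4.8,)),
]
labels = ocil(data, k=2, init=[0, 3])
```

Recomputing the clusters for every object keeps the sketch short but costs far more than the incremental update of value frequencies and centroids described in the algorithm, so it is only suitable for small examples.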


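Evaluation Criteria

Clustering Accuracy (ACC):

$$\mathrm{ACC} = \frac{\sum_{i=1}^{N} \delta(c_i, \mathrm{map}(r_i))}{N},$$

where $c_i$ is the label of $x_i$ provided by the data corpus, $\delta(a, b)$ equals 1 if $a = b$ and 0 otherwise, and $\mathrm{map}(r_i)$ maps the obtained cluster label $r_i$ to the equivalent label from the data corpus by using the Kuhn-Munkres algorithm.

Clustering Error Rate: $e = 1 - \mathrm{ACC}$.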
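The accuracy computation can be illustrated with the short sketch below. Note the substitution: instead of the Kuhn-Munkres algorithm, it finds the best one-to-one label mapping by brute force over permutations, which gives the same optimum but is only practical for a small number of clusters. The function name and the toy labels are hypothetical.

```python
from itertools import permutations
from typing import Hashable, Sequence

def clustering_accuracy(true_labels: Sequence[Hashable],
                        cluster_labels: Sequence[Hashable]) -> float:
    """ACC under the best one-to-one mapping from cluster labels to class labels.

    Brute force over permutations stands in for Kuhn-Munkres here.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best_hits = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))           # candidate map(r)
        hits = sum(1 for c, r in zip(true_labels, cluster_labels) if c == mapping[r])
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)

# Hypothetical labels: cluster 1 mostly matches class 0 and vice versa.
truth = [0, 0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 0, 0]
acc = clustering_accuracy(truth, found)
error = 1.0 - acc
```

The mapping step matters because cluster indices are arbitrary: here the clustering is correct for five of six objects even though almost no raw label matches.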


Performance on Mixed Data Sets

Mixed Data Sets

Table 1: Statistics of mixed data sets

| Data set | Instance | Attribute ($d_c + d_u$) | Class |
|---|---|---|---|
| Statlog Heart | 270 | 7 + 6 | 2 |
| Heart Disease | 303 | 7 + 6 | 2 |
| Credit Approval | 653 | 9 + 6 | 2 |
| German Credit | 1000 | 13 + 7 | 2 |
| Dermatology | 366 | 33 + 1 | 6 |
| Adult | 30162 | 8 + 6 | 2 |


Clustering Errors on Mixed Data Sets

Table 2: Clustering errors of OCIL on mixed data sets in comparison with k-prototype and k-means

| Data set | K-means | K-prototype | OCIL |
|---|---|---|---|
| Statlog | 0.4047 ± 0.0071 | 0.2306 ± 0.0821 | 0.1716 ± 0.0065 |
| Heart | 0.4224 ± 0.0131 | 0.2280 ± 0.0903 | 0.1644 ± 0.0030 |
| Credit | 0.4487 ± 0.0016 | 0.2619 ± 0.0976 | 0.2519 ± 0.0966 |
| German | 0.3290 ± 0.0014 | 0.3289 ± 0.0006 | 0.3057 ± 0.0007 |
| Dermatology | 0.7006 ± 0.0216 | 0.6903 ± 0.0255 | 0.3051 ± 0.0896 |
| Adult | 0.3869 ± 0.0067 | 0.3855 ± 0.0143 | 0.3079 ± 0.0305 |


Comparison of Convergence Rate

Table 3: Comparison of average convergence time and iterations between k-prototype and OCIL

| Data set | Time (s), K-prototype | Time (s), OCIL | Iterations, K-prototype | Iterations, OCIL |
|---|---|---|---|---|
| Statlog | 0.0519 | 0.0516 | 3.09 | 3.07 |
| Heart | 0.0639 | 0.0576 | 3.54 | 3.02 |
| Credit | 0.1323 | 0.1625 | 3.18 | 4.26 |
| German | 0.2999 | 0.2023 | 5.29 | 3.15 |
| Dermatology | 0.3674 | 0.1888 | 7.27 | 4.32 |
| Adult | 15.2795 | 9.6774 | 10.93 | 6.78 |


Performance on Categorical Data Sets

Categorical Data Sets

Table 4: Statistics of categorical data sets

| Data set | Instance | Attribute | Class |
|---|---|---|---|
| Soybean | 47 | 35 | 4 |
| Breast | 699 | 9 | 2 |
| Vote | 435 | 16 | 2 |
| Zoo | 101 | 16 | 7 |


Clustering Errors on Categorical Data Sets

Table 5: Comparison of clustering errors obtained by three different methods on categorical data sets

| Data set | H's k-modes | N's k-modes | OCIL |
|---|---|---|---|
| Soybean | 0.1691 ± 0.1521 | 0.0964 ± 0.1404 | 0.1017 ± 0.1380 |
| Breast | 0.1655 ± 0.1528 | 0.1356 ± 0.0016 | 0.0934 ± 0.0009 |
| Vote | 0.1387 ± 0.0066 | 0.1345 ± 0.0031 | 0.1213 ± 0.0010 |
| Zoo | 0.2873 ± 0.1083 | 0.2730 ± 0.0818 | 0.2681 ± 0.0906 |

H's k-modes: original k-modes algorithm (Huang, SIGMOD'97);
N's k-modes: k-modes algorithm with Ng's dissimilarity metric (Ng et al., TPAMI'07).


Conclusion

- A general clustering framework based on object-cluster similarity has been proposed.
- A unified similarity metric for both categorical and numerical attributes has been presented.
- An iterative algorithm that is applicable to clustering analysis on various data types has been introduced.
- The advantages of the proposed method have been experimentally demonstrated in comparison with the existing counterparts.


Acknowledgment

- Collaborative Graduate Program in Design, Kyoto University;
- Department of Computer Science, Hong Kong Baptist University.


References

1. Michalski, R.S., Bratko, I., Kubat, M.: Machine Learning and Data Mining: Methods and Applications. Wiley, New York (1998)
2. Hsu, C.C.: Generalizing self-organizing map for categorical data. IEEE Transactions on Neural Networks 17(2) (March 2006) 294–304
3. Li, C., Biswas, G.: Unsupervised learning with mixed numeric and nominal data. IEEE Transactions on Knowledge and Data Engineering 14(4) (July/August 2002) 673–690
4. Zaki, M.J., Peters, M.: CLICKS: Mining subspace clusters in categorical data via k-partite maximal cliques. In: Proceedings of the 21st International Conference on Data Engineering. (2005) 355–356
5. Barbara, D., Couto, J., Li, Y.: COOLCAT: An entropy-based algorithm for categorical clustering. In: Proceedings of the 11th ACM Conference on Information and Knowledge Management. (2002) 582–589
6. Huang, Z.: Clustering large data sets with mixed numeric and categorical values. In: Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining. (1997) 21–24
7. Huang, Z.: A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. (1997) 1–8
8. Ng, M.K., Li, M.J., Huang, J.Z., He, Z.: On the impact of dissimilarity measure in k-modes clustering algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3) (2007) 503–507


Thank You!
