Local versus Global Interactions in Clustering Algorithms


Computer Engineering Department

21/03/2010

Wesam M. Ashour


Outline

- Clustering?
  - K-means Clustering Algorithm
- New algorithms
  - Weighted K-means (WKM)
  - Inverse Weighted K-means (IWKM)
- Topology-Preserving mappings
  - Generative Topographic Mapping (GTM)
  - Inverse-Weighted K-means Topology-Preserving Map (IKToM)




Clustering?

- Cluster: a collection of data objects
  - Objects are similar to objects in the same cluster
  - Objects are dissimilar to objects in other clusters
- Cluster analysis
  - Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups
- Clustering is unsupervised learning: there are no predefined classes




Clustering?

Main families of clustering algorithms:
- Partitioning algorithms
- Hierarchical algorithms
- Density-based algorithms
- Grid-based algorithms
- Graph-based algorithms
- Model-based algorithms




Clustering?

Application areas:
- Pattern recognition
- Compression
- Web documents
- Biology
- Marketing


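Background: K-means

- The algorithm tries to locate K prototypes throughout a data set in such a way that the K prototypes best represent the data, in the sense of minimizing the total squared distance from each data point to its closest prototype.
- Disadvantages:
  - The number of clusters must be specified in advance
  - Sensitivity to the initialization of the prototypes
  - Dead prototypes (prototypes that are closest to no data point and are therefore never updated)
  - Convergence to a local optimum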
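As a point of reference, here is a minimal batch K-means sketch in plain Python. It is an illustration, not the presentation's code; the function names and the toy data (two groups of four points) are hypothetical. Starting one prototype far from all the data reproduces two of the disadvantages listed above: that prototype never wins a point, so it stays dead, and the only active prototype settles between the two groups.

```python
def sq_dist(x, m):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xc - mc) ** 2 for xc, mc in zip(x, m))


def kmeans(data, prototypes, iters=20):
    """Batch K-means (Lloyd-style): assign every point to its closest prototype,
    then move each prototype to the mean of the points assigned to it.
    A prototype that wins no points keeps its position (a dead prototype)."""
    protos = [tuple(m) for m in prototypes]
    for _ in range(iters):
        clusters = [[] for _ in protos]
        for x in data:
            r = min(range(len(protos)), key=lambda j: sq_dist(x, protos[j]))
            clusters[r].append(x)
        new = []
        for m, pts in zip(protos, clusters):
            if pts:
                new.append(tuple(sum(p[c] for p in pts) / len(pts) for c in range(len(m))))
            else:
                new.append(m)  # no points assigned: this prototype never moves
        if new == protos:  # prototypes unchanged: converged
            break
        protos = new
    return protos


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(kmeans(data, [(5, 5), (-50, -50)]))
# → [(5.5, 5.5), (-50, -50)]
```

With a better initialization, such as `[(0, 0), (11, 11)]`, the same code finds one prototype per group, which is exactly the sensitivity to initialization the slide describes.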


Weighted K-Means (WKM)

- The performance function for K-means may be written as

$$perf = \sum_{i=1}^{N} \min_{j=1,\dots,K} \lVert x_i - m_j \rVert^2 \qquad (1)$$

- Optimization

[Figure: data points x1, x2, x3 and prototypes m1, m2, m3, with the distances d_{i,j} between them and the partial derivatives of Perf with respect to each prototype]
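Because the min in (1) only involves each point's closest prototype, a prototype that is closest to no point does not appear in perf at all; its gradient is zero and it can never move. The short check below (plain Python; hypothetical names and toy data) evaluates (1) before and after moving such an unused prototype and gets the same value.

```python
def sq_dist(x, m):
    """Squared Euclidean distance between two points."""
    return sum((xc - mc) ** 2 for xc, mc in zip(x, m))


def perf_kmeans(data, protos):
    """Equation (1): for each point, the squared distance to its closest prototype."""
    return sum(min(sq_dist(x, m) for m in protos) for x in data)


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
before = perf_kmeans(data, [(5.5, 5.5), (-50, -50)])
after = perf_kmeans(data, [(5.5, 5.5), (-40, -40)])  # move the unused prototype
print(before, after)
# → 404.0 404.0
```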


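Weighted K-Means (cont.)

- Consider the following performance function:

$$perf = \sum_{i=1}^{N} \sum_{j=1}^{K} \lVert x_i - m_j \rVert^2 \qquad (2)$$

- Optimization

[Figure: data points x1, x2, x3 and prototypes m1, m2, m3; here perf is the sum of all nine distances d_{1,1}, ..., d_{3,3}]

$$\frac{\partial perf}{\partial m_k} = -2\sum_{i=1}^{N} (x_i - m_k) = 0 \;\Rightarrow\; m_k = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Every prototype therefore converges to the mean of the data, so (2) on its own does not produce a useful clustering.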
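The collapse onto the mean can be checked numerically. The sketch below (plain Python; the names, step size and toy data are hypothetical choices) runs gradient descent on (2) from three very different starting prototypes. With the step size 1/(4N) each update moves a prototype exactly halfway to the data mean, so all three end up at the same point.

```python
def grad_perf_all_pairs(data, protos):
    """Gradient of equation (2) w.r.t. each prototype: -2 * sum_i (x_i - m_k)."""
    return [tuple(-2 * sum(x[c] - m[c] for x in data) for c in range(len(m)))
            for m in protos]


def gradient_descent(data, protos, steps=60):
    """Plain gradient descent on equation (2)."""
    eta = 1.0 / (4 * len(data))  # with this step size each prototype halves its distance to the mean
    protos = [tuple(m) for m in protos]
    for _ in range(steps):
        grads = grad_perf_all_pairs(data, protos)
        protos = [tuple(mc - eta * gc for mc, gc in zip(m, g)) for m, g in zip(protos, grads)]
    return protos


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
final = gradient_descent(data, [(0, 0), (11, 11), (-50, -50)])
print([tuple(round(c, 6) for c in m) for m in final])
# → [(5.5, 5.5), (5.5, 5.5), (5.5, 5.5)]
```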

Weighted K-Means (cont.)

- We wish to form a performance function with the following properties:
  - Its minimum gives a good clustering
  - It creates a relationship between all data points and all prototypes

$$perf = \sum_{i=1}^{N} \left[ \sum_{j=1}^{K} \lVert x_i - m_j \rVert \right] \min_{l=1,\dots,K} \lVert x_i - m_l \rVert^2 \qquad (3)$$
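Unlike (1), every prototype enters every term of (3) through the bracketed sum, so even a prototype that is closest to no data point changes the value of perf and therefore receives a non-zero gradient. A minimal check, assuming Python 3.8+ for `math.dist` (names and toy data are hypothetical): moving an unused prototype closer to the data lowers (3), whereas it would leave (1) unchanged.

```python
import math


def perf_wkm(data, protos):
    """Equation (3): for each point, (sum of distances to all prototypes)
    multiplied by (squared distance to the closest prototype)."""
    total = 0.0
    for x in data:
        d = [math.dist(x, m) for m in protos]
        total += sum(d) * min(d) ** 2
    return total


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
far = perf_wkm(data, [(5.5, 5.5), (-50, -50)])
nearer = perf_wkm(data, [(5.5, 5.5), (-40, -40)])  # move the prototype that is closest to no point
print(far > nearer)
# → True
```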


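Weighted K-Means (cont.)

Batch Mode
- All data points come together:

$$\frac{\partial perf}{\partial m_k} = \sum_{i=1}^{N} \frac{\partial perf(x_i)}{\partial m_k} = 0$$

where $perf(x_i)$ is the term of (3) contributed by data point $x_i$.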

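Weighted K-Means (cont.)

Let $m_r$ be the closest prototype to $x_i$; then

$$perf(x_i) = \left[ \sum_{j=1}^{K} \lVert x_i - m_j \rVert \right] \lVert x_i - m_r \rVert^2 \qquad (4)$$

Optimization: generate two sets of updates

$$\frac{\partial perf(x_i)}{\partial m_r} = -(x_i - m_r)\lVert x_i - m_r \rVert - 2(x_i - m_r)\sum_{j=1}^{K} \lVert x_i - m_j \rVert = -a_{ir}(x_i - m_r)$$

$$\frac{\partial perf(x_i)}{\partial m_j} = -(x_i - m_j)\frac{\lVert x_i - m_r \rVert^2}{\lVert x_i - m_j \rVert} = -b_{ij}(x_i - m_j), \qquad j \neq r$$

$$a_{ir} = \lVert x_i - m_r \rVert + 2\sum_{j=1}^{K} \lVert x_i - m_j \rVert, \qquad b_{ij} = \frac{\lVert x_i - m_r \rVert^2}{\lVert x_i - m_j \rVert}$$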
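The two update directions for (4) can be sanity-checked against finite differences. The sketch below (plain Python 3.8+, hypothetical names and test point) compares -a_ir (x_i - m_r) for the closest prototype and -b_ij (x_i - m_j) for the others, with a_ir = ||x_i - m_r|| + 2 sum_j ||x_i - m_j|| and b_ij = ||x_i - m_r||^2 / ||x_i - m_j||, against central differences of perf(x_i). The test point is chosen so that the closest prototype is unique and does not change under the tiny perturbations.

```python
import math


def perf_point(x, protos):
    """Equation (4): [sum_j ||x - m_j||] * ||x - m_r||^2, with m_r the closest prototype."""
    d = [math.dist(x, m) for m in protos]
    return sum(d) * min(d) ** 2


def analytic_grads(x, protos):
    """-a_ir (x - m_r) for the closest prototype r, -b_ij (x - m_j) for every other j."""
    d = [math.dist(x, m) for m in protos]
    r = d.index(min(d))
    grads = []
    for j, m in enumerate(protos):
        w = d[r] + 2 * sum(d) if j == r else d[r] ** 2 / d[j]  # a_ir or b_ij
        grads.append([-w * (xc - mc) for xc, mc in zip(x, m)])
    return grads


def numeric_grads(x, protos, h=1e-6):
    """Central finite differences of perf_point w.r.t. every prototype coordinate."""
    grads = []
    for j, m in enumerate(protos):
        g = []
        for c in range(len(m)):
            plus = [list(p) for p in protos]
            minus = [list(p) for p in protos]
            plus[j][c] += h
            minus[j][c] -= h
            g.append((perf_point(x, plus) - perf_point(x, minus)) / (2 * h))
        grads.append(g)
    return grads


x = (1.0, 2.0)
protos = [(0.5, 1.0), (4.0, -1.0), (-3.0, 2.5)]
for ga, gn in zip(analytic_grads(x, protos), numeric_grads(x, protos)):
    print([round(v, 5) for v in ga], [round(v, 5) for v in gn])
```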




























































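Batch Mode

$$\frac{\partial perf}{\partial m_k} = \sum_{i=1}^{N} \frac{\partial perf(x_i)}{\partial m_k} = 0 \;\Rightarrow\; m_k = \frac{\sum_{i \in V_k} a_{ik}\, x_i + \sum_{i \in V_j} b_{ik}\, x_i}{\sum_{i \in V_k} a_{ik} + \sum_{i \in V_j} b_{ik}} \qquad (5)$$

where $V_k$ is the set of indices of the data points that are closest to $m_k$, and $V_j$ is the set of indices of the other points.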
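Equation (5) expresses each prototype as a weighted mean of all the data. Because the weights themselves depend on the prototypes, this sketch assumes the update is applied as a fixed-point iteration: compute the weights from the current prototypes, apply (5), repeat. The weights are a_ik = ||x_i - m_k|| + 2 sum_j ||x_i - m_j|| for i in V_k and b_ik = ||x_i - m_r||^2 / ||x_i - m_k|| otherwise (m_r being the closest prototype to x_i). Every weight is positive, so each updated prototype is a convex combination of data points. In particular, a prototype that starts far from all the data, which K-means would leave dead, is pulled into the data after a single step. Names and toy data are hypothetical; Python 3.8+ is assumed for `math.dist`.

```python
import math


def wkm_step(data, protos):
    """One batch update of equation (5). The weights a_ik and b_ik are computed
    from the current prototypes, so the update is applied as a fixed-point iteration."""
    K, dim = len(protos), len(protos[0])
    num = [[0.0] * dim for _ in range(K)]
    den = [0.0] * K
    for x in data:
        d = [math.dist(x, m) for m in protos]
        r = d.index(min(d))  # x belongs to V_r
        s = sum(d)
        for k in range(K):
            # a_ik for the closest prototype, b_ik otherwise (assumes x never sits exactly on m_k)
            w = d[r] + 2 * s if k == r else d[r] ** 2 / d[k]
            den[k] += w
            for c in range(dim):
                num[k][c] += w * x[c]
    return [tuple(v / den[k] for v in num[k]) for k in range(K)]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
protos = [(5.0, 5.0), (-50.0, -50.0)]  # second prototype starts far from all data
protos = wkm_step(data, protos)
print(protos)  # both prototypes now lie inside the data's bounding box
for _ in range(30):
    protos = wkm_step(data, protos)
print(protos)
```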


Weighted K-Means (cont.)

- Problem which needs to be solved!

$$perf = \sum_{i=1}^{N} \left[ \sum_{j=1}^{K} \lVert x_i - m_j \rVert \right] \min_{l=1,\dots,K} \lVert x_i - m_l \rVert^2 \qquad (7)$$


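Inverse-Weighted K-Means (IWKM)

$$perf = \sum_{i=1}^{N} \left[ \sum_{j=1}^{K} \frac{1}{\lVert x_i - m_j \rVert^{p}} \right] \min_{l=1,\dots,K} \lVert x_i - m_l \rVert^{n} \qquad (10)$$

Optimization
- Batch Mode
- Find the partial derivative of the performance with respect to $m_k$, set it to zero, and solve for $m_k$:

$$m_k = \frac{\sum_{i \in V_k} a_{ik}\, x_i + \sum_{i \in V_j} b_{ik}\, x_i}{\sum_{i \in V_k} a_{ik} + \sum_{i \in V_j} b_{ik}} \qquad (11)$$

$$a_{ik} = (n-p)\lVert x_i - m_k \rVert^{n-p-2} + n\lVert x_i - m_k \rVert^{n-2} \sum_{j \neq k} \frac{1}{\lVert x_i - m_j \rVert^{p}}, \qquad b_{ik} = -\frac{p\, \lVert x_i - m_{k^*} \rVert^{n}}{\lVert x_i - m_k \rVert^{p+2}}$$

where $m_{k^*}$ is the prototype closest to $x_i$, $V_k$ and $V_j$ are the index sets used in (5), and the weights are defined by $\partial perf(x_i)/\partial m_k = -a_{ik}(x_i - m_k)$ for $i \in V_k$ and $-b_{ik}(x_i - m_k)$ otherwise.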
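The IWKM weights can be checked the same way as the WKM ones. This sketch (plain Python 3.8+, hypothetical names, test point and exponents n = 3, p = 1) assumes the convention d perf(x_i)/d m_k = -w_k (x_i - m_k), with w_k = a_ik for the closest prototype and b_ik otherwise, using a_ik and b_ik as reconstructed on this slide. It compares them with finite differences of the per-point term of (10). It also confirms a useful special case: with n = 2 and p = -1, (10) becomes (3), and the weights reduce to the WKM weights a_ir = ||x_i - m_r|| + 2 sum_j ||x_i - m_j|| and b_ij = ||x_i - m_r||^2 / ||x_i - m_j||.

```python
import math


def iwkm_weights(x, protos, n, p):
    """Weights w_k for one point x, defined by d perf(x)/d m_k = -w_k (x - m_k):
    w_k = a_ik if m_k is closest to x, else b_ik. Assumes x is not on any prototype."""
    d = [math.dist(x, m) for m in protos]
    r = d.index(min(d))  # index of m_{k*}, the prototype closest to x
    w = []
    for k in range(len(protos)):
        if k == r:
            others = sum(d[j] ** -p for j in range(len(protos)) if j != k)
            w.append((n - p) * d[k] ** (n - p - 2) + n * d[k] ** (n - 2) * others)
        else:
            w.append(-p * d[r] ** n / d[k] ** (p + 2))
    return w


def perf_point(x, protos, n, p):
    """Contribution of x to equation (10)."""
    d = [math.dist(x, m) for m in protos]
    return sum(dj ** -p for dj in d) * min(d) ** n


def numeric_grads(x, protos, n, p, h=1e-6):
    """Central finite differences of perf_point w.r.t. every prototype coordinate."""
    grads = []
    for j, m in enumerate(protos):
        g = []
        for c in range(len(m)):
            plus = [list(q) for q in protos]
            minus = [list(q) for q in protos]
            plus[j][c] += h
            minus[j][c] -= h
            g.append((perf_point(x, plus, n, p) - perf_point(x, minus, n, p)) / (2 * h))
        grads.append(g)
    return grads


x = (1.0, 2.0)
protos = [(0.5, 1.0), (4.0, -1.0), (-3.0, 2.5)]
n, p = 3, 1  # hypothetical exponents, for illustration only
w = iwkm_weights(x, protos, n, p)
analytic = [[-wk * (xc - mc) for xc, mc in zip(x, m)] for wk, m in zip(w, protos)]
for ga, gn in zip(analytic, numeric_grads(x, protos, n, p)):
    print([round(v, 5) for v in ga], [round(v, 5) for v in gn])

# With n = 2 and p = -1 the IWKM weights coincide with the WKM weights (prototype 0 is closest here)
d = [math.dist(x, m) for m in protos]
print(iwkm_weights(x, protos, 2, -1), [d[0] + 2 * sum(d), d[0] ** 2 / d[1], d[0] ** 2 / d[2]])
```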


Simulation

[Figures: Example 1 (panels: IWKM, K-means); Example 2 (panels: K-means, IWKM)]


Simulation

[Figure: Example 3 (panels: K-means, IWKM)]


Simulation

[Figures: Example 4 (panel: IWKM); Example 5 (panels: KHMO, IWKM)]


Inverse-Weighted K-Means Topology-Preserving Map (IKToM)

- Has the same structure as GTM:
  - K latent points in a latent space with some structure
  - Mapped through M basis functions to a feature space
  - Then mapped to K points in the data space using weights W: $m_k = \Phi_k W$, where $\Phi_k$ holds the values of the M basis functions at latent point k
- Uses IWKM to find $m_k$
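To make the mapping m_k = Φ_k W concrete, here is a sketch of the GTM-style forward map from latent points to prototypes. It rests on several assumptions that are not taken from the slides: a 1-D latent grid on [-1, 1], Gaussian basis functions with a hand-picked width, and a hand-picked weight matrix W (all hypothetical). It only shows how the prototypes are generated from the latent structure. How IKToM uses IWKM to update W is not shown.

```python
import math


def latent_grid(K):
    """K >= 2 latent points evenly spaced on [-1, 1] (hypothetical 1-D latent space)."""
    return [-1 + 2 * k / (K - 1) for k in range(K)]


def basis_matrix(latent, centres, width):
    """K x M matrix with Phi[k][j] = exp(-(t_k - c_j)^2 / (2 width^2)) (assumed Gaussian basis)."""
    return [[math.exp(-(t - c) ** 2 / (2 * width ** 2)) for c in centres] for t in latent]


def map_to_data(phi, W):
    """m_k = Phi_k W: row k of Phi (length M) times the M x D weight matrix W."""
    D = len(W[0])
    return [[sum(row[j] * W[j][d] for j in range(len(W))) for d in range(D)] for row in phi]


K, M = 5, 3
latent = latent_grid(K)
centres = latent_grid(M)  # basis-function centres spread over the same latent interval
phi = basis_matrix(latent, centres, width=0.5)
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical M x D weights (D = 2)
protos = map_to_data(phi, W)
for t, m in zip(latent, protos):
    print(t, [round(v, 3) for v in m])
```

Because neighbouring latent points have similar basis-function values, neighbouring prototypes stay close in data space; this is the topology-preserving property the map relies on.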


Simulation

- Example 1: Genes data set (40 samples, 3036 dimensions, 3 types)
- Example 2: Algae data set (72 samples, 18 dimensions, 9 types)
- Example 3: Glass data set (218 samples, 10 dimensions, 6 types)


Conclusion

- WKM and IWKM
  - Solve the problem of sensitivity to initial conditions in K-means
  - Provide two sets of updates
  - Work well on high-dimensional data sets
  - Can be extended for visualization
- Extension of IWKM (IKToM)
  - Has the same structure as GTM
  - Visualization


Thank you

Any questions?
