# Local versus Global Interactions in Clustering Algorithms


Wesam M. Ashour

Computer Engineering Department

21/03/2010

## Outline

- Clustering?
  - K-means clustering algorithm
- New algorithms
  - Weighted K-means (WKM)
  - Inverse Weighted K-means (IWKM)
- Topology-preserving mappings
  - Generative Topographic Mapping (GTM)
  - Inverse-Weighted K-means Topology-Preserving Map (IKToM)


## Clustering?

- Cluster: a collection of data objects
  - Objects are similar to objects in the same cluster
  - Objects are dissimilar to objects in other clusters
- Cluster analysis
  - Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups
- Clustering is unsupervised learning: there are no predefined classes


## Clustering? (cont.)

- Partitioning algorithms
- Hierarchical algorithms
- Density-based algorithms
- Grid-based algorithms
- Graph-based algorithms
- Model-based algorithms


## Clustering? (cont.)

- Pattern recognition
- Compression
- Web documents
- Biology
- Marketing


## Background: K-means

The algorithm tries to locate K prototypes throughout a data set in such a way that the K prototypes in some way best represent the data.

Limitations:

- The number of clusters must be specified in advance
- Sensitivity to prototype initialization
- Prototypes converge to a local optimum
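The initialization problem can be seen in a minimal batch K-means sketch (Lloyd-style updates, plain Python). The toy data set and the starting prototypes are hypothetical: one prototype starts far from all the data, is never the closest prototype to any point, and therefore never moves, while the other ends up between the two groups.

```python
import math

def kmeans(data, prototypes, iters=20):
    """Batch K-means: assign every point to its closest prototype, then move
    each prototype to the mean of the points assigned to it."""
    protos = [tuple(m) for m in prototypes]
    for _ in range(iters):
        groups = [[] for _ in protos]
        for x in data:
            r = min(range(len(protos)), key=lambda k: math.dist(x, protos[k]))
            groups[r].append(x)
        # A prototype that is closest to no point keeps its position forever.
        protos = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else m
            for m, g in zip(protos, groups)
        ]
    return protos

# Hypothetical toy data: two well-separated groups of four points each.
data = [(-1, 0), (1, 0), (0, -1), (0, 1), (9, 10), (11, 10), (10, 9), (10, 11)]
print(kmeans(data, [(0.5, 0.5), (100, 100)]))
# [(5.0, 5.0), (100, 100)]
```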


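## Weighted K-Means (WKM)

The performance function for K-means may be written as

$$
\mathrm{perf}_{KM} = \sum_{i=1}^{N} \min_{j=1,\dots,K} \lVert x_i - m_j \rVert^2 \tag{1}
$$

Optimization: in the example with data points $x_1, x_2, x_3$ and prototypes $m_1, m_2, m_3$, $m_1$ is the closest prototype to $x_1$ and $x_2$, $m_2$ is the closest to $x_3$, and $m_3$ is the closest to no point. With $d_{k,i} = \lVert x_i - m_k \rVert$,

$$
\begin{aligned}
\mathrm{perf} &= d_{1,1}^2 + d_{1,2}^2 + d_{2,3}^2 \\
\frac{\partial \mathrm{perf}}{\partial m_1} &= \frac{\partial d_{1,1}^2}{\partial m_1} + \frac{\partial d_{1,2}^2}{\partial m_1}, \qquad
\frac{\partial \mathrm{perf}}{\partial m_2} = \frac{\partial d_{2,3}^2}{\partial m_2}, \qquad
\frac{\partial \mathrm{perf}}{\partial m_3} = 0
\end{aligned}
$$

The gradient with respect to $m_3$ vanishes, so a prototype that is not the closest to any data point is never updated; this is how the sensitivity of K-means to initialization shows up.

[Figure: data points $x_1, x_2, x_3$ and prototypes $m_1, m_2, m_3$]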
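A small sketch of equation (1) and its gradient. The layout is hypothetical: $m_1$ is closest to $x_1$ and $x_2$, $m_2$ is closest to $x_3$, and $m_3$ is closest to no point. The function names are illustrative only.

```python
import math

def closest(x, protos):
    """Index r of the prototype closest to x."""
    return min(range(len(protos)), key=lambda k: math.dist(x, protos[k]))

def perf_km(data, protos):
    """Equation (1): sum over points of the squared distance to the closest prototype."""
    return sum(math.dist(x, protos[closest(x, protos)]) ** 2 for x in data)

def grad_km(data, protos):
    """Gradient of (1) w.r.t. each prototype: -2 * sum of (x_i - m_k) over the
    points whose closest prototype is m_k. It is exactly zero for a prototype
    that is closest to no point."""
    grads = [[0.0] * len(m) for m in protos]
    for x in data:
        r = closest(x, protos)
        for d in range(len(x)):
            grads[r][d] += -2.0 * (x[d] - protos[r][d])
    return grads

data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]       # x1, x2, x3
protos = [(0.0, 0.5), (5.0, 4.0), (-10.0, 10.0)]  # m1, m2, m3
print(perf_km(data, protos))   # about 2.5
print(grad_km(data, protos))   # [[-2.0, 2.0], [0.0, -2.0], [0.0, 0.0]]
```

The last row is the zero gradient of the idle prototype $m_3$.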


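## Weighted K-Means (cont.)

Consider the following performance function:

$$
\mathrm{perf} = \sum_{i=1}^{N} \sum_{j=1}^{K} \lVert x_i - m_j \rVert^2 \tag{2}
$$

Optimization: for the same three points and three prototypes, every point now interacts with every prototype:

$$
\begin{aligned}
\mathrm{perf} &= d_{1,1}^2 + d_{1,2}^2 + d_{1,3}^2 + d_{2,1}^2 + d_{2,2}^2 + d_{2,3}^2 + d_{3,1}^2 + d_{3,2}^2 + d_{3,3}^2 \\
\frac{\partial \mathrm{perf}}{\partial m_k} &= 0 \;\Longrightarrow\; m_k = \frac{1}{N} \sum_{i=1}^{N} x_i
\end{aligned}
$$

Every prototype therefore moves to the mean of the data: (2) relates all data points to all prototypes, but its minimum does not give a clustering.

[Figure: data points $x_1, x_2, x_3$ and prototypes $m_1, m_2, m_3$]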
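The collapse onto the mean can be checked numerically. This sketch runs plain gradient descent on (2) for a hypothetical four-point data set; the learning rate and the number of steps are arbitrary choices.

```python
import math

def perf_all(data, protos):
    """Equation (2): squared distances summed over every (point, prototype) pair."""
    return sum(math.dist(x, m) ** 2 for x in data for m in protos)

def gd_step(data, protos, lr):
    """One gradient-descent step on (2). The gradient w.r.t. m_k is
    -2 * sum_i (x_i - m_k); it involves no other prototype, so every m_k
    is pulled toward the same point, the data mean."""
    new = []
    for m in protos:
        grad = [-2.0 * sum(x[d] - m[d] for x in data) for d in range(len(m))]
        new.append(tuple(m[d] - lr * grad[d] for d in range(len(m))))
    return new

data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]  # hypothetical toy data
protos = [(0.0, 0.0), (6.0, 5.0), (-3.0, 8.0)]
for _ in range(200):
    protos = gd_step(data, protos, lr=0.05)
print(protos)  # all three prototypes end up (numerically) at the mean (3.0, 2.5)
```

Even prototypes that start on top of a group are dragged to the mean, which is why (2) alone cannot serve as a clustering criterion.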

## Weighted K-Means (cont.)

We wish to form a performance function with the following properties:

- Its minimum gives a good clustering
- It creates a relationship between all data points and all prototypes

$$
\mathrm{perf}_{WK} = \sum_{i=1}^{N} \Bigl[\sum_{j=1}^{K} \lVert x_i - m_j \rVert\Bigr] \min_{l=1,\dots,K} \lVert x_i - m_l \rVert^2 \tag{3}
$$
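To see what the weighting in (3) adds, this sketch compares (1) and (3) when a prototype that is closest to no point is moved further away. The data and prototype positions are hypothetical.

```python
import math

def perf_km(data, protos):
    """Equation (1): squared distance from each point to its closest prototype."""
    return sum(min(math.dist(x, m) for m in protos) ** 2 for x in data)

def perf_wk(data, protos):
    """Equation (3): the same squared distance, weighted by the sum of the
    point's distances to *all* prototypes."""
    total = 0.0
    for x in data:
        dists = [math.dist(x, m) for m in protos]
        total += sum(dists) * min(dists) ** 2
    return total

data = [(0.0, 0.0), (1.0, 0.0)]
near = [(0.5, 0.0), (10.0, 0.0)]  # idle second prototype at distance ~10
far = [(0.5, 0.0), (20.0, 0.0)]   # the same prototype moved further away
print(perf_km(data, near), perf_km(data, far))  # 0.5 0.5
print(perf_wk(data, near), perf_wk(data, far))  # 5.0 10.0
```

Criterion (1) cannot see the idle prototype, whereas (3) grows as it moves away, so minimizing (3) pulls it back toward the data.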


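## Weighted K-Means (cont.)

Batch mode: all data points are used together, and the update is found by setting the gradient of the total performance to zero:

$$
\frac{\partial \mathrm{perf}}{\partial m_k} = \sum_{i=1}^{N} \frac{\partial \mathrm{perf}(x_i)}{\partial m_k} = 0
$$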

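## Weighted K-Means (cont.)

Let $m_r$ be the closest prototype to $x_i$; then

$$
\mathrm{perf} = \sum_{i=1}^{N} \mathrm{perf}(x_i), \qquad
\mathrm{perf}(x_i) = \Bigl[\sum_{j=1}^{K} \lVert x_i - m_j \rVert\Bigr] \lVert x_i - m_r \rVert^2 \tag{4}
$$

Optimization: generate two sets of updates, one for the closest prototype $m_r$ and one for every other prototype $m_j$:

$$
\begin{aligned}
\frac{\partial \mathrm{perf}(x_i)}{\partial m_r} &= -(x_i - m_r)\Bigl[\lVert x_i - m_r \rVert + 2\sum_{j=1}^{K} \lVert x_i - m_j \rVert\Bigr] = -(x_i - m_r)\, a_{ir} \\
\frac{\partial \mathrm{perf}(x_i)}{\partial m_j} &= -(x_i - m_j)\, \frac{\lVert x_i - m_r \rVert^2}{\lVert x_i - m_j \rVert} = -(x_i - m_j)\, b_{ij}, \qquad j \neq r
\end{aligned}
$$

Batch mode: summing over all data points, setting the derivative to zero, and treating $a_{ik}$ and $b_{ik}$ as constants (which gives a fixed-point update),

$$
\frac{\partial \mathrm{perf}}{\partial m_k} = -\sum_{i \in V_k} (x_i - m_k)\, a_{ik} - \sum_{i \in V_j} (x_i - m_k)\, b_{ik} = 0
\;\Longrightarrow\;
m_k = \frac{\sum_{i \in V_k} a_{ik}\, x_i + \sum_{i \in V_j} b_{ik}\, x_i}{\sum_{i \in V_k} a_{ik} + \sum_{i \in V_j} b_{ik}} \tag{5}
$$

where $V_k$ is the set of indices of the data points that are closest to $m_k$, and $V_j$ is the set of indices of the other points (those whose closest prototype is some $m_j$ with $j \neq k$).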
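A minimal plain-Python sketch of the batch update (5), applied as a fixed-point iteration in which $a_{ik}$ and $b_{ik}$ are recomputed from the current prototypes at every step. The distance floor `EPS`, the iteration count, and the toy data (the same hypothetical two-group set with one far-away starting prototype) are assumptions, not taken from the slides.

```python
import math

EPS = 1e-12  # assumed floor on distances, to avoid division by zero

def wkm_step(data, protos):
    """One batch WKM update, equation (5), with a_ik and b_ik evaluated at the
    current prototypes (a fixed-point step)."""
    K, dim = len(protos), len(protos[0])
    num = [[0.0] * dim for _ in range(K)]
    den = [0.0] * K
    for x in data:
        d = [max(math.dist(x, m), EPS) for m in protos]
        r = min(range(K), key=lambda k: d[k])  # index of the closest prototype m_r
        total = sum(d)
        for k in range(K):
            if k == r:
                w = d[r] + 2.0 * total         # a_ir: point i is in V_r
            else:
                w = d[r] ** 2 / d[k]           # b_ik: point i is in V_j for m_k
            den[k] += w
            for j in range(dim):
                num[k][j] += w * x[j]
    return [tuple(v / den[k] for v in num[k]) for k in range(K)]

def wkm(data, protos, iters=50):
    for _ in range(iters):
        protos = wkm_step(data, protos)
    return protos

data = [(-1, 0), (1, 0), (0, -1), (0, 1), (9, 10), (11, 10), (10, 9), (10, 11)]
print(wkm(data, [(0.5, 0.5), (100, 100)]))
# one prototype settles near (0, 0) and the other near (10, 10)
```

Because every point contributes a $b_{ik}$ term to every prototype it does not win, the far-away prototype is pulled into the data on the first step instead of staying idle as in K-means.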


## Weighted K-Means (cont.)

The problem that needs to be solved:

$$
\mathrm{perf}_{WK} = \sum_{i=1}^{N} \Bigl[\sum_{j=1}^{K} \lVert x_i - m_j \rVert\Bigr] \min_{l=1,\dots,K} \lVert x_i - m_l \rVert^2 \tag{7}
$$


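## Inverse-Weighted K-Means (IWKM)

$$
\mathrm{perf}_{IWK} = \sum_{i=1}^{N} \Bigl[\sum_{j=1}^{K} \frac{1}{\lVert x_i - m_j \rVert^{p}}\Bigr] \min_{l=1,\dots,K} \lVert x_i - m_l \rVert^{n} \tag{10}
$$

Optimization (batch mode): find the partial derivative of the performance with respect to $m_k$, set it to zero, and then solve for $m_k$:

$$
m_k = \frac{\sum_{i \in V_k} a_{ik}\, x_i + \sum_{i \in V_j} b_{ik}\, x_i}{\sum_{i \in V_k} a_{ik} + \sum_{i \in V_j} b_{ik}} \tag{11}
$$

where

$$
a_{ik} = -p\, \lVert x_i - m_k \rVert^{n-p-2} + n\, \lVert x_i - m_k \rVert^{n-2} \sum_{j=1}^{K} \frac{1}{\lVert x_i - m_j \rVert^{p}},
\qquad
b_{ik} = \frac{p\, \lVert x_i - m_r \rVert^{n}}{\lVert x_i - m_k \rVert^{p+2}}
$$

and, as for WKM, $m_r$ is the prototype closest to $x_i$, $V_k$ indexes the points closest to $m_k$, and $V_j$ indexes the other points.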
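A sketch of the IWKM batch update (11) in the same style as the WKM sketch. The exponents `n = 3` and `p = 2`, the distance floor `EPS`, and the toy data are illustrative assumptions, not values from the slides. With $n > p$, $a_{ik}$ is positive for the closest prototype, because $\lVert x_i - m_k \rVert^{p} \sum_j 1/\lVert x_i - m_j \rVert^{p} \ge 1$ (the $j = k$ term alone equals 1), so the bracketed factor is at least $n - p$.

```python
import math

EPS = 1e-12  # assumed distance floor so the negative powers stay finite

def iwkm_step(data, protos, n=3, p=2):
    """One batch IWKM update, equation (11). n and p are the exponents of (10);
    n=3, p=2 are illustrative choices."""
    K, dim = len(protos), len(protos[0])
    num = [[0.0] * dim for _ in range(K)]
    den = [0.0] * K
    for x in data:
        d = [max(math.dist(x, m), EPS) for m in protos]
        r = min(range(K), key=lambda k: d[k])   # index of the closest prototype m_r
        inv_sum = sum(dk ** -p for dk in d)      # sum_j 1 / ||x_i - m_j||^p
        for k in range(K):
            if k == r:
                # a_ir: positive whenever n > p
                w = -p * d[r] ** (n - p - 2) + n * d[r] ** (n - 2) * inv_sum
            else:
                # b_ik > 0: pulls m_k toward points it does not win
                w = p * d[r] ** n / d[k] ** (p + 2)
            den[k] += w
            for j in range(dim):
                num[k][j] += w * x[j]
    return [tuple(v / den[k] for v in num[k]) for k in range(K)]

def iwkm(data, protos, iters=100, n=3, p=2):
    for _ in range(iters):
        protos = iwkm_step(data, protos, n, p)
    return protos

data = [(-1, 0), (1, 0), (0, -1), (0, 1), (9, 10), (11, 10), (10, 9), (10, 11)]
print(iwkm(data, [(0.5, 0.5), (100, 100)]))
# one prototype settles near (0, 0) and the other near (10, 10)
```

Started from the same poor initialization that leaves a dead prototype in K-means, the far prototype is drawn in by the $b_{ik}$ terms, which weight most heavily the points that are far from their current winner.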


## Simulation

[Figure: Example 1, IWKM and K-means results; Example 2, K-means and IWKM results]


## Simulation (cont.)

[Figure: Example 3, K-means and IWKM results]


## Simulation (cont.)

[Figure: Example 4, IWKM results; Example 5, KHMO and IWKM results]


## Inverse-Weighted K-Means Topology-Preserving Map (IKToM)

- Has the same structure as GTM:
  - K latent points in a latent space with some structure
  - Mapped through M basis functions to a feature space
  - Then mapped to the data space as K points using the weights W: $m_k = \Phi_k W$
- Uses IWKM to find the $m_k$
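A sketch of the forward mapping $m_k = \Phi_k W$ only. The 1-D latent grid, the Gaussian basis functions (a common choice in GTM), their width, and the weight matrix `W` are hypothetical; the slides do not say how `W` is refitted from the IWKM prototypes, so that step is left out.

```python
import math

def basis_matrix(latent, centres, width):
    """Phi[k][m]: Gaussian basis function m evaluated at latent point u_k
    (1-D latent space, an illustrative assumption)."""
    return [[math.exp(-((u - c) ** 2) / (2.0 * width ** 2)) for c in centres]
            for u in latent]

def map_to_data_space(phi, W):
    """m_k = Phi_k W: row k of Phi (length M) times the M x D weight matrix W."""
    D = len(W[0])
    return [[sum(row[m] * W[m][d] for m in range(len(W))) for d in range(D)]
            for row in phi]

# Hypothetical sizes: K = 5 latent points, M = 3 basis functions, D = 2.
latent = [-1.0, -0.5, 0.0, 0.5, 1.0]
centres = [-1.0, 0.0, 1.0]
W = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
phi = basis_matrix(latent, centres, width=0.5)
prototypes = map_to_data_space(phi, W)  # K prototypes in the 2-D data space
print(prototypes)
```

Neighbouring latent points share overlapping basis functions, so their prototypes $m_k$ stay close in data space; this is what lets the map preserve the latent topology.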


## Simulation

- Example 1: Genes data set (40 samples, 3036 dimensions, 3 types)
- Example 2: Algae data set (72 samples, 18 dimensions, 9 types)
- Example 3: Glass data set (218 samples, 10 dimensions, 6 types)


## Conclusion

- WKM and IWKM
  - Solve the problem of sensitivity to initial conditions in K-means
  - Work well on high-dimensional data sets
  - Can be extended for visualization
- IKToM
  - An extension of IWKM
  - Has the same structure as GTM
  - Used for visualization


## Thank You

Questions?
