Local versus Global Interactions in
Clustering Algorithms
Computer Engineering Department
21/03/2010
Wesam M. Ashour
Outline
Clustering?
K-means Clustering Algorithm
New algorithms
- Weighted K-means (WKM)
- Inverse Weighted K-means (IWKM)
Topology-Preserving mappings
- Generative Topographic Mapping (GTM)
- Inverse-Weighted K-means Topology-Preserving Map (IKToM)
Clustering?
Cluster: a collection of data objects
Objects are similar to objects in the same cluster
Objects are dissimilar to objects in other clusters
Cluster analysis
Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
Clustering is unsupervised learning: no predefined classes
Clustering?
Types of clustering algorithms:
• Partitioning algorithms
• Hierarchical algorithms
• Density-based algorithms
• Grid-based algorithms
• Graph-based algorithms
• Model-based algorithms
Clustering?
Applications of clustering:
• Pattern recognition
• Compression
• Web documents
• Biology
• Marketing
Background
K-means
The algorithm tries to locate K prototypes throughout a data set in such a way that the K prototypes in some way best represent the data.
Disadvantages
• The number of clusters must be specified in advance
• Sensitivity to prototype initialization
• Dead prototypes
• Convergence to a local optimum
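The K-means procedure and two of its drawbacks (sensitivity to initialization and dead prototypes) can be illustrated with a minimal batch sketch. The function names `closest` and `kmeans`, the `max_iter` parameter, and the toy data are illustrative assumptions and do not come from the slides.

```python
import math


def closest(point, prototypes):
    """Return the index of the prototype nearest to point (Euclidean)."""
    return min(range(len(prototypes)),
               key=lambda j: math.dist(point, prototypes[j]))


def kmeans(data, prototypes, max_iter=100):
    """Batch K-means: alternate assignment and mean-update steps."""
    prototypes = [list(m) for m in prototypes]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest prototype.
        groups = [[] for _ in prototypes]
        for x in data:
            groups[closest(x, prototypes)].append(x)
        # Update step: move each prototype to the mean of its group.
        # A prototype with an empty group is "dead" and never moves.
        new = [
            [sum(c) / len(g) for c in zip(*g)] if g else m
            for g, m in zip(groups, prototypes)
        ]
        if new == prototypes:
            break
        prototypes = new
    return prototypes


# Two well-separated groups of points (toy data, assumed for illustration).
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = kmeans(data, [(0.0, 0.0), (10.0, 0.0)])    # one prototype per group
bad = kmeans(data, [(1.0, 0.5), (100.0, 100.0)])  # second prototype is dead
```

With the second initialization every point is closest to the first prototype, so the far prototype receives no update and the two groups are merged.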
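Weighted K-Means (WKM)
• The performance function for K-means may be written as
$$\mathrm{perf} = \sum_{i=1}^{N} \min_{1 \le j \le K} \lVert x_i - m_j \rVert^{2} \tag{1}$$
• Optimization
[Figure: three data points $x_1, x_2, x_3$ and three prototypes $m_1, m_2, m_3$]
With $d_{i,j} = \lVert x_i - m_j \rVert^{2}$, where $x_1$ and $x_2$ are closest to $m_1$ and $x_3$ is closest to $m_2$:
$$\mathrm{perf} = d_{1,1} + d_{2,1} + d_{3,2}$$
$$\frac{\partial\,\mathrm{perf}}{\partial m_1} = \frac{\partial d_{1,1}}{\partial m_1} + \frac{\partial d_{2,1}}{\partial m_1}, \qquad \frac{\partial\,\mathrm{perf}}{\partial m_2} = \frac{\partial d_{3,2}}{\partial m_2}, \qquad \frac{\partial\,\mathrm{perf}}{\partial m_3} = 0$$
The prototype $m_3$ is closest to no data point, so it receives no update.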
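Weighted K-Means (cont.)
• Consider the following performance function:
$$\mathrm{perf} = \sum_{i=1}^{N} \sum_{j=1}^{K} \lVert x_i - m_j \rVert^{2} \tag{2}$$
• Optimization
[Figure: three data points $x_1, x_2, x_3$ and three prototypes $m_1, m_2, m_3$]
With $d_{i,j} = \lVert x_i - m_j \rVert^{2}$:
$$\mathrm{perf} = d_{1,1} + d_{1,2} + d_{1,3} + d_{2,1} + d_{2,2} + d_{2,3} + d_{3,1} + d_{3,2} + d_{3,3}$$
$$\frac{\partial\,\mathrm{perf}}{\partial m_k} = 0 \;\Rightarrow\; m_k = \frac{1}{N} \sum_{i=1}^{N} x_i, \quad \forall k$$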
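A small numerical check makes the consequence of this all-pairs performance function concrete: setting its derivative to zero places every prototype at the data mean, so the prototypes cannot separate into clusters. The sketch below uses one-dimensional data for brevity; the function names and the sample values are assumptions made for illustration.

```python
def perf_all_pairs(data, prototypes):
    """Sum of squared distances over every (point, prototype) pair."""
    return sum((x - m) ** 2 for x in data for m in prototypes)


def stationary_prototypes(data, k):
    """Solve d perf / d m_k = 0: every prototype equals the data mean."""
    mean = sum(data) / len(data)
    return [mean] * k


data = [0.0, 1.0, 5.0]                       # toy 1-D data (assumed)
collapsed = stationary_prototypes(data, 3)   # all three prototypes coincide
spread = [0.5, 5.0, 2.0]                     # a clustering-like placement
perf_collapsed = perf_all_pairs(data, collapsed)
perf_spread = perf_all_pairs(data, spread)
```

The collapsed placement gives the lower value of this performance function even though it says nothing about the group structure, which is why a different function is needed.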
Weighted K-Means (cont.)
We wish to form a performance function with the following properties:
• Minimum performance gives good clustering
• Creates a relationship between all data points and all prototypes
$$\mathrm{perf} = \sum_{i=1}^{N} \left[ \sum_{j=1}^{K} \lVert x_i - m_j \rVert \right] \min_{1 \le l \le K} \lVert x_i - m_l \rVert^{2} \tag{3}$$
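The difference between a purely local and a global performance function can be checked numerically. The sketch below compares the K-means function (each point contributes only the squared distance to its closest prototype) with the weighted function (that squared distance multiplied by the sum of the point's distances to all prototypes). The function names and the toy values are assumptions for illustration.

```python
import math


def perf_kmeans(data, prototypes):
    """Each point contributes only through its closest prototype."""
    return sum(min(math.dist(x, m) for m in prototypes) ** 2 for x in data)


def perf_wkm(data, prototypes):
    """Closest squared distance weighted by the distances to all prototypes."""
    total = 0.0
    for x in data:
        dists = [math.dist(x, m) for m in prototypes]
        total += sum(dists) * min(dists) ** 2
    return total


data = [(0.0,), (1.0,), (5.0,)]            # toy 1-D data (assumed)
near = [(1.0,), (4.0,), (20.0,)]           # third prototype owns no point
far = [(1.0,), (4.0,), (100.0,)]           # same, but pushed further away
```

Moving the unused third prototype leaves `perf_kmeans` unchanged but changes `perf_wkm`, so in the weighted function every prototype interacts with every data point.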
Weighted K-Means (cont.)
Batch Mode
All data points are presented together
$$\frac{\partial\,\mathrm{perf}}{\partial m_k} = \sum_{i=1}^{N} \frac{\partial\,\mathrm{perf}(x_i)}{\partial m_k} = 0$$
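Weighted K-Means (cont.)
Let $m_r$ be the closest prototype to $x_i$, then
$$\mathrm{perf}(x_i) = \left[ \sum_{j=1}^{K} \lVert x_i - m_j \rVert \right] \lVert x_i - m_r \rVert^{2}, \qquad \mathrm{perf} = \sum_{i=1}^{N} \mathrm{perf}(x_i) \tag{4}$$
• Optimization: generate two sets of updates
$$\frac{\partial\,\mathrm{perf}(x_i)}{\partial m_r} = -(x_i - m_r) \left[ \lVert x_i - m_r \rVert + 2 \sum_{j=1}^{K} \lVert x_i - m_j \rVert \right] = (x_i - m_r)\, a_{ir}$$
$$\frac{\partial\,\mathrm{perf}(x_i)}{\partial m_j} = -\lVert x_i - m_r \rVert^{2}\, \frac{x_i - m_j}{\lVert x_i - m_j \rVert} = (x_i - m_j)\, b_{ij}, \qquad j \ne r$$
Batch Mode
$$\frac{\partial\,\mathrm{perf}}{\partial m_k} = \sum_{i=1}^{N} \frac{\partial\,\mathrm{perf}(x_i)}{\partial m_k} = 0 \;\Rightarrow\; m_k = \frac{\sum_{i \in V_k} x_i a_{ik} + \sum_{i \in V_j,\, j \ne k} x_i b_{ik}}{\sum_{i \in V_k} a_{ik} + \sum_{i \in V_j,\, j \ne k} b_{ik}} \tag{5}$$
where $V_k$ is the index set of the data points that are closest to $m_k$ and $V_j$, $j \ne k$, is the index set of the other points.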
Weighted K-Means (cont.)
• Problem which needs to be solved!
$$\mathrm{perf} = \sum_{i=1}^{N} \left[ \sum_{j=1}^{K} \lVert x_i - m_j \rVert \right] \min_{1 \le l \le K} \lVert x_i - m_l \rVert^{2} \tag{7}$$
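Inverse-Weighted K-Means (IWKM)
$$\mathrm{perf} = \sum_{i=1}^{N} \left[ \sum_{j=1}^{K} \frac{1}{\lVert x_i - m_j \rVert^{p}} \right] \min_{1 \le l \le K} \lVert x_i - m_l \rVert^{n} \tag{10}$$
Optimization
Batch Mode
Find the partial derivative of the performance with respect to $m_k$, set it to zero, and then solve for $m_k$:
$$m_k = \frac{\sum_{i \in V_k} x_i a_{ik} + \sum_{i \in V_j,\, j \ne k} x_i b_{ik}}{\sum_{i \in V_k} a_{ik} + \sum_{i \in V_j,\, j \ne k} b_{ik}} \tag{11}$$
where $V_k$ is the index set of the data points that are closest to $m_k$, $m_r$ is the closest prototype to $x_i$, and
$$a_{ik} = p\, \lVert x_i - m_k \rVert^{n-p-2} - n\, \lVert x_i - m_k \rVert^{n-2} \sum_{j=1}^{K} \frac{1}{\lVert x_i - m_j \rVert^{p}}$$
$$b_{ik} = p\, \frac{\lVert x_i - m_r \rVert^{n}}{\lVert x_i - m_k \rVert^{p+2}}$$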
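The batch update can be sketched by differentiating the inverse-weighted performance function directly. For a point whose closest prototype is k, the weight is a = p * d_k^(n-p-2) - n * d_k^(n-2) * sum_j d_j^(-p); for every other prototype k the weight is b = p * d_r^n / d_k^(p+2), where d_r is the distance to the closest prototype. The function names, the default values n = 2 and p = 1, the `eps` guard, and the rule for a zero weight sum are assumptions of this sketch, not choices stated on the slide.

```python
import math


def iwkm_weights(x, prototypes, n=2, p=1, eps=1e-12):
    """Weights a_ik (closest prototype) and b_ik (all others) for one point."""
    # eps guards against a zero distance; it is an assumption of this sketch.
    dists = [max(math.dist(x, m), eps) for m in prototypes]
    r = min(range(len(dists)), key=dists.__getitem__)
    inv_sum = sum(d ** -p for d in dists)
    weights = []
    for k, d_k in enumerate(dists):
        if k == r:
            w = p * d_k ** (n - p - 2) - n * d_k ** (n - 2) * inv_sum
        else:
            w = p * dists[r] ** n / d_k ** (p + 2)
        weights.append(w)
    return weights


def iwkm_step(data, prototypes, n=2, p=1):
    """One batch update: each prototype becomes a weighted mean of the data."""
    dim = len(prototypes[0])
    num = [[0.0] * dim for _ in prototypes]
    den = [0.0] * len(prototypes)
    for x in data:
        for k, w in enumerate(iwkm_weights(x, prototypes, n, p)):
            den[k] += w
            for c in range(dim):
                num[k][c] += w * x[c]
    # Assumption: a prototype whose weights cancel is left in place.
    return [
        [v / d for v in row] if d != 0.0 else list(m)
        for row, d, m in zip(num, den, prototypes)
    ]


weights = iwkm_weights((0.0,), [(1.0,), (2.0,)])
moved = iwkm_step([(0.0,), (4.0,)], [(1.0,), (3.0,)])
```

Because the two kinds of weight have opposite signs for these defaults, the update is not a convex combination of the data, so a practical implementation would need to watch for small denominators.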
Simulation
[Figure: Example 1, IWKM and K-means results]
[Figure: Example 2, K-means and IWKM results]
Simulation
[Figure: Example 3, K-means and IWKM results]
Simulation
[Figure: Example 4, IWKM result]
[Figure: Example 5, KHMO and IWKM results]
Inverse-weighted K-means Topology-Preserving Map (IKToM)
• Has the same structure as GTM
• K latent points in a latent space with some structure
• Mapped through M basis functions to feature space
• Then mapped to data space to K points using weights W, $m_k = \Phi_k W$
• Use IWKM to find $m_k$
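The mapping from latent points to prototypes can be sketched as follows. The sketch assumes a one-dimensional latent grid and Gaussian basis functions of width `sigma`, as is common for GTM; the slide does not specify either, and the function names are illustrative. It only shows how the K prototypes are produced from the weights, not how IWKM then adjusts them.

```python
import math


def linspace(lo, hi, count):
    """Evenly spaced values from lo to hi inclusive (count >= 2)."""
    return [lo + (hi - lo) * i / (count - 1) for i in range(count)]


def basis_matrix(latent, centres, sigma):
    """Phi[k][m]: Gaussian response of basis function m at latent point k."""
    return [[math.exp(-((t - c) ** 2) / (2.0 * sigma ** 2)) for c in centres]
            for t in latent]


def prototypes_from_weights(phi, weights):
    """m_k = Phi_k W: map each row of Phi (1 x M) through W (M x D)."""
    return [[sum(p * w for p, w in zip(row, col)) for col in zip(*weights)]
            for row in phi]


latent = linspace(-1.0, 1.0, 3)        # K = 3 latent points
centres = linspace(-1.0, 1.0, 2)       # M = 2 basis-function centres
phi = basis_matrix(latent, centres, sigma=1.0)
W = [[1.0, 0.0], [0.0, 1.0]]           # M x D weights, D = 2 (toy values)
protos = prototypes_from_weights(phi, W)
```

Because neighbouring latent points have similar basis responses, they are mapped to nearby prototypes in data space, which is what preserves the topology.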
Simulation
Example 1: Genes data set (40 samples, 3036 dimensions, 3 types)
Example 2: Algae data set (72 samples, 18 dimensions, 9 types)
Example 3: Glass data set (218 samples, 10 dimensions, 6 types)
Conclusion
WKM and IWKM
• Solve the problem of sensitivity to initial conditions in K-means
• Provide two sets of updates
• Work well on high-dimensional data sets
• Can be extended for visualization
Visualization
• Extension of IWKM
• Has the same structure as GTM
Thank You
Any questions, please?