# Lecture 9 - Ryan A. Rossi

Artificial Intelligence and Robotics

8 Nov 2013


Main Clustering Algorithms

K-Means

Hierarchical

SOM

K-Means

MacQueen, 1967

clusters defined by means/centroids

Many clustering algorithms are derivatives of K-Means

It has many problems

K-Means Example
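K-Means is short enough to sketch in full. The following is a minimal Lloyd-style implementation; random initialization and squared Euclidean distance are my assumptions, not details from the lecture:

```python
import random

def k_means(points, k, iters=100):
    """Minimal K-Means: alternate assignment and centroid update."""
    # Pick k distinct points as starting centroids (an assumption).
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, clusters
```

The "clusters defined by means/centroids" bullet is exactly the update step: each cluster is represented only by its mean.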

Hierarchical Clustering

Starts by treating each point as a cluster

Iteratively links most similar pair of clusters

User-defined threshold parameter specifies
the output clusters
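The three bullets above can be sketched as agglomerative clustering. The threshold stopping rule follows the slide; the single-linkage merge criterion and Euclidean distance are my assumptions:

```python
def single_linkage(points, threshold):
    """Agglomerative clustering: start with singleton clusters and
    repeatedly merge the most similar pair, stopping when the
    smallest inter-cluster distance exceeds the threshold."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[p] for p in points]
    while len(clusters) > 1:
        # Single linkage: cluster distance = closest pair of members.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist(p, q)
                               for p in clusters[ij[0]] for q in clusters[ij[1]]),
        )
        d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
        if d > threshold:
            break  # user-defined threshold determines the output clusters
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Swapping the cluster-distance function gives the other linkage variants (average, complete, centroid, etc.).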

Hierarchical Clustering Variants
In Minitab

Average

Centroid

Complete

McQuitty

Median

Single

Ward

Distance Measures

Euclidean

Manhattan

Pearson

Squared Euclidean

Squared Pearson

Hierarchical Clustering Example

Results

There Are Still Problems

Clustering Documents

“bag of words”

D_i: vector of length l

Distance between D_i and D_j: <D_i, D_j>

Document-term frequency matrix M (entry f_ij is the frequency of word W_i in document D_j):

          W_1    W_2    W_3   ...   W_i    W_j   ...   W_n
    D_1:  f_11   f_21   f_31  ...   f_i1   f_j1  ...   f_n1
    D_2:  f_12   f_22   f_32  ...   f_i2   f_j2  ...   f_n2
    ...
    D_m:  f_1m   f_2m   f_3m  ...   f_im   f_jm  ...   f_nm
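A minimal builder for such a bag-of-words frequency matrix; the function name and whitespace tokenization are my own choices:

```python
def bag_of_words(docs):
    """Build the document-term frequency matrix: one row per
    document D_j, one column per vocabulary word W_i."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        f = [0] * len(vocab)
        for w in d.split():
            f[index[w]] += 1  # f_ij: count of word W_i in document D_j
        rows.append(f)
    return vocab, rows
```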

Cluster Centroid

Cluster defined by distance to centroid C:

C = (1/m) Σ D_i, where m is the # of vectors

Elevations

Elevation of D: El(D) = <C, D>
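Both formulas translate directly; a sketch using plain Python lists:

```python
def centroid(vectors):
    """C = (1/m) * sum of the D_i."""
    m = len(vectors)
    return [sum(col) / m for col in zip(*vectors)]

def elevation(c, d):
    """El(D) = <C, D>: dot product of D with the centroid."""
    return sum(a * b for a, b in zip(c, d))
```

Vectors with a larger dot product against the centroid sit "higher" relative to the cluster, which is what the elevation measures.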

Problem: Would like:

Mapping to a Higher Dimension

Utilizing a Kernel Function K(X, Y):

K(X, Y) = <Φ(X), Φ(Y)>,

where X, Y are vectors in R^n, and Φ is a mapping into R^d, d >> n

Key element in Support Vector Machines

Data needs to appear as dot products only: <D_i, D_j>

Kernel Function Examples

Polynomial:

K(X, Y) = (<X, Y> + 1)^n

Feedforward Neural Network Classifier:

K(X, Y) = tanh(β <X, Y> + b)

Gaussian:

K(X, Y) = e^(-||X - Y||^2 / 2σ^2)
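The three example kernels can be written out as follows; the parameter defaults are my assumptions, and the Gaussian uses the standard ||X - Y||^2 exponent:

```python
import math

def poly_kernel(x, y, n=2):
    """Polynomial: K(X, Y) = (<X, Y> + 1)^n."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** n

def tanh_kernel(x, y, beta=1.0, b=0.0):
    """Sigmoid kernel: K(X, Y) = tanh(beta * <X, Y> + b)."""
    return math.tanh(beta * sum(a * b for a, b in zip(x, y)) + b)

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian: K(X, Y) = exp(-||X - Y||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))
```

Each returns <Φ(X), Φ(Y)> for some implicit Φ without ever constructing Φ itself, which is why data only needs to appear as dot products.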

First Step: Penalizing Outliers

C_k = (1/m) Σ_i <D_i, N(C_{k-1})> D_i    (1)

Convergence:

C = principal eigenvector of M^T M, where M is the matrix of the D_i's

C = lim_{L→∞} (M^T M)^L U    (2)

Both (1) and (2) are efficient methods of computing C
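Equation (2) is power iteration: repeatedly applying M^T M to an arbitrary start vector U converges (for a generic U) to the principal eigenvector. A sketch, where per-step renormalization is my stand-in for the normalization N:

```python
def principal_direction(M, iters=50):
    """Power iteration: C = lim (M^T M)^L U, renormalized each step.
    M is a list of row vectors D_i."""
    n = len(M[0])
    c = [1.0] * n  # arbitrary starting vector U
    for _ in range(iters):
        # One application of M^T M: first mc = M c, then c = M^T mc.
        mc = [sum(row[j] * c[j] for j in range(n)) for row in M]
        c = [sum(M[i][j] * mc[i] for i in range(len(M))) for j in range(n)]
        norm = sum(x * x for x in c) ** 0.5
        c = [x / norm for x in c]
    return c
```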

Cannot do this in feature space with:

Φ_k = (1/m) Σ_i <Φ(D_i), N(Φ_{k-1})> Φ(D_i)

Or by using (2): with M the matrix whose rows are Φ(D_1), Φ(D_2), ..., M^T M has unmanageable (eventually infinite) dimension

But the coefficients need only dot products:

a_i^k = <Φ(D_i), N(Φ_{k-1})> = (1/m) Σ_j a_j^{k-1} <Φ(D_i), Φ(D_j)>    (3)

Using Kernels to replace Φ

Theorem

Φ* = Σ_i a_i* Φ(D_i),

a_i* = lim_{n→∞} a_i^n, where a_i^n = (1/m) Σ_j a_j^{n-1} K(D_i, D_j)

El(D): Elevation of vector D = Σ_i a_i* K(D_i, D)
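The iteration above needs only the kernel matrix, never Φ itself. A sketch; renormalizing the coefficient vector each step is my stand-in for the normalization N, and the function names are my own:

```python
def kernel_elevation_weights(K, iters=50):
    """Iterate a_i^n = (1/m) * sum_j a_j^(n-1) * K(D_i, D_j).
    K is the precomputed m x m kernel matrix K[i][j] = K(D_i, D_j);
    the coefficient vector is renormalized each step."""
    m = len(K)
    a = [1.0 / m] * m
    for _ in range(iters):
        a = [sum(a[j] * K[i][j] for j in range(m)) / m for i in range(m)]
        norm = sum(x * x for x in a) ** 0.5
        a = [x / norm for x in a]
    return a

def elevation_from_kernel(a, K_row):
    """El(D) = sum_i a_i* K(D_i, D), with K_row[i] = K(D_i, D)."""
    return sum(ai * ki for ai, ki in zip(a, K_row))
```

This mirrors the power iteration for C in input space: the limit coefficients a_i* pick out the principal direction in feature space, but every computation stays in terms of K.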

Zoomed Clusters

Clusters defined through peaks

Peaks: all vectors that are the highest in their vicinity:

PEAKS = { D_j | El(D_j) ≥ El(D_i) <D_i, D_j>^S for all i }

S: Sharpening/Smoothing Parameter

Cluster: set of vectors that are in the vicinity of a
peak
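The PEAKS condition can be checked directly once elevations and pairwise dot products are in hand. This sketch assumes the dot products lie in [0, 1] (e.g. normalized vectors), so that <D_i, D_j>^S discounts far-away competitors:

```python
def peaks(elevations, dots, S=1.0):
    """PEAKS = { D_j | El(D_j) >= El(D_i) * <D_i, D_j>^S for all i }.
    elevations[i] = El(D_i); dots[i][j] = <D_i, D_j>."""
    m = len(elevations)
    return [
        j for j in range(m)
        if all(elevations[j] >= elevations[i] * (dots[i][j] ** S)
               for i in range(m))
    ]
```

Raising S shrinks each vector's effective vicinity, so more local maxima survive as peaks (sharpening); lowering it merges them (smoothing).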

[Figure: two peaks C_1 and C_2 on a sample data set. Kernel: Linear, S: Default (1)]
Clustering Example

Zooming Example

[Figure: clusters C_1, C_2, C_3 found with Kernel: Linear, S: Default (1)]

[Figure: the same data zoomed with Kernel: Polynomial Degree 2, S: 16]
Zoomed Clusters Results

[Figure: clusters C_1, C_2, C_3 with Kernel: Polynomial Degree 8000, S: 1.5]

[Figure: the same data with Kernel: Polynomial Degree 8000, S: Default (1)]

Clustering MicroArray Data

Rows: Genes; Columns: Experiments

Entry (i, j): expression level of gene i during experiment j

MicroArrays As Time Series

Clustering Time Series

Reveals groups of genes that have
similar reactions to experiments

Functionally related genes should
cluster together

Simulated Time Series

Simulated 180 time series, with 3 clusters and 9
sub-clusters (20 per sub-cluster)

Each time series is a vector with 1000 components

Each component is expression level at a given time

Results

Kernel: Polynomial Degree 3

S: 6

Kernel: Polynomial Degree 3

S: 7

Kernel: Polynomial Degree 6

S: 15

HMM Parameter Estimation

Sequential K-Means → Initial HMM Model

Initial HMM Model → Viterbi Algorithm → Refinement of HMM Model → Final HMM Model

Initial HMM Model → Baum-Welch Algorithm → Refinement of HMM Model → Final HMM Model

Parameter Estimation with Zoomed Clusters

Zoomed Clusters → Initial HMM Model

Flexibility with the number of states

Initial model is closer to the final one

Consequences:

Higher accuracy and faster convergence for either
Baum-Welch or Viterbi

Example: Coins

HHHHHTTTTTTTHHHHHHHTHTHTHTHTHTTTTTTTT

HHHHH
TTTTTTT
HHHHHHH
THTHTHTHTH
TTTTTTTT

Coin 1: 0% Tails

Coin 2: 100% Tails

Coin 3: 50% Tails

Regions with similar statistical distribution of Heads and Tails
represent the states in the initial HMM Model

Use Elevation Functions, separately for Heads and Tails to
represent these distributions

HHHHH

HHHHHHH H H H H H

TTTTTTT T T T T T TTTTTTTT

Step 1: Separating Letters

Step 2: Calculating Elevation
Function for each letter

Step 3: For each position i in the
sequence of throws …

Step 3: Get the Elevation

Step 3: Create point D_i in R^2,
whose components are the
elevations

Point D_i = [E_h, E_t]

Step 4: Cluster all the points
obtained from each position
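Steps 2-3 can be sketched as follows; windowed letter frequencies stand in for the actual elevation functions, which is my simplification of the lecture's construction:

```python
def elevation_points(seq, window=5):
    """For each position i build D_i = [E_h, E_t], where E_h and E_t
    are the local frequencies of H and T around position i
    (a windowed stand-in for the elevation functions)."""
    points = []
    for i in range(len(seq)):
        # Local neighborhood of the position, clipped at the ends.
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        chunk = seq[lo:hi]
        points.append([chunk.count("H") / len(chunk),
                       chunk.count("T") / len(chunk)])
    return points
```

Clustering these 2-D points (Step 4) groups positions with similar local H/T statistics, and each such group becomes a state in the initial HMM.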

What Clustering Achieves

Each cluster defines regions with a similar
statistical distribution

Each cluster is a state in the initial HMM
model

State transition/emission probabilities are
estimated from the clusters

References

MacQueen, J. 1967. Some methods for classification and analysis of
multivariate observations. Pp. 281-297 in: L. M. Le Cam & J. Neyman [eds.],
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and
Probability, Vol. 1. University of California Press, Berkeley. xvii + 666 p.

Jain, A. K., Murty, M. N., and Flynn, P. J. Data Clustering: A Review. ACM
Computing Surveys, Vol. 31, No. 3, September 1999.

http://www.gene-chips.com/ by Leming Shi, Ph.D.