Clustering


Nov 25, 2013



Usman Roshan

CS 698

Clustering


Suppose we want to cluster $n$ vectors in $\mathbb{R}^d$ into two groups. Define $C_1$ and $C_2$ as the two groups.

Our objective is to find $C_1$ and $C_2$ that minimize

$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2,$$

where $m_i$ is the mean of cluster $C_i$.

Clustering


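Finding the optimal clustering is NP-hard, even for 2-means (k = 2).

It is also NP-hard for points in the plane (d = 2) when the number of clusters k is part of the input.

K-means heuristic

Popular and hard to beat in practice.

Introduced in the 1950s and 1960s.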
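To see why an exact method does not scale, here is a brute-force sketch that evaluates the 2-means objective on every labeling with both clusters non-empty. The helper names (`two_means_objective`, `exact_two_means`) are hypothetical. It inspects roughly 2^n labelings, so it is only usable for tiny n; it illustrates the cost of exhaustive search and is not a proof of NP-hardness.

```python
from itertools import product


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def two_means_objective(xs, labels):
    """Sum over i in {1, 2} of ||x_j - m_i||^2 for x_j in C_i (label 0 = C1, 1 = C2)."""
    total = 0.0
    for i in (0, 1):
        members = [x for x, c in zip(xs, labels) if c == i]
        if not members:
            continue  # an empty cluster contributes nothing
        m = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(x, m) for x in members)
    return total


def exact_two_means(xs):
    """Exhaustive search over all labelings with both clusters non-empty."""
    n = len(xs)
    candidates = (labels for labels in product((0, 1), repeat=n)
                  if 0 < sum(labels) < n)
    best = min(candidates, key=lambda labels: two_means_objective(xs, labels))
    return list(best), two_means_objective(xs, best)


# Corners of a 4-by-1 rectangle: the best split is left pair vs right pair.
xs = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(exact_two_means(xs))  # -> ([0, 0, 1, 1], 1.0)
```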


K-means algorithm for two clusters

Input: vectors $x_i \in \mathbb{R}^d$, $i = 1, \dots, n$.

Algorithm:

1. Initialize: assign each $x_i$ to $C_1$ or $C_2$ with equal probability and compute the means
$$m_1 = \frac{1}{|C_1|} \sum_{x_i \in C_1} x_i, \qquad m_2 = \frac{1}{|C_2|} \sum_{x_i \in C_2} x_i.$$

2. Recompute clusters: assign $x_i$ to $C_1$ if $\|x_i - m_1\| < \|x_i - m_2\|$; otherwise assign it to $C_2$.

3. Recompute the means $m_1$ and $m_2$.

4. Compute the objective
$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2.$$

5. Compute the objective of the new clustering. If the difference from the previous objective is smaller than a chosen tolerance $\epsilon$, stop; otherwise go to step 2.
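Steps 1-5 can be sketched in plain Python as follows, assuming at least two input vectors. The names `two_means`, `eps`, `max_iter` and `seed` are illustrative, not from the slides. Two behaviors are assumptions added here: initialization redraws if a cluster comes out empty, and in step 3 an empty cluster keeps its previous mean. Step 2 compares squared distances, which orders points exactly as the distances on the slide do.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def objective(xs, labels, means):
    """Sum of ||x_j - m_i||^2 over every x_j assigned to cluster C_i."""
    return sum(sq_dist(x, means[c]) for x, c in zip(xs, labels))


def two_means(xs, eps=1e-9, max_iter=100, seed=0):
    """Two-cluster k-means; label 0 stands for C1 and label 1 for C2.

    Assumes len(xs) >= 2. Returns (labels, means, objective).
    """
    rng = random.Random(seed)
    # Step 1: assign each x_i to C1 or C2 with equal probability
    # (assumption: redraw if one cluster is empty), then compute the means.
    while True:
        labels = [rng.randrange(2) for _ in xs]
        if 0 < sum(labels) < len(xs):
            break
    means = [mean([x for x, c in zip(xs, labels) if c == i]) for i in (0, 1)]
    obj = objective(xs, labels, means)
    for _ in range(max_iter):
        # Step 2: C1 if strictly closer to m1, otherwise C2.
        labels = [0 if sq_dist(x, means[0]) < sq_dist(x, means[1]) else 1
                  for x in xs]
        # Step 3: recompute the means (assumption: an empty cluster keeps its old mean).
        for i in (0, 1):
            members = [x for x, c in zip(xs, labels) if c == i]
            if members:
                means[i] = mean(members)
        # Steps 4-5: stop once the objective decreases by less than eps.
        new_obj = objective(xs, labels, means)
        done = obj - new_obj < eps
        obj = new_obj
        if done:
            break
    return labels, means, obj


xs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, means, obj = two_means(xs)
print(labels, means, obj)
```

On two well-separated groups like these, the loop should end with one cluster per group and an objective of 8/3; on harder data the result can depend on `seed`.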

 

K-means

Is it guaranteed to find the clustering that optimizes the objective?

No. It is only guaranteed to find a local optimum.

We can prove that the objective never increases over subsequent iterations.
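A small illustration of a local optimum, assuming the same two-cluster update rule (steps 2 and 3) is started from a fixed partition instead of a random one. The function names are hypothetical. On the four corners of a 4-by-1 rectangle, the top/bottom split is already stable under the updates (objective 16), while another start reaches the left/right split (objective 1), which is the global optimum for this data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_means(xs, labels):
    """Means m_1, m_2 of the two clusters (both assumed non-empty)."""
    means = []
    for i in (0, 1):
        members = [x for x, c in zip(xs, labels) if c == i]
        means.append([sum(col) / len(members) for col in zip(*members)])
    return means


def lloyd_from(xs, labels, max_iter=100):
    """Repeat steps 2-3 from a given partition until the labels stop changing."""
    labels = list(labels)
    for _ in range(max_iter):
        m = cluster_means(xs, labels)
        new = [0 if sq_dist(x, m[0]) < sq_dist(x, m[1]) else 1 for x in xs]
        if new == labels:
            break
        labels = new
    m = cluster_means(xs, labels)
    obj = sum(sq_dist(x, m[c]) for x, c in zip(xs, labels))
    return labels, obj


xs = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(lloyd_from(xs, [0, 1, 0, 1]))  # -> ([0, 1, 0, 1], 16.0)  stuck top/bottom
print(lloyd_from(xs, [0, 0, 1, 0]))  # -> ([0, 0, 1, 1], 1.0)   left/right
```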

Proof sketch of convergence of k-means

 
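$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2 \;\ge\; \sum_{i=1}^{2} \sum_{x_j \in C_i^*} \|x_j - m_i\|^2 \;\ge\; \sum_{i=1}^{2} \sum_{x_j \in C_i^*} \|x_j - m_i^*\|^2$$

where $C_i^*$ are the clusters obtained by reassigning each point to its closest mean (step 2) and $m_i^*$ are their recomputed means (step 3).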

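Justification of the first inequality: reassigning each $x_j$ to its closest mean can only decrease the objective or leave it unchanged.

Justification of the second inequality: for a fixed cluster, its mean minimizes the squared-error loss.

Because the objective never increases and there are only finitely many ways to partition the n points, the objective values must eventually stop decreasing, which is what the stopping rule in step 5 detects.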
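The second justification can be made explicit with a short derivation. It is a standard identity, written here in the slide's notation: for a fixed cluster $C$ with mean $m$ and any point $c$, the squared error splits into the error around the mean plus a penalty for using $c$ instead of $m$. Applying it with $C = C_i^*$, $c = m_i$ and $m = m_i^*$, then summing over $i$, gives the second inequality.

```latex
% Fixed cluster C with mean m = \frac{1}{|C|} \sum_{x \in C} x, and any c \in \mathbb{R}^d.
\sum_{x \in C} \|x - c\|^2
  = \sum_{x \in C} \|(x - m) + (m - c)\|^2
  = \sum_{x \in C} \|x - m\|^2
    + 2\,(m - c)^\top \sum_{x \in C} (x - m)
    + |C|\,\|m - c\|^2
  = \sum_{x \in C} \|x - m\|^2 + |C|\,\|m - c\|^2
  \;\ge\; \sum_{x \in C} \|x - m\|^2
% The cross term vanishes because \sum_{x \in C} (x - m) = 0;
% equality holds exactly when c = m.
```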

Other clustering algorithms


Hierarchical clustering

1. Initialize n clusters, with each data point in its own cluster.

2. Merge the two nearest clusters into one.

3. Update the distances from the new cluster to the existing ones.

4. Repeat steps 2 and 3 until k clusters are formed.
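A naive sketch of these steps, assuming single linkage (the distance between two clusters is the distance between their closest members). The slide does not say how "nearest clusters" is measured, so the linkage and the names `agglomerative` and `single_linkage` are assumptions. Rather than keeping a distance table, this version recomputes cluster distances from the members on every pass, which plays the role of step 3; that is simple but roughly cubic in n or worse.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def single_linkage(xs, ca, cb):
    """Assumed linkage: squared distance between the closest pair across clusters."""
    return min(sq_dist(xs[i], xs[j]) for i in ca for j in cb)


def agglomerative(xs, k):
    """Bottom-up clustering; returns k clusters as lists of point indices."""
    # Step 1: every data point starts in its own cluster.
    clusters = [[i] for i in range(len(xs))]
    while len(clusters) > k:
        # Step 2: find the two nearest clusters...
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = single_linkage(xs, clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        # ...and merge them. Step 3 happens implicitly: distances involving
        # the merged cluster are recomputed from its members on the next pass.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid
    return clusters


xs = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(xs, 2))  # -> [[0, 1, 2, 3], [4]]
```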

Other clustering algorithms


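Graph clustering (spectral clustering)

Find a cut of minimum cost that bipartitions the data.

The problem becomes NP-hard if we look for a minimum cut such that the sizes of the two clusters are close.

A relaxation of this balanced-cut problem leads to spectral clustering.

Based on computing the Laplacian of the graph defined by the similarity matrix (for example L = D - W, where W is the similarity matrix and D is the diagonal matrix of its row sums), together with the eigenvalues and eigenvectors of that Laplacian.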
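A minimal sketch of a spectral bipartition, assuming the unnormalized Laplacian L = D - W and splitting points by the sign of the eigenvector for the second-smallest eigenvalue (the Fiedler vector). Relaxing the balanced-cut indicator vector to a real-valued vector orthogonal to the all-ones vector makes this eigenvector the minimizer of v^T L v. Thresholding at zero is only one rounding choice; normalized Laplacians or running k-means on several eigenvectors are common alternatives. The function name `spectral_bipartition` is hypothetical, and the example needs NumPy.

```python
import numpy as np


def spectral_bipartition(W):
    """Split a similarity graph in two by the sign of its Fiedler vector.

    W: symmetric (n, n) array of non-negative similarities, zero diagonal.
    Returns a boolean array marking one side of the cut.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))     # diagonal degree matrix
    L = D - W                      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)    # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]           # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0            # assumption: round the relaxation at zero


# Two triangles (nodes 0-2 and 3-5) joined by one weak edge between 2 and 3.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
side = spectral_bipartition(W)
print(np.flatnonzero(side), np.flatnonzero(~side))
```

The sign of an eigenvector is arbitrary, so which triangle lands on the `True` side may vary; the cut itself should separate {0, 1, 2} from {3, 4, 5}.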