# Clustering Algorithms

Johannes Blomer
WS 2012/13
## Introduction

Clustering comprises techniques for data management and analysis that classify/group a given set of objects into categories/subgroups, or clusters.

Clusters are homogeneous subgroups of objects such that the similarity between objects in one subgroup is larger than the similarity between objects from different subgroups.

Goals:

1. find structures in a large set of objects/data
2. simplify large data sets
## Example

How do we measure the similarity/dissimilarity of objects?

How do we measure the quality of a clustering?
## Application areas

1. information retrieval
2. data mining
3. computer graphics
4. data compression
5. bioinformatics
6. machine learning
7. statistics
8. pattern recognition
## Goals of this course

- different models for clustering
- many important clustering heuristics, including agglomerative clustering, Lloyd's algorithm, and the EM algorithm
- the limitations of these heuristics
- improvements to these heuristics
- NP-hardness results and approximation algorithms
- general techniques to improve the efficiency of heuristics and approximation algorithms, e.g. dimension reduction techniques
## Organization

bloemer/lehre/2012/ws/clusteringalgorithms.html

Here you will find:

- announcements
- handouts
- slides
- literature
- lecture notes (will be written and will appear as the course progresses)
## Prerequisites

- design and analysis of algorithms
- basic complexity theory
- probability theory and stochastics
- some linear algebra
## Tutorials

There are two tutorials:

- Thursday, 1-2 p.m., room F2.211 (new)
- Friday, 1-2 p.m., room F1.110
## Objects

- objects are described by $d$ different features
- features are continuous or binary
- objects are described as elements of $\mathbb{R}^d$ or $\{0,1\}^d$
- objects come from $M \subseteq \mathbb{R}^d$ or $M \subseteq \{0,1\}^d$
## Distance functions

Definition 1.1
$D : M \times M \to \mathbb{R}$ is called a distance function if for all $x, y, z \in M$

- $D(x,y) = D(y,x)$ (symmetry)
- $D(x,y) \geq 0$ (positivity).

$D$ is called a metric if, in addition,

- $D(x,y) = 0 \Leftrightarrow x = y$ (reflexivity)
- $D(x,z) \leq D(x,y) + D(y,z)$ (triangle inequality).
## Examples

Example 1.2 (Euclidean distance)
$M = \mathbb{R}^d$,

$$D_{\ell_2}(x,y) = \|x - y\|_2 = \left( \sum_{i=1}^{d} |x_i - y_i|^2 \right)^{\frac{1}{2}},$$

where $x = (x_1, \ldots, x_d)$ and $y = (y_1, \ldots, y_d)$.
## Examples

Example 1.3 (Minkowski distances, $\ell_p$-norms)
$M = \mathbb{R}^d$, $p \geq 1$,

$$D_{\ell_p}(x,y) = \|x - y\|_p = \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{\frac{1}{p}}.$$

Example 1.4 (maximum distance)
$M = \mathbb{R}^d$,

$$D_{\ell_\infty}(x,y) = \|x - y\|_\infty = \max_{1 \leq i \leq d} |x_i - y_i|.$$
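The three distances above differ only in how coordinate-wise differences are aggregated. As a minimal Python sketch from the definitions (function names are ours, not the course's):

```python
def minkowski(x, y, p):
    """l_p distance (Example 1.3): (sum_i |x_i - y_i|^p)^(1/p), for p >= 1."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    """l_2 distance (Example 1.2), the special case p = 2."""
    return minkowski(x, y, 2)

def maximum(x, y):
    """l_infinity distance (Example 1.4): max_i |x_i - y_i|,
    the limit of the l_p distances as p grows."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))
```

For example, `euclidean((0, 0), (3, 4))` gives `5.0`, while `maximum((0, 0), (3, 4))` gives `4`.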
## Examples

Example 1.5 (Pearson correlation)
$M = \mathbb{R}^d$,

$$D_{\mathrm{Pearson}}(x,y) = \frac{1}{2} \left( 1 - \frac{\sum_{i=1}^{d} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{d} (x_i - \bar{x})^2 \sum_{i=1}^{d} (y_i - \bar{y})^2}} \right),$$

where $\bar{x} = \frac{1}{d} \sum x_i$ and $\bar{y} = \frac{1}{d} \sum y_i$.
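The Pearson distance rescales the correlation coefficient, which lies in $[-1, 1]$, to a dissimilarity in $[0, 1]$: perfectly correlated vectors get distance 0, perfectly anti-correlated vectors get distance 1. A small illustrative Python sketch (the naming is ours):

```python
from math import sqrt

def pearson_distance(x, y):
    """D_Pearson(x, y) = (1 - r(x, y)) / 2, where r is the Pearson
    correlation coefficient of the coordinate sequences."""
    d = len(x)
    mx = sum(x) / d  # mean of x
    my = sum(y) / d  # mean of y
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return 0.5 * (1.0 - cov / sqrt(vx * vy))
```

Note that the sketch assumes neither vector is constant, since a constant vector makes the denominator zero.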
## Examples

Example 1.6 (Mahalanobis divergence)
$A \in \mathbb{R}^{d \times d}$ positive definite, i.e. $x^T A x > 0$ for $x \neq 0$; $M = \mathbb{R}^d$;

$$D_A(x,y) = (x - y)^T A (x - y).$$

Example 1.7 (Itakura-Saito divergence)
$M = \mathbb{R}^d_{\geq 0}$,

$$D_{\mathrm{IS}}(x,y) = \sum \frac{x_i}{y_i} - \ln\!\left(\frac{x_i}{y_i}\right) - 1,$$

where by definition $0 \cdot \ln(0) = 0$.
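Both divergences translate directly into code. An illustrative Python sketch (names are ours; for Itakura-Saito we assume strictly positive entries to sidestep the boundary convention):

```python
from math import log

def mahalanobis_divergence(x, y, A):
    """D_A(x, y) = (x - y)^T A (x - y), for A given as a list of rows.
    With A the identity matrix this is the squared Euclidean distance."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    d = len(diff)
    return sum(diff[i] * A[i][j] * diff[j] for i in range(d) for j in range(d))

def itakura_saito(x, y):
    """D_IS(x, y) = sum_i (x_i/y_i - ln(x_i/y_i) - 1), assuming all
    entries are strictly positive in this sketch."""
    return sum(xi / yi - log(xi / yi) - 1.0 for xi, yi in zip(x, y))
```

Each term $t - \ln t - 1$ (with $t = x_i / y_i$) is nonnegative and zero exactly when $t = 1$, so $D_{\mathrm{IS}}(x, y) \geq 0$ with equality iff $x = y$.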
## Examples

Example 1.8 (Kullback-Leibler divergence)
$M = S^d := \{x \in \mathbb{R}^d : \forall i : x_i \geq 0, \ \sum x_i = 1\}$,

$$D_{\mathrm{KLD}}(x,y) = \sum x_i \ln(x_i / y_i),$$

where by definition $0 \cdot \ln(0) = 0$.

Example 1.9 (generalized KLD)
$M = \mathbb{R}^d_{\geq 0}$,

$$D_{\mathrm{KLD}}(x,y) = \sum x_i \ln(x_i / y_i) - (x_i - y_i).$$
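A sketch of both KLD variants in Python, applying the $0 \cdot \ln(0) = 0$ convention by skipping zero entries (function names are ours):

```python
from math import log

def kld(p, q):
    """Kullback-Leibler divergence sum_i p_i ln(p_i / q_i) for
    probability vectors p, q, with 0 * ln(0) = 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def generalized_kld(x, y):
    """Generalized KLD: sum_i x_i ln(x_i / y_i) - (x_i - y_i),
    for nonnegative vectors (not necessarily summing to 1)."""
    return sum((xi * log(xi / yi) if xi > 0 else 0.0) - (xi - yi)
               for xi, yi in zip(x, y))
```

Note that neither variant is symmetric in its arguments, so by Definition 1.1 these are divergences rather than distance functions or metrics.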
## Similarity functions

Definition 1.10
$S : M \times M \to \mathbb{R}$ is called a similarity function if for all $x, y, z \in M$

- $S(x,y) = S(y,x)$ (symmetry)
- $0 \leq S(x,y) \leq 1$ (positivity).

$S$ is called a metric if, in addition,

- $S(x,y) = 1 \Leftrightarrow x = y$ (reflexivity)
- $S(x,y)\, S(y,z) \leq \big( S(x,y) + S(y,z) \big)\, S(x,z)$ (triangle inequality).
## Examples

Example 1.11 (cosine similarity)
$M = \mathbb{R}^d$,

$$S_{\mathrm{CS}}(x,y) = \frac{x^T y}{\|x\| \, \|y\|}.$$
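Cosine similarity is the cosine of the angle between the two vectors, so it equals 1 for vectors pointing in the same direction regardless of their lengths. A minimal Python sketch (naming is ours):

```python
from math import sqrt

def cosine_similarity(x, y):
    """S_CS(x, y) = <x, y> / (||x|| * ||y||). On all of R^d this ranges
    over [-1, 1]; it stays in [0, 1] on restricted domains such as
    vectors with nonnegative entries."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = sqrt(sum(xi * xi for xi in x))
    ny = sqrt(sum(yi * yi for yi in y))
    return dot / (nx * ny)
```

The sketch assumes neither vector is the zero vector, for which the expression is undefined.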
## Similarity for binary features

Let $x, y \in \{0,1\}^d$; then

$$n_{b\tilde{b}}(x,y) := \left| \{ 1 \leq i \leq d : x_i = b, \ y_i = \tilde{b} \} \right|$$

and for $w \in \mathbb{R}_{\geq 0}$

$$S_w(x,y) := \frac{n_{00}(x,y) + n_{11}(x,y)}{n_{00}(x,y) + n_{11}(x,y) + w \left( n_{01}(x,y) + n_{10}(x,y) \right)}.$$

Popular: $w = 1, 2, \frac{1}{2}$.

Example 1.12 (matching coefficient)
$w = 1$, $\ S_{\mathrm{mc}}(x,y) = \dfrac{n_{00}(x,y) + n_{11}(x,y)}{d}$.
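The counts $n_{b\tilde{b}}$ and the family $S_w$ translate directly into code. An illustrative Python sketch (our naming):

```python
def pair_counts(x, y):
    """n_bb~(x, y): number of positions i with (x_i, y_i) = (b, b~)."""
    n = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for xi, yi in zip(x, y):
        n[(xi, yi)] += 1
    return n

def s_w(x, y, w):
    """S_w(x, y) = (n00 + n11) / (n00 + n11 + w * (n01 + n10))."""
    n = pair_counts(x, y)
    agree = n[(0, 0)] + n[(1, 1)]
    disagree = n[(0, 1)] + n[(1, 0)]
    return agree / (agree + w * disagree)

def matching_coefficient(x, y):
    """Example 1.12: the case w = 1, i.e. (n00 + n11) / d."""
    return s_w(x, y, 1)
```

Larger $w$ penalizes disagreeing positions more heavily, which is the only effect the parameter has.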
## Similarity for binary features

$$\tilde{S}_w(x,y) := \frac{n_{11}(x,y)}{n_{11}(x,y) + w \left( n_{01}(x,y) + n_{10}(x,y) \right)}$$

Popular: $w = 1, 2, \frac{1}{2}$.

Example 1.13 (Jaccard coefficient)
$w = 1$, $\ S_{\mathrm{J}}(x,y) = \dfrac{n_{11}(x,y)}{n_{11}(x,y) + n_{01}(x,y) + n_{10}(x,y)}$.
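For comparison, a one-function Python sketch of the Jaccard coefficient (our naming), which, unlike the matching coefficient, ignores positions where both vectors are 0:

```python
def jaccard(x, y):
    """Example 1.13: S_J(x, y) = n11 / (n11 + n01 + n10) for binary
    vectors; positions with x_i = y_i = 0 do not influence the value."""
    n11 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    n_mismatch = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return n11 / (n11 + n_mismatch)
```

The sketch assumes at least one position where $x$ or $y$ is 1; on two all-zero vectors the quotient is undefined.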