Chapter 12 Clustering, Distance Method, and Ordination


25 Nov 2013



12.2 Similarity Measures

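Commonly used distances between two $p$-dimensional observations $\mathbf{x} = [x_1, x_2, \ldots, x_p]^{t}$ and $\mathbf{y} = [y_1, y_2, \ldots, y_p]^{t}$: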

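Euclidean Distance:

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2} = \sqrt{(\mathbf{x} - \mathbf{y})^{t}(\mathbf{x} - \mathbf{y})}$$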
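The following is a minimal Python sketch of the Euclidean distance. It assumes each observation is a plain sequence of $p$ numbers. The function name `euclidean` is illustrative and does not come from the text.

```python
import math


def euclidean(x, y):
    """Euclidean distance d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of variables p")
    # Sum of squared coordinate differences, then the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean([1.0, 2.0], [4.0, 6.0]))  # 5.0
```

Since Python 3.8, the standard library's `math.dist(x, y)` computes the same quantity.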

Statistical Distance:

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{t}\,\mathbf{S}^{-1}(\mathbf{x} - \mathbf{y})}$$

where $\mathbf{S}$ is the sample variance-covariance matrix.
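The sketch below evaluates the statistical distance for a caller-supplied $\mathbf{S}$, which it assumes is symmetric and positive definite. It does not form $\mathbf{S}^{-1}$ explicitly. Instead it solves $\mathbf{S}\mathbf{z} = \mathbf{x} - \mathbf{y}$ by Gaussian elimination and then returns $\sqrt{(\mathbf{x} - \mathbf{y})^{t}\mathbf{z}}$. The helper names `solve` and `statistical_distance` are hypothetical.

```python
import math


def solve(a, b):
    """Solve a z = b for z by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy [a | b] so the caller's matrix is not modified.
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("S is singular; statistical distance is undefined")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def statistical_distance(x, y, S):
    """d(x, y) = sqrt((x - y)^t S^{-1} (x - y))."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    z = solve(S, diff)  # z = S^{-1} (x - y)
    return math.sqrt(sum(di * zi for di, zi in zip(diff, z)))


print(statistical_distance([1, 2], [4, 6], [[1, 0], [0, 1]]))  # 5.0
print(statistical_distance([1, 0], [0, 0], [[2, 1], [1, 2]]))
```

With $\mathbf{S} = \mathbf{I}$ the result reduces to the Euclidean distance, as the first call shows.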

Minkowski Distance:

$$d(\mathbf{x}, \mathbf{y}) = \left[\sum_{i=1}^{p} |x_i - y_i|^{m}\right]^{1/m}.$$
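Here is a short sketch of the Minkowski distance with a hypothetical function name. With $m = 1$ it gives the city-block distance, and with $m = 2$ it gives the Euclidean distance. Rejecting $m < 1$ is a choice made here, because the triangle inequality can fail for such $m$.

```python
def minkowski(x, y, m):
    """d(x, y) = (sum_i |x_i - y_i|^m)^(1/m)."""
    if m < 1:
        raise ValueError("m must be >= 1 for d to be a metric")
    return sum(abs(xi - yi) ** m for xi, yi in zip(x, y)) ** (1.0 / m)


print(minkowski([0, 0], [3, 4], 1))  # 7.0 (city-block)
print(minkowski([0, 0], [3, 4], 2))  # Euclidean distance
```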

Canberra Metric (defined for nonnegative variables):

$$d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{p} \frac{|x_i - y_i|}{x_i + y_i}.$$
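The following is a minimal sketch of the Canberra metric for nonnegative variables, using a hypothetical function name. The formula is undefined when $x_i = y_i = 0$. This sketch assumes the convention of skipping such terms, which means treating them as 0.

```python
def canberra(x, y):
    """d(x, y) = sum_i |x_i - y_i| / (x_i + y_i), for nonnegative x, y."""
    total = 0.0
    for xi, yi in zip(x, y):
        if xi < 0 or yi < 0:
            raise ValueError("Canberra metric is defined for nonnegative variables")
        denom = xi + yi
        if denom == 0:
            continue  # assumed convention: a 0/0 term contributes nothing
        total += abs(xi - yi) / denom
    return total


print(canberra([1, 2, 0], [3, 2, 0]))  # 0.5
```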

Czekanowski coefficient (defined for nonnegative variables):

$$d(\mathbf{x}, \mathbf{y}) = 1 - \frac{2\sum_{i=1}^{p} \min(x_i, y_i)}{\sum_{i=1}^{p} (x_i + y_i)}.$$
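Here is a sketch of the Czekanowski coefficient under the same nonnegativity assumption; the function name is hypothetical. Identical vectors give 0. Raising an error when every coordinate is zero, which would make the denominator vanish, is a choice made here.

```python
def czekanowski(x, y):
    """d(x, y) = 1 - 2 * sum_i min(x_i, y_i) / sum_i (x_i + y_i)."""
    if any(v < 0 for v in list(x) + list(y)):
        raise ValueError("Czekanowski coefficient is defined for nonnegative variables")
    num = sum(min(xi, yi) for xi, yi in zip(x, y))
    den = sum(xi + yi for xi, yi in zip(x, y))
    if den == 0:
        raise ValueError("undefined when all coordinates are zero")
    return 1.0 - 2.0 * num / den


print(czekanowski([1, 2, 0], [3, 2, 0]))  # 0.25
```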

12.3 Hierarchical Clustering Methods

Agglomerative Hierarchical Clustering Algorithm (Grouping $N$ Objects):

1. Start with $N$ clusters, each containing a single entity, and an $N \times N$ symmetric matrix of distances (or similarities) $\mathbf{D} = \{d_{ik}\}$.

2. Search the distance matrix for the nearest (most similar) pair of clusters. Let the distance between the “most similar” clusters U and V be $d_{UV}$.

3. Merge clusters U and V. Label the newly formed cluster (UV). Update the entries in the distance matrix by

   (a) deleting the rows and columns corresponding to clusters U and V and

   (b) adding a row and column giving the distances between cluster (UV) and the remaining clusters.

4. Repeat Steps 2 and 3 a total of $N - 1$ times. (All objects will be in a single cluster after the algorithm terminates.) Record the identity of clusters that are merged and the levels (distances) at which the merges take place.
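The four steps can be sketched in Python as follows. This is a minimal, unoptimized sketch that assumes $\mathbf{D}$ holds distances rather than similarities, so Step 2 picks the smallest entry. For simplicity, Step 3(b) recomputes each new distance from the original object-level distances $d_{ik}$ instead of updating from $d_{UW}$ and $d_{VW}$. For the single, complete, and average linkage rules this gives the same values. The function name `agglomerate`, its `linkage` parameter, and the 5-object matrix are all hypothetical.

```python
from itertools import combinations


def agglomerate(D, linkage="single"):
    """Agglomerative clustering of N objects; returns (U, V, d_UV) merges."""
    n = len(D)

    def cluster_distance(a, b):
        # Distance between clusters a and b from the object distances d_ik.
        pair_d = [D[i][k] for i in a for k in b]
        if linkage == "single":
            return min(pair_d)
        if linkage == "complete":
            return max(pair_d)
        if linkage == "average":
            return sum(pair_d) / (len(a) * len(b))
        raise ValueError(f"unknown linkage: {linkage}")

    # Step 1: N clusters, each containing a single object.
    clusters = [(i,) for i in range(n)]
    dist = {(a, b): D[a[0]][b[0]] for a, b in combinations(clusters, 2)}
    history = []
    # Step 4: repeat Steps 2 and 3 a total of N - 1 times.
    for _ in range(n - 1):
        # Step 2: nearest pair (U, V) and its distance d_UV.
        (u, v), d_uv = min(dist.items(), key=lambda item: item[1])
        history.append((u, v, d_uv))
        # Step 3: merge U and V into (UV).
        uv = tuple(sorted(u + v))
        clusters = [c for c in clusters if c not in (u, v)]
        # 3(a): delete the entries involving U or V.
        dist = {pair: d for pair, d in dist.items() if u not in pair and v not in pair}
        # 3(b): add distances between (UV) and each remaining cluster W.
        for w in clusters:
            dist[(uv, w)] = cluster_distance(uv, w)
        clusters.append(uv)
    return history


# Hypothetical symmetric distance matrix for objects 0..4.
D = [
    [0, 9, 3, 6, 11],
    [9, 0, 7, 5, 10],
    [3, 7, 0, 9, 2],
    [6, 5, 9, 0, 8],
    [11, 10, 2, 8, 0],
]
print([d for _, _, d in agglomerate(D, "single")])    # [2, 3, 5, 6]
print([d for _, _, d in agglomerate(D, "complete")])  # [2, 5, 9, 11]
```

The recorded levels are the values that would be read off a dendrogram.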

Three common linkage methods are considered here. They differ mainly in how the distance between the merged cluster (UV) and any other cluster W is computed.

(I) Single

$$d_{(UV)W} = \min\{d_{UW}, d_{VW}\}.$$
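A small check of the single-linkage update is shown below. Taking $\min\{d_{UW}, d_{VW}\}$ gives the same value as taking the smallest $d_{ik}$ over all objects $i$ in (UV) and $k$ in W. The function names and the 4-object matrix are hypothetical.

```python
def single_linkage_update(d_uw, d_vw):
    """Single linkage: d_(UV)W = min{d_UW, d_VW}."""
    return min(d_uw, d_vw)


def single_linkage_direct(D, cluster_a, cluster_b):
    """Same quantity from object distances: smallest d_ik, i in A, k in B."""
    return min(D[i][k] for i in cluster_a for k in cluster_b)


# Hypothetical symmetric distance matrix for objects 0..3.
D = [
    [0, 4, 7, 2],
    [4, 0, 3, 6],
    [7, 3, 0, 5],
    [2, 6, 5, 0],
]
U, V, W = [0], [1], [2, 3]
d_uw = single_linkage_direct(D, U, W)
d_vw = single_linkage_direct(D, V, W)
print(single_linkage_update(d_uw, d_vw), single_linkage_direct(D, U + V, W))  # 2 2
```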

(II) Complete

$$d_{(UV)W} = \max\{d_{UW}, d_{VW}\}.$$
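The complete-linkage counterpart replaces min with max. The update from $d_{UW}$ and $d_{VW}$ agrees with the largest $d_{ik}$ between members of (UV) and W. This sketch reuses the same hypothetical 4-object matrix and hypothetical function names.

```python
def complete_linkage_update(d_uw, d_vw):
    """Complete linkage: d_(UV)W = max{d_UW, d_VW}."""
    return max(d_uw, d_vw)


def complete_linkage_direct(D, cluster_a, cluster_b):
    """Same quantity from object distances: largest d_ik, i in A, k in B."""
    return max(D[i][k] for i in cluster_a for k in cluster_b)


# Hypothetical symmetric distance matrix for objects 0..3.
D = [
    [0, 4, 7, 2],
    [4, 0, 3, 6],
    [7, 3, 0, 5],
    [2, 6, 5, 0],
]
U, V, W = [0], [1], [2, 3]
d_uw = complete_linkage_direct(D, U, W)
d_vw = complete_linkage_direct(D, V, W)
print(complete_linkage_update(d_uw, d_vw), complete_linkage_direct(D, U + V, W))  # 7 7
```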

(III) Average

$$d_{(UV)W} = \frac{\sum_{i}\sum_{k} d_{ik}}{N_{(UV)}\,N_{W}},$$

where $d_{ik}$ is the distance between object $i$ in the cluster (UV) and object $k$ in the cluster W, and $N_{(UV)}$ and $N_{W}$ are the numbers of items in clusters (UV) and W, respectively.
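The average-linkage formula can be evaluated directly from the $d_{ik}$. Splitting the double sum over (UV) into its parts over U and over V also gives a size-weighted update, $d_{(UV)W} = (N_U d_{UW} + N_V d_{VW}) / (N_U + N_V)$. That identity is derived here and is not stated in the text. It shows that the plain mean of $d_{UW}$ and $d_{VW}$ is correct only when U and V have equal sizes. The function names and matrix are hypothetical.

```python
def average_linkage_direct(D, cluster_a, cluster_b):
    """d_AB = (sum of d_ik over i in A, k in B) / (N_A * N_B)."""
    total = sum(D[i][k] for i in cluster_a for k in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def average_linkage_update(d_uw, d_vw, n_u, n_v):
    """Size-weighted update: d_(UV)W = (N_U d_UW + N_V d_VW) / (N_U + N_V)."""
    return (n_u * d_uw + n_v * d_vw) / (n_u + n_v)


# Hypothetical symmetric distance matrix for objects 0..3.
D = [
    [0, 4, 7, 2],
    [4, 0, 3, 6],
    [7, 3, 0, 5],
    [2, 6, 5, 0],
]
U, V, W = [0], [1, 2], [3]  # clusters of unequal size
d_uw = average_linkage_direct(D, U, W)
d_vw = average_linkage_direct(D, V, W)
direct = average_linkage_direct(D, U + V, W)
updated = average_linkage_update(d_uw, d_vw, len(U), len(V))
print(direct, updated, (d_uw + d_vw) / 2)  # the plain mean differs
```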