
Pattern Recognition: Statistical and Neural

Lonnie C. Ludeman


Lecture 27


Nov 9, 2005

Nanjing University of Science & Technology

Lecture 27 Topics

1. K-Means Clustering Algorithm Details
2. K-Means Step-by-Step Example
3. ISODATA Algorithm Overview
4. Agglomerative Hierarchical Clustering Algorithm Description


K-Means Clustering Algorithm: Basic Procedure

Randomly select K cluster centers from the pattern space.

Distribute the set of patterns to the cluster centers using minimum distance, so that each pattern joins the cluster whose center is nearest.

Compute new cluster centers for each cluster.

Continue this process until the cluster centers do not change.

Figure: Flow diagram for the K-Means algorithm

Step 1 Initialization

Choose K initial cluster centers $M_1(1), M_2(1), \ldots, M_K(1)$ using one of the following:

Method 1: the first K samples
Method 2: K data samples selected randomly
Method 3: K random vectors

Set m = 1 and go to Step 2.
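The three initialization methods can be sketched in Python as below. This is a minimal illustration, assuming each pattern is a tuple of floats; the function names are hypothetical, and the rule used by Method 3 to generate random vectors (uniform within the bounding box of the data) is an assumption, since the lecture does not say how the random vectors are drawn.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def init_first_k(samples: Sequence[Vector], k: int) -> List[Vector]:
    """Method 1: the first K samples become M_1(1), ..., M_K(1)."""
    return [tuple(x) for x in samples[:k]]


def init_random_samples(samples: Sequence[Vector], k: int,
                        rng: Optional[random.Random] = None) -> List[Vector]:
    """Method 2: K distinct data samples selected at random."""
    rng = rng or random.Random()
    return [tuple(x) for x in rng.sample(list(samples), k)]


def init_random_vectors(samples: Sequence[Vector], k: int,
                        rng: Optional[random.Random] = None) -> List[Vector]:
    """Method 3: K random vectors.

    Assumption (not from the lecture): each coordinate is drawn uniformly
    between the smallest and largest sample value in that dimension.
    """
    rng = rng or random.Random()
    dims = len(samples[0])
    lows = [min(x[d] for x in samples) for d in range(dims)]
    highs = [max(x[d] for x in samples) for d in range(dims)]
    return [tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
            for _ in range(k)]
```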

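Step 2 Determine New Clusters

Using the cluster centers, distribute the pattern vectors using minimum distance.

Method 1: Use Euclidean distance
Method 2: Use other distance measures

Assign sample $x_j$ to cluster $Cl_k(m)$ if

$d(x_j, M_k(m)) \le d(x_j, M_i(m))$ for all $i = 1, 2, \ldots, K$

where $d$ is the chosen distance measure.

Go to Step 3.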
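A minimal sketch of the Step 2 assignment, assuming tuple-valued patterns and 0-based cluster indices (index 0 plays the role of $Cl_1$). The city-block (L1) distance is just one example of an "other distance measure"; the lecture does not name one. The function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Method 1: Euclidean distance."""
    return math.dist(a, b)


def city_block(a: Vector, b: Vector) -> float:
    """One possible Method 2 measure (an assumption): the L1 distance."""
    return sum(abs(p - q) for p, q in zip(a, b))


def assign_clusters(samples: Sequence[Vector], centers: Sequence[Vector],
                    dist: Distance = euclidean) -> Dict[int, List[Vector]]:
    """Put each sample in the cluster whose center M_k(m) is nearest.

    Clusters are indexed 0..K-1. Ties go to the lowest index here,
    whereas the lecture's example resolves ties randomly.
    """
    clusters: Dict[int, List[Vector]] = {k: [] for k in range(len(centers))}
    for x in samples:
        k = min(range(len(centers)), key=lambda i: dist(x, centers[i]))
        clusters[k].append(x)
    return clusters
```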

Step 3 Compute New Cluster Centers

Using the new cluster assignment $Cl_k(m)$, $k = 1, 2, \ldots, K$, compute the new cluster centers $M_k(m+1)$, $k = 1, 2, \ldots, K$, using

$M_k(m+1) = \frac{1}{N_k} \sum_{x \in Cl_k(m)} x$

where $N_k$, $k = 1, 2, \ldots, K$, is the number of pattern vectors in $Cl_k(m)$.

Go to Step 4.
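The Step 3 update is a per-cluster mean. A short sketch, assuming each cluster is given as a list of tuple patterns; raising an error for an empty cluster ($N_k = 0$) is this sketch's own choice, since the lecture does not cover that case. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def compute_centers(clusters: Sequence[Sequence[Vector]]) -> List[Vector]:
    """M_k(m+1) = (1/N_k) * (sum of the x in Cl_k(m)), for each cluster k.

    clusters[k] holds the pattern vectors of Cl_k(m), so N_k = len(clusters[k]).
    """
    centers: List[Vector] = []
    for members in clusters:
        n_k = len(members)
        if n_k == 0:
            # Assumption: treat an empty cluster as an error (mean undefined).
            raise ValueError("empty cluster: N_k = 0, mean is undefined")
        dims = len(members[0])
        centers.append(tuple(sum(x[d] for x in members) / n_k
                             for d in range(dims)))
    return centers
```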

Step 4 Check for Convergence

Using the cluster centers from Step 3, check for convergence.

Convergence occurs if the means do not change, that is, if $M_k(m+1) = M_k(m)$ for all $k = 1, 2, \ldots, K$.

If convergence occurs, clustering is complete and the results are given.

If there is no convergence, go to Step 5.
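The Step 4 test compares the old and new centers. In this sketch (hypothetical function name), the default `tol=0.0` reproduces the exact "means do not change" rule; allowing a small positive tolerance is an added assumption, not part of the lecture.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def has_converged(old_centers: Sequence[Vector], new_centers: Sequence[Vector],
                  tol: float = 0.0) -> bool:
    """True when M_k(m+1) equals M_k(m) for every k = 1, ..., K.

    tol = 0.0 is the exact test; a positive tol (assumption) absorbs
    floating-point jitter.
    """
    if len(old_centers) != len(new_centers):
        return False
    return all(math.dist(old, new) <= tol
               for old, new in zip(old_centers, new_centers))
```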

Step 5 Check for Maximum Number of Iterations

Define MAXIT as the maximum acceptable number of iterations.

If m = MAXIT, display "no convergence" and stop.

If m < MAXIT, increment m (m = m + 1) and return to Step 2.
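Putting Steps 1 through 5 together gives a compact loop. This is a sketch under stated assumptions, not the lecture's own code: it uses Method 2 for initialization and Euclidean distance, breaks ties by lowest index rather than randomly, and keeps the old center if a cluster becomes empty. `k_means` and `maxit` (standing in for MAXIT) are hypothetical names.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def k_means(samples: Sequence[Vector], k: int, maxit: int = 100,
            rng: Optional[random.Random] = None
            ) -> Tuple[List[Vector], List[int], bool]:
    """Return (centers, labels, converged); labels[j] is the 0-based cluster of samples[j]."""
    rng = rng or random.Random()
    # Step 1: Method 2 (K samples selected randomly); set m = 1.
    centers: List[Vector] = [tuple(x) for x in rng.sample(list(samples), k)]
    m = 1
    while True:
        # Step 2: assign each sample to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda i: math.dist(x, centers[i]))
                  for x in samples]
        # Step 3: each new center is the mean of its cluster's members.
        new_centers: List[Vector] = []
        for i in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(c) / len(members)
                                         for c in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[i])
        # Step 4: converged when no center changed.
        if new_centers == centers:
            return centers, labels, True
        # Step 5: stop at MAXIT; otherwise increment m and return to Step 2.
        if m >= maxit:
            return new_centers, labels, False
        centers = new_centers
        m += 1
```

For example, `k_means(samples, 2, maxit=50)` returns the final centers, one label per sample, and a flag that is `False` when MAXIT was reached without convergence.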


Example: K-Means Clustering Algorithm

Given the following set of pattern vectors:

Figure: Plot of the data points in the given set of samples

Do the following:

(a) Solution: 2-class case

Figure: Plot of the data points in the given set of samples, with the initial cluster centers marked

Initial cluster centers

Distances from all samples to the cluster centers

First cluster assignment: $Cl_1, Cl_2, Cl_1, Cl_2, Cl_2, Cl_2, Cl_2$

With a tie, select randomly.
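The random tie rule can be made explicit when finding a sample's nearest center. A minimal sketch with a hypothetical function name; it treats two distances as tied only when they are exactly equal as floats, which is an assumption.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest_center(x: Vector, centers: Sequence[Vector],
                   rng: Optional[random.Random] = None) -> int:
    """0-based index of the closest center; a tie is broken by a random choice."""
    rng = rng or random.Random()
    dists = [math.dist(x, c) for c in centers]
    best = min(dists)
    # Exact float equality defines a tie here (assumption).
    tied = [k for k, d in enumerate(dists) if d == best]
    return rng.choice(tied)
```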

Figure: Plot of the data points in the given set of samples, marking the samples closest to $x_1$ and those closest to $x_2$

Compute new cluster centers from the first cluster assignment

Figure: Plot of the data points in the given set of samples, with the new cluster centers marked

Distances from all samples to the cluster centers

Second cluster assignment: $Cl_1, Cl_1, Cl_1, Cl_2, Cl_2, Cl_2, Cl_2$

Figure: Plot of the data points in the given set of samples, showing the new clusters and the old cluster centers $M_1(2)$ and $M_2(2)$


Compute New Cluster Centers

Figure: Plot of the data points in the given set of samples, showing the new clusters and the cluster centers $M_1(3)$ and $M_2(3)$

Distances from all samples to the cluster centers

Compute new cluster centers

Cluster assignment: $Cl_2, Cl_2, Cl_2, Cl_1, Cl_1, Cl_2, Cl_1$

(b) Solution: 3-class case

Select initial cluster centers.

First cluster assignment, using distances from the pattern vectors to the initial cluster centers

Compute new cluster centers.

Second cluster assignment, using distances from the pattern vectors to the cluster centers

At the next step the cluster centers do not change, so the algorithm has converged, and the final cluster assignment becomes:

Figure: Plot of the data points in the given set of samples, showing the final cluster centers and the final 3-class clusters $Cl_1$, $Cl_2$, and $Cl_3$

ISODATA: Iterative Self-Organizing Data Analysis Technique A

ISODATA Algorithm

Performs clustering of unclassified quantitative data with an unknown number of clusters.

Similar to K-Means, but with the ability to merge and split clusters, which gives flexibility in the number of clusters.
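The lecture gives no ISODATA formulas, so the following is only a loose sketch of the split-and-merge idea, not the full ISODATA algorithm. Every threshold name (`split_std`, `merge_dist`, `min_size`, `max_merges`) and every rule (discard small clusters, split along the coordinate with the largest standard deviation, merge the closest pair of centers into their unweighted midpoint) is an assumption made for illustration.

```python
import math
import statistics
from typing import List, Optional, Tuple

Vector = Tuple[float, ...]


def split_and_merge(clusters: List[List[Vector]], split_std: float,
                    merge_dist: float, min_size: int = 2,
                    max_merges: Optional[int] = None) -> List[Vector]:
    """One simplified ISODATA-style adjustment of the cluster centers.

    Hypothetical parameters, not taken from the lecture:
      split_std:  split a cluster whose largest per-coordinate standard
                  deviation exceeds this (needs at least 2 * min_size members)
      merge_dist: merge two centers that are closer than this distance
      min_size:   discard clusters with fewer members than this
      max_merges: optional cap on the number of merges in this step
    """
    centers: List[Vector] = []
    for members in clusters:
        if len(members) < min_size:
            continue  # too few samples: drop the cluster
        dims = len(members[0])
        mean = [statistics.fmean(x[d] for x in members) for d in range(dims)]
        stds = [statistics.pstdev([x[d] for x in members]) for d in range(dims)]
        widest = max(range(dims), key=lambda d: stds[d])
        if stds[widest] > split_std and len(members) >= 2 * min_size:
            # Split: two centers one standard deviation either side along the widest coordinate.
            low, high = list(mean), list(mean)
            low[widest] -= stds[widest]
            high[widest] += stds[widest]
            centers.extend([tuple(low), tuple(high)])
        else:
            centers.append(tuple(mean))
    merges = 0
    while len(centers) > 1 and (max_merges is None or merges < max_merges):
        # Merge: fuse the closest pair if nearer than merge_dist (unweighted midpoint).
        d, i, j = min((math.dist(centers[i], centers[j]), i, j)
                      for i in range(len(centers))
                      for j in range(i + 1, len(centers)))
        if d >= merge_dist:
            break
        midpoint = tuple((a + b) / 2 for a, b in zip(centers[i], centers[j]))
        centers = [c for n, c in enumerate(centers) if n not in (i, j)] + [midpoint]
        merges += 1
    return centers
```

The returned centers would then seed another round of K-Means-style assignment; the real ISODATA procedure adds further bookkeeping that this sketch omits.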

ISODATA Parameters That Need to Be Specified

ISODATA requires more specified information than the K-Means algorithm does.

... merged at each step

ISODATA Algorithm

Figure: Final clustering

Hierarchical Clustering

Approach 1: Agglomerative
Combines groups at each level

Approach 2: Divisive
Splits groups at each level

Only agglomerative hierarchical clustering will be presented, as it is the most commonly used.

Agglomerative Hierarchical Clustering

Consider a set S of patterns to be clustered:

$S = \{x_1, x_2, \ldots, x_k, \ldots, x_N\}$

Define Level N by

$S_1(N) = \{x_1\}, \quad S_2(N) = \{x_2\}, \quad \ldots, \quad S_N(N) = \{x_N\}$

The clusters at level N are the individual pattern vectors.

Define Level N-1 to be the N-1 clusters formed by merging two of the Level N clusters by the following process.

Compute the distances between all the clusters at level N and merge the two with the smallest distance (resolve ties randomly) to give the Level N-1 clusters

$S_1(N-1), \quad S_2(N-1), \quad \ldots, \quad S_{N-1}(N-1)$

The clusters at level N-1 result from this merging.
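One merge step, from Level N to Level N-1, might look like this. The lecture does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the smallest distance between members of the two clusters); the function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = Tuple[float, ...]
Cluster = List[Vector]


def single_link(a: Cluster, b: Cluster) -> float:
    """Assumed cluster distance: the smallest distance between any two members."""
    return min(math.dist(x, y) for x in a for y in b)


def merge_closest(clusters: List[Cluster],
                  rng: Optional[random.Random] = None) -> List[Cluster]:
    """Merge the two closest clusters, giving the next level down.

    Ties between equally close pairs are resolved randomly.
    """
    rng = rng or random.Random()
    pairs = [(single_link(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    best = min(d for d, _, _ in pairs)
    _, i, j = rng.choice([p for p in pairs if p[0] == best])
    rest = [c for n, c in enumerate(clusters) if n not in (i, j)]
    return rest + [clusters[i] + clusters[j]]
```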

The process of merging two clusters at each step is performed sequentially until Level 1 is reached. Level 1 is a single cluster containing all the samples:

$S_1(1) = \{x_1, x_2, \ldots, x_k, \ldots, x_N\}$

Thus hierarchical clustering provides cluster assignments for every number of clusters from N down to 1.
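Repeating the merge until one cluster remains produces every level from N down to 1. A sketch under the same single-linkage assumption, with hypothetical names; for brevity it breaks ties by taking the first closest pair found rather than choosing randomly. It also records the distance at each merge, which is the height at which the two merged branches join in a dendrogram.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Cluster = List[Vector]


def single_link(a: Cluster, b: Cluster) -> float:
    """Assumed cluster distance: the closest pair of members."""
    return min(math.dist(x, y) for x in a for y in b)


def agglomerate(samples: List[Vector]) -> Tuple[Dict[int, List[Cluster]], List[float]]:
    """Cluster assignments for every level from N down to 1.

    levels[n] holds the n clusters at Level n; heights[t] is the distance at
    which the t-th merge happened. Ties go to the lowest-index pair, a
    simplification of the random tie rule.
    """
    clusters: List[Cluster] = [[x] for x in samples]  # Level N: one pattern per cluster
    levels: Dict[int, List[Cluster]] = {len(clusters): [list(c) for c in clusters]}
    heights: List[float] = []
    while len(clusters) > 1:
        d, i, j = min((single_link(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        levels[len(clusters)] = [list(c) for c in clusters]
        heights.append(d)
    return levels, heights
```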

Definition: A dendrogram is a tree-like structure that illustrates the merging of clusters at each step of the hierarchical approach.

A typical dendrogram appears on the next slide.

Figure: Typical dendrogram

Summary Lecture 27

1. Presented the K-Means clustering algorithm details
2. Showed an example of clustering using the K-Means algorithm (step by step)
3. Briefly discussed the ISODATA algorithm
4. Introduced the agglomerative hierarchical clustering algorithm


End of Lecture 27