Pattern Recognition: Statistical and Neural
Lonnie C. Ludeman
Lecture 27
Nov 9, 2005
Nanjing University of Science & Technology
Lecture 27 Topics
1. K-Means Clustering Algorithm Details
2. K-Means Step-by-Step Example
3. ISODATA Algorithm Overview
4. Agglomerative Hierarchical Clustering Algorithm Description
K-Means Clustering Algorithm: Basic Procedure
1. Randomly select K cluster centers from the pattern space.
2. Distribute the set of patterns to the cluster centers using minimum distance.
3. Compute new cluster centers for each cluster.
4. Continue this process until the cluster centers do not change.
Flow Diagram for the K-Means Algorithm (figure)
Step 1: Initialization
Choose K initial cluster centers M_1(1), M_2(1), ..., M_K(1).
Method 1 – First K samples
Method 2 – K data samples selected randomly
Method 3 – K random vectors
Set m = 1 and go to Step 2.
Step 2: Determine New Clusters
Using the cluster centers, distribute the pattern vectors by minimum distance.
Method 1 – Use Euclidean distance
Method 2 – Use other distance measures
Assign sample x_j to class C_k if d(x_j, M_k(m)) <= d(x_j, M_i(m)) for all i = 1, 2, ..., K.
Go to Step 3.
Step 3: Compute New Cluster Centers
Using the new cluster assignment Cl_k(m), k = 1, 2, ..., K, compute the new cluster centers M_k(m+1), k = 1, 2, ..., K, using
M_k(m+1) = (1/N_k) * (sum of x over all x in Cl_k(m))
where N_k, k = 1, 2, ..., K, is the number of pattern vectors in Cl_k(m).
Go to Step 4.
Step 4: Check for Convergence
Using the cluster centers from Step 3, check for convergence. Convergence occurs if the means do not change, i.e., M_k(m+1) = M_k(m) for all k.
If convergence occurs, clustering is complete and the results are given.
If there is no convergence, go to Step 5.
Step 5: Check for Maximum Number of Iterations
Define MAXIT as the maximum number of iterations that is acceptable.
If m = MAXIT, then display "no convergence" and stop.
If m < MAXIT, then increment m (m = m + 1) and return to Step 2.
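The five steps above can be sketched in Python (a minimal illustration; the function and variable names are my own, using Euclidean distance as in Method 1 of Step 2):

```python
import math

def kmeans(samples, initial_centers, maxit=100):
    """Minimal K-Means following Steps 1-5 above.
    samples: list of coordinate tuples.
    initial_centers: the K starting means (Step 1, chosen by the caller).
    maxit: plays the role of MAXIT in Step 5."""
    centers = [tuple(c) for c in initial_centers]
    for m in range(1, maxit + 1):
        # Step 2: assign each sample to its nearest cluster center
        labels = [min(range(len(centers)),
                      key=lambda k: math.dist(x, centers[k]))
                  for x in samples]
        # Step 3: recompute each center as the mean of its cluster
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, lab in zip(samples, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(d) / len(members)
                                         for d in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        # Step 4: convergence when the means do not change
        if new_centers == centers:
            return centers, labels
        centers = new_centers  # Step 5: otherwise iterate, up to maxit
    return centers, labels
```

With well-separated starting centers this converges in a couple of iterations, mirroring the step-by-step example that follows.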
Example: K-Means Clustering Algorithm
Given the following set of pattern vectors
Plot of data points in the given set of samples (figure)
Do the following:
(a) Solution – 2-class case
Plot of data points in the given set of samples, with the initial cluster centers marked (figure)
Initial cluster centers and distances from all samples to the cluster centers (table in figure).
First cluster assignment: Cl_1, Cl_2, Cl_1, Cl_2, Cl_2, Cl_2, Cl_2.
With a tie, select randomly.
Plot of data points in the given set of samples, showing which points are closest to x_1 and which are closest to x_2 (figure)
Compute new cluster centers from the first cluster assignment (computation in figure).
Plot of data points in the given set of samples, with the new cluster centers marked (figure)
Distances from all samples to the cluster centers (table in figure).
Second cluster assignment: Cl_1, Cl_1, Cl_1, Cl_2, Cl_2, Cl_2, Cl_2.
Plot of data points in the given set of samples, showing the new clusters, the old cluster centers, and the updated centers M_1(2) and M_2(2) (figure)
Compute new cluster centers.
Plot of data points in the given set of samples, showing the new clusters and the cluster centers M_1(3) and M_2(3) (figure)
Distances from all samples to the cluster centers (table in figure); compute new cluster centers.
Third cluster assignment: Cl_2, Cl_2, Cl_2, Cl_1, Cl_1, Cl_2, Cl_1.
(b) Solution – 3-class case
Select initial cluster centers.
First cluster assignment using distances from the pattern vectors to the initial cluster centers (figure).
Compute new cluster centers.
Second cluster assignment using distances from the pattern vectors to the cluster centers (figure).
At the next step we have convergence, as the cluster centers do not change; thus the final cluster assignment becomes:
Plot of data points in the given set of samples, showing the final cluster centers and the final 3-class clusters Cl_1, Cl_2, and Cl_3 (figure)
ISODATA Algorithm (Iterative Self-Organizing Data Analysis Technique A)
Performs clustering of unclassified quantitative data with an unknown number of clusters.
Similar to K-Means, but with the ability to merge and split clusters, thus giving flexibility in the number of clusters.
ISODATA: Parameters That Need to Be Specified
Requires more specified information than the K-Means algorithm, including parameters controlling how clusters are split and merged at each step.
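The split/merge idea that distinguishes ISODATA from K-Means can be illustrated with two toy decision rules (a simplified sketch; the names `sigma_max` and `d_min` and the exact criteria are illustrative assumptions, not the full set of ISODATA parameters):

```python
import math

def should_split(cluster, sigma_max):
    """Illustrative split test: split a cluster whose largest
    per-dimension standard deviation exceeds sigma_max
    (a hypothetical threshold, one of several ISODATA would need)."""
    dims = list(zip(*cluster))            # per-dimension value tuples
    n = len(cluster)
    stds = [math.sqrt(sum((v - sum(d) / n) ** 2 for v in d) / n)
            for d in dims]
    return max(stds) > sigma_max

def should_merge(center_a, center_b, d_min):
    """Illustrative merge test: merge two clusters whose centers
    lie closer together than the (hypothetical) threshold d_min."""
    return math.dist(center_a, center_b) < d_min
```

An ISODATA-style loop would alternate K-Means passes with these tests, increasing or decreasing the number of clusters accordingly.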
ISODATA Algorithm: Final Clustering (figure)
Hierarchical Clustering
Approach 1 – Agglomerative: combines groups at each level.
Approach 2 – Divisive: splits groups at each level.
We will present only Agglomerative Hierarchical Clustering, as it is the most widely used.
Agglomerative Hierarchical Clustering
Consider a set S = {x_1, x_2, ..., x_k, ..., x_N} of patterns to be clustered.
Define Level N by
S_1(N) = {x_1}, S_2(N) = {x_2}, ..., S_N(N) = {x_N}
Clusters at Level N are the individual pattern vectors.
Define Level N-1 to be the N-1 clusters formed by merging two of the Level N clusters by the following process: compute the distances between all the clusters at Level N and merge the two with the smallest distance (resolve ties randomly) to give the Level N-1 clusters
S_1(N-1), S_2(N-1), ..., S_{N-1}(N-1)
Clusters at Level N-1 result from this merging.
The process of merging two clusters at each step is performed sequentially until Level 1 is reached. Level 1 is a single cluster containing all samples:
S_1(1) = {x_1, x_2, ..., x_k, ..., x_N}
Thus hierarchical clustering provides cluster assignments for all numbers of clusters from N to 1.
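The level-by-level merging described above can be sketched as follows (a minimal illustration; single linkage, i.e. the distance between the closest members of two clusters, is one common choice of cluster distance, which the slides leave open):

```python
import math

def agglomerative_levels(samples):
    """Build Levels N down to 1 as described above: start with one
    singleton cluster per pattern and repeatedly merge the two
    clusters with the smallest (single-linkage) distance."""
    clusters = [[x] for x in samples]          # Level N: singletons
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # find the pair of clusters at the smallest distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])        # merge the two clusters
        del clusters[j]
        levels.append([list(c) for c in clusters])
    return levels    # levels[0] is Level N, levels[-1] is Level 1
```

Because every intermediate level is recorded, the returned list gives the cluster assignment for every number of clusters from N down to 1, exactly as stated above.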
Definition: A Dendrogram is a tree-like structure that illustrates the merging of clusters at each step of the hierarchical approach. A typical dendrogram appears on the next slide.
Typical Dendrogram (figure)
Summary Lecture 27
1. Presented the K-Means Clustering Algorithm details
2. Showed an example of clustering using the K-Means Algorithm (step by step)
3. Briefly discussed the ISODATA Algorithm
4. Introduced the Agglomerative Hierarchical Clustering Algorithm
End of Lecture 27