# slides7

Artificial Intelligence and Robotics

25 Nov 2013

Color clustering

[Figure: sources of the RGB data vectors; Red-Green plot of the vectors (R vs. G axes).]

Example of clustering

[Figure: clustering for vector quantization — starting point (996 data vectors) and clustering result (256 clusters).]

Goals of clustering and classification

1. Supervised classification:

Partition the input set so that data vectors that originate from the same source belong to the same group.

- Training data available with known classification.
- Typical solutions:
  - statistical methods
  - neural networks

2. Clustering:

Partition the input set so that similar vectors are grouped together and dissimilar vectors into different groups. No training data available; the classes are unknown and the model is fitted to the data.

- Goals to solve:
  - Find how many clusters there are
  - Find the location of the clusters
- Typical solutions:
  - clustering algorithms
  - other statistical methods

3. Vector quantization:

Generate a codebook that approximates the input data.

- Number of clusters defined by the user
- Codebook generated by clustering algorithms

Vector quantization

Data:

X    Set of N input vectors X = {x1, x2, ..., xN}
P    Partition into M clusters P = {p1, p2, ..., pM}
C    Cluster centroids C = {c1, c2, ..., cM}

Goal: find such C and P that minimize f(C, P).

Error function:

f(C, P) = (1/N) * sum_{i=1..N} || x_i - c_{p_i} ||^2
[Figure: training set X of N K-dimensional vectors, mapping function P (a scalar in [1..M] per vector), and codebook C of M code vectors.]
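The error function above is easy to state in code. A minimal Python sketch (the function name `vq_error` and the toy data are my own, not from the slides), using squared Euclidean distance:

```python
def vq_error(X, P, C):
    """Mean squared quantization error f(C, P) = (1/N) * sum ||x_i - c_{p_i}||^2.

    X: list of K-dimensional vectors, P: cluster index per vector (0-based),
    C: list of M centroid vectors.
    """
    total = 0.0
    for x, p in zip(X, P):
        # Squared Euclidean distance from the vector to its code vector.
        total += sum((xk - ck) ** 2 for xk, ck in zip(x, C[p]))
    return total / len(X)

X = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
P = [0, 0, 1]
C = [(0.0, 1.0), (4.0, 4.0)]
print(vq_error(X, P, C))  # (1 + 1 + 0) / 3
```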

Representation of solution

Partition

Codebook

Main approaches

1. Hierarchical methods

- Build the clustering structure stepwise:
- Splitting approach (top-down):
  - Increase clusters by adding new ones
  - For example: divide the largest cluster
- Merge-based approach (bottom-up):
  - Decrease clusters by removing existing ones
  - For example: merge existing clusters

2. Iterative methods

- Take any initial solution, e.g. random clustering
- Make small changes to the existing solution by:
  - Descent method (apply rules that improve)
  - Local search (trial-and-error approach)

Generalized Lloyd algorithm (GLA)

Partition step:

p_i = arg min_{1 <= j <= M} d(x_i, c_j)^2,  for 1 <= i <= N

Centroid step:

c_j = ( sum_{i: p_i = j} x_i ) / |{ i : p_i = j }|,  for 1 <= j <= M
GLA(X, P, C): returns (P, C)

REPEAT
    FOR i := 1 TO N DO
        p_i ← FindNearestCentroid(x_i, C)
    FOR j := 1 TO M DO
        c_j ← CalculateCentroid(X, P, j)
UNTIL no improvement.
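The two GLA steps map directly to code. A pure-Python sketch (names are mine; a real implementation would also handle empty clusters more carefully, which this sketch merely skips):

```python
def gla(X, C):
    """Generalized Lloyd algorithm: alternate the partition step and the
    centroid step until the error no longer improves."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    prev_err = float("inf")
    while True:
        # Partition step: assign each vector to its nearest centroid.
        P = [min(range(len(C)), key=lambda j: dist2(x, C[j])) for x in X]
        # Centroid step: recompute each centroid from its cluster.
        for j in range(len(C)):
            members = [x for x, p in zip(X, P) if p == j]
            if members:  # skip empty clusters in this sketch
                C[j] = tuple(sum(col) / len(members) for col in zip(*members))
        err = sum(dist2(x, C[p]) for x, p in zip(X, P)) / len(X)
        if err >= prev_err:  # "UNTIL no improvement"
            return P, C
        prev_err = err
```

With X = [(0,0), (0,1), (10,10), (10,11)] and initial centroids (0,0) and (10,10), the loop converges in two passes to centroids (0, 0.5) and (10, 10.5).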

Splitting approach

Split

Put all vectors in one cluster;
REPEAT
    Select the cluster to be split;
    Split the cluster;
UNTIL the final number of clusters is reached;

Median cut algorithm (example)

Color samples (x, y): (0,0) (0,5) (0,15) (1,10) (4,4) (4,12) (5,4) (6,6) (15,0) (15,14)

[Figure: distribution of the colors on a 15 x 15 grid. Starting from the initial region A = [0..15, 0..15], each stage splits the region with the largest maximum dimension at the median, producing regions A..E over four stages and the final color palette (1,3), (6,4), (2,12), (11,10), (15,0).]
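The example above can be reproduced with a small median-cut sketch (pure Python; the naming is my own, and ties on the widest dimension simply go to the first axis):

```python
def median_cut(points, m):
    """Median cut sketch: repeatedly split the cluster whose bounding box
    has the widest side, at the median of that coordinate."""
    clusters = [list(points)]

    def span(c, d):  # extent of cluster c along dimension d
        return max(p[d] for p in c) - min(p[d] for p in c)

    while len(clusters) < m:
        # Select the cluster with the widest coordinate range ...
        c = max(clusters, key=lambda c: max(span(c, d) for d in range(len(c[0]))))
        # ... and its widest dimension.
        d = max(range(len(c[0])), key=lambda k: span(c, k))
        c.sort(key=lambda p: p[d])
        mid = len(c) // 2  # median cut
        clusters.remove(c)
        clusters += [c[:mid], c[mid:]]
    return clusters
```

On the ten color samples of the example, `median_cut(points, 5)` yields five clusters that together cover all ten samples.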

Median cut + GLA (example)

[Figure: the median cut segmentation assigns the color samples to regions A: (0,0) (0,5) (4,4); B: (5,4); C: (0,15) (1,10) (4,12); D: (6,6) (15,14); E: (15,0), with per-region square errors 25, 0, 22, 48, 0 (total 95). After the first and second GLA iterations the representatives move (e.g. (1,3) → (0,3) and (6,4) → (5,5)) and the per-region square errors become 13, 5, 22, 0, 0 (total 40).]

PCA-based splitting

1. Calculate the principal axis.
2. Select the dividing point P on the principal axis.
3. Partition according to the dividing hyperplane.
4. Calculate the centroids of the two subclusters.

[Figure: the dividing point and dividing hyperplane along the principal axis; the largest cluster is split repeatedly, with cluster sizes shown at each stage.]
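Steps 1-4 can be sketched with power iteration for the principal axis (pure Python; `pca_split` and the iteration budget `iters=50` are my own choices, not from the slides):

```python
def pca_split(points, iters=50):
    """PCA-based split: principal axis by power iteration on the covariance
    matrix, dividing point at the median projection, cut by the hyperplane."""
    n, k = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(k)]
    # K x K covariance matrix of the cluster.
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
            for b in range(k)] for a in range(k)]
    # Power iteration: converges to the dominant eigenvector (principal axis).
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    # Project onto the axis and cut at the median projection.
    projs = [sum((p[d] - mean[d]) * v[d] for d in range(k)) for p in points]
    cut = sorted(projs)[n // 2]
    left = [p for p, t in zip(points, projs) if t < cut]
    right = [p for p, t in zip(points, projs) if t >= cut]
    return left, right
```

The two subcluster centroids (step 4) are then simply the coordinate means of `left` and `right`.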

Time complexity of splitting

- Assume clusters of n vectors with K values (K = 3 for RGB).
- The principal axis is calculated in O(nK^2) time.
- The dividing point is selected in O(n log n) time.
- Assume that the largest cluster is always split into equal halves: n → n/2.
- The total number of vectors processed is:

  sum_i n_i = N + 2*(N/2) + 4*(N/4) + ... + (M/2)*(2N/M) = N * log2 M

Total time complexity is O(NK^2 log M) + O(N log N).

Splitting experiments

[Figure: results with partition refinement, and a quality-time comparison (MSE vs. time in seconds) of Random, Split-1, Split-2, S+GLA, SLR, GLA, SLR+GLA and R+GLA, with the new methods marked against the existing ones.]

Merge-based approach: PNN algorithm

PNN

Put each vector in its own cluster;
REPEAT
    (a, b) ← SearchClusterPair;
    MergeClusters(a, b);
UNTIL final cluster size reached;
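A minimal PNN sketch (an O(N^3) pair search for clarity; the merge cost n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2 is the standard PNN criterion, and the naming is mine):

```python
def pnn(X, m):
    """Pairwise nearest neighbour: start from singleton clusters and merge
    the pair whose merge increases distortion the least, until m remain."""
    clusters = [[list(x), 1] for x in X]  # each entry: [centroid, size]

    def merge_cost(a, b):
        d2 = sum((p - q) ** 2 for p, q in zip(a[0], b[0]))
        return a[1] * b[1] / (a[1] + b[1]) * d2  # distortion increase

    while len(clusters) > m:
        a, b = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        ca, cb = clusters[a], clusters[b]
        n = ca[1] + cb[1]
        # Merged centroid is the size-weighted mean of the two centroids.
        merged = [[(ca[1] * p + cb[1] * q) / n for p, q in zip(ca[0], cb[0])], n]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters
```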

[Figure: clusters S1..S5 before and after a cluster merge, showing the code vectors to be merged, the remaining code vectors, the training vectors of the clusters to be merged, and the other training vectors.]

Iterative shrinking

IS(X, M): returns S

FOR i := 1 TO N DO
    s_i ← {x_i};
REPEAT
    s_a ← SearchClusterToBeRemoved(S);
    RepartitionCluster(S, s_a);
UNTIL |S| = M;
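A toy version of iterative shrinking (using the distance to the nearest other code vector as a cheap surrogate for the true removal cost; both the names and the surrogate are my simplifications):

```python
def iterative_shrinking(X, m):
    """Shrink a codebook of all vectors down to m code vectors by
    repeatedly removing the 'most redundant' one, then repartition."""
    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    C = [list(x) for x in X]
    while len(C) > m:
        # Remove the code vector lying closest to another code vector
        # (a surrogate for SearchClusterToBeRemoved).
        i = min(range(len(C)),
                key=lambda i: min(d2(C[i], C[j])
                                  for j in range(len(C)) if j != i))
        del C[i]
    # RepartitionCluster: each vector goes to its nearest remaining code vector.
    P = [min(range(len(C)), key=lambda j: d2(x, C[j])) for x in X]
    return C, P
```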

[Figure: clusters S1..S5 before and after a cluster removal, showing the code vector to be removed, the remaining code vectors, the training vectors of the removed cluster, and the other training vectors.]

Results using merge-based approaches

[Figure: PNN merging (after the third and fourth merge) and iterative shrinking (IS); run time vs. MSE (160–180) for PNN (original), PNN (improved), GLA-PNN-GLA (improved) and GA-PNN (improved).]

Split and merge

Generate an initial codebook by any algorithm.
REPEAT
    Select a cluster to be split;
    Split the selected cluster;
    Select two clusters to be merged;
    Merge the selected clusters;
UNTIL no improvement achieved.

[Figure: Split-Merge vs. Merge-Split; the number of clusters oscillates between M-h and M+h around the target M.]

Comparison of Split and Merge

[Figure: MSE (160–180) vs. time in seconds (0–700) for GLA, Split, SLR, PNN, SGLA, SM and SMG.]

Structure of Local Search

Generate initial solution.

REPEAT

Generate a set of new solutions.

Evaluate the new solutions.

Select the best solution.

UNTIL stopping criterion met.

Neighborhood function using random swap:

c_j ← x_i,  where j = random(1, M), i = random(1, N)

Object rejection (the vectors of the replaced cluster j find a new cluster):

p_i ← arg min_{1 <= k <= M} d(x_i, c_k)^2,  for all i with p_i = j

Object attraction (any vector may move to the new cluster):

p_i ← arg min_{k in {p_i, j}} d(x_i, c_k)^2,  for 1 <= i <= N
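The swap and the two repartition rules can be sketched as follows (Python; `random_swap` works on a copy, `local_repartition` applies rejection and attraction for the swapped cluster j, and the names are mine):

```python
import random

def random_swap(X, C):
    """Replace one randomly chosen centroid with a random data vector."""
    C = [list(c) for c in C]
    j = random.randrange(len(C))
    C[j] = list(random.choice(X))
    return C, j

def local_repartition(X, P, C, j):
    """Object rejection and attraction after centroid j has been swapped."""
    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    for i, x in enumerate(X):
        if P[i] == j:
            # Rejection: orphans of the replaced cluster find a new home.
            P[i] = min(range(len(C)), key=lambda k: d2(x, C[k]))
        elif d2(x, C[j]) < d2(x, C[P[i]]):
            # Attraction: vectors closer to the new centroid move to it.
            P[i] = j
    return P
```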

Randomized local search

RLS algorithm 1:

C ← SelectRandomDataObjects(M).
P ← OptimalPartition(C).
REPEAT T times
    C_new ← RandomSwap(C).
    P_new ← LocalRepartition(P, C_new).
    C_new ← OptimalRepresentatives(P_new).
    IF f(P_new, C_new) < f(P, C) THEN
        (P, C) ← (P_new, C_new)

RLS algorithm 2:

C ← SelectRandomDataObjects(M).
P ← OptimalPartition(C).
REPEAT T times
    C_new ← RandomSwap(C).
    P_new ← LocalRepartition(P, C_new).
    K-means(P_new, C_new).
    IF f(P_new, C_new) < f(P, C) THEN
        (P, C) ← (P_new, C_new)

Random swap

[Figure: before the swap there is a missing cluster and an unnecessary cluster; after the swap a centroid is removed and a new centroid added; local fine-tuning (local refinement) makes the obsolete cluster disappear and a new cluster appear; after k-means a cluster moves down into place.]

Genetic algorithm

[Figure: MSE on Bridge for K-means (176.53), Random + RLS (163.93), K-means + RLS (163.63), Splitting + RLS (163.51) and Ward + RLS (163.08); convergence of RLS-1 and RLS-2 (MSE 160–190 vs. iterations 0–5000).]

Structure of Genetic Algorithm

Genetic algorithm:

Generate S initial solutions.
REPEAT T times
    Generate new solutions.
    Sort the solutions.
    Store the best solution.
END-REPEAT
Output the best solution found.

Generate new solutions:

REPEAT S times
    Select pair for crossover.
    Cross the selected solutions.
    Mutate the new solution.
    Fine-tune the new solution by GLA.
END-REPEAT

Pseudo code for the GA (1/2)

CrossSolutions(C1, P1, C2, P2) → (C_new, P_new)

C_new ← CombineCentroids(C1, C2)
P_new ← CombinePartitions(P1, P2)
C_new ← UpdateCentroids(C_new, P_new)
RemoveEmptyClusters(C_new, P_new)
PerformPNN(C_new, P_new)

CombineCentroids(C1, C2) → C_new

C_new ← C1 ∪ C2

CombinePartitions(C_new, P1, P2) → P_new

FOR i := 1 TO N DO
    IF || x_i - c1_{p1_i} ||^2 <= || x_i - c2_{p2_i} ||^2 THEN
        p_new_i ← p1_i
    ELSE
        p_new_i ← p2_i
END-FOR
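CombinePartitions reduces to a per-vector choice between the two parents. A sketch (parent-2 cluster indices are offset by |C1| so they index into the concatenated codebook; the naming is mine):

```python
def combine_partitions(X, C1, P1, C2, P2):
    """For each vector keep the parent assignment with the smaller
    squared distance; return the combined codebook and new partition."""
    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    C_new = list(C1) + list(C2)          # CombineCentroids: C1 followed by C2
    P_new = []
    for x, p1, p2 in zip(X, P1, P2):
        if d2(x, C1[p1]) <= d2(x, C2[p2]):
            P_new.append(p1)             # keep parent 1's cluster
        else:
            P_new.append(len(C1) + p2)   # parent 2's cluster, re-indexed
    return C_new, P_new
```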

Pseudo code for the GA (2/2)

UpdateCentroids(C1, C2) → C_new

FOR j := 1 TO |C_new| DO
    c_new_j ← CalculateCentroid(P_new, j)

PerformPNN(C_new, P_new)

FOR i := 1 TO |C_new| DO
    q_i ← FindNearestNeighbor(c_i)
WHILE |C_new| > M DO
    a ← FindMinimumDistance(Q)
    b ← q_a
    MergeClusters(c_a, p_a, c_b, p_b)
    UpdatePointers(Q)
END-WHILE

Combining existing solutions

Performance comparison of GA

[Figure: distortion (160–180) vs. number of iterations (0–50) on Bridge for Mutations + GLA, PNN crossover + GLA, Random crossover + GLA, and PNN crossover.]