# slides7

Artificial Intelligence and Robotics

25 Nov 2013

Color clustering

[Figure: sources of the RGB data vectors; Red-Green plot of the vectors (R vs. G axes).]

Example of clustering

[Figure: clustering for vector quantization — starting point (996 data vectors) and clustering result (256 clusters).]

Goals of clustering and classification

1. Supervised classification:

Partition the input set so that data vectors that originate from the same source belong to the same group.

- Training data available with known classification.
- Typical solutions:
  - statistical methods
  - neural networks

2. Clustering:

Partition the input set so that similar vectors are grouped together and dissimilar vectors into different groups. No training data available; the classes are unknown and the model is fitted to the data.

- Goals to solve:
  - Find how many clusters there are
  - Find the location of the clusters
- Typical solutions:
  - clustering algorithms
  - other statistical methods

3. Vector quantization:

Generate a codebook that approximates the input data.

- Number of clusters defined by the user
- Codebook generated by clustering algorithms

Vector quantization

Data:

X    Set of N input vectors X = {x1, x2, ..., xN}
P    Partition into M clusters P = {p1, p2, ..., pM}
C    Cluster centroids C = {c1, c2, ..., cM}

Goal: find such C and P that minimize f(C, P).

Error function:

f(C, P) = (1/N) * sum_{i=1..N} || x_i - c_{p_i} ||^2
[Figure: training set X of N K-dimensional vectors, mapping function P (a scalar in [1..M] per vector), and codebook C of M code vectors.]
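The error function above is easy to state in code. A minimal Python sketch (the function name `vq_error` and the toy data are my own, not from the slides), using squared Euclidean distance:

```python
def vq_error(X, P, C):
    """Mean squared quantization error f(C, P) = (1/N) * sum ||x_i - c_{p_i}||^2.

    X: list of K-dimensional vectors, P: cluster index per vector (0-based),
    C: list of M centroid vectors.
    """
    total = 0.0
    for x, p in zip(X, P):
        # Squared Euclidean distance from the vector to its code vector.
        total += sum((xk - ck) ** 2 for xk, ck in zip(x, C[p]))
    return total / len(X)

X = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
P = [0, 0, 1]
C = [(0.0, 1.0), (4.0, 4.0)]
print(vq_error(X, P, C))  # (1 + 1 + 0) / 3
```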

Representation of solution

Partition

Codebook

Main approaches

1. Hierarchical methods

- Build the clustering structure stepwise:
- Splitting approach (top-down):
  - Increase clusters by adding new ones
  - For example: divide the largest cluster
- Merge-based approach (bottom-up):
  - Decrease clusters by removing existing ones
  - For example: merge existing clusters

2. Iterative methods

- Take any initial solution, e.g. random clustering
- Make small changes to the existing solution by:
  - Descent method (apply rules that improve)
  - Local search (trial-and-error approach)

Generalized Lloyd algorithm (GLA)

Partition step:

p_i = arg min_{1 <= j <= M} d(x_i, c_j)^2,  for 1 <= i <= N

Centroid step:

c_j = ( sum_{i: p_i = j} x_i ) / |{ i : p_i = j }|,  for 1 <= j <= M
GLA(X, P, C): returns (P, C)

REPEAT
    FOR i := 1 TO N DO
        p_i ← FindNearestCentroid(x_i, C)
    FOR j := 1 TO M DO
        c_j ← CalculateCentroid(X, P, j)
UNTIL no improvement.
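The two GLA steps map directly to code. A pure-Python sketch (names are mine; a real implementation would also handle empty clusters more carefully, which this sketch merely skips):

```python
def gla(X, C):
    """Generalized Lloyd algorithm: alternate the partition step and the
    centroid step until the error no longer improves."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    prev_err = float("inf")
    while True:
        # Partition step: assign each vector to its nearest centroid.
        P = [min(range(len(C)), key=lambda j: dist2(x, C[j])) for x in X]
        # Centroid step: recompute each centroid from its cluster.
        for j in range(len(C)):
            members = [x for x, p in zip(X, P) if p == j]
            if members:  # skip empty clusters in this sketch
                C[j] = tuple(sum(col) / len(members) for col in zip(*members))
        err = sum(dist2(x, C[p]) for x, p in zip(X, P)) / len(X)
        if err >= prev_err:  # "UNTIL no improvement"
            return P, C
        prev_err = err
```

With X = [(0,0), (0,1), (10,10), (10,11)] and initial centroids (0,0) and (10,10), the loop converges in two passes to centroids (0, 0.5) and (10, 10.5).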

Splitting approach

Split

Put all vectors in one cluster;
REPEAT
    Select the cluster to be split;
    Split the cluster;
UNTIL the final number of clusters is reached;

Median cut algorithm (example)

Color samples (x, y): (0,0) (0,5) (0,15) (1,10) (4,4) (4,12) (5,4) (6,6) (15,0) (15,14)

[Figure: distribution of the colors on a 15 x 15 grid. Starting from the initial region A = [0..15, 0..15], each stage splits the region with the largest maximum dimension at the median, producing regions A..E over four stages and the final color palette (1,3), (6,4), (2,12), (11,10), (15,0).]
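The example above can be reproduced with a small median-cut sketch (pure Python; the naming is my own, and ties on the widest dimension simply go to the first axis):

```python
def median_cut(points, m):
    """Median cut sketch: repeatedly split the cluster whose bounding box
    has the widest side, at the median of that coordinate."""
    clusters = [list(points)]

    def span(c, d):  # extent of cluster c along dimension d
        return max(p[d] for p in c) - min(p[d] for p in c)

    while len(clusters) < m:
        # Select the cluster with the widest coordinate range ...
        c = max(clusters, key=lambda c: max(span(c, d) for d in range(len(c[0]))))
        # ... and its widest dimension.
        d = max(range(len(c[0])), key=lambda k: span(c, k))
        c.sort(key=lambda p: p[d])
        mid = len(c) // 2  # median cut
        clusters.remove(c)
        clusters += [c[:mid], c[mid:]]
    return clusters
```

On the ten color samples of the example, `median_cut(points, 5)` yields five clusters that together cover all ten samples.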

Median cut + GLA (example)

[Figure: the median cut segmentation assigns the color samples to regions A: (0,0) (0,5) (4,4); B: (5,4); C: (0,15) (1,10) (4,12); D: (6,6) (15,14); E: (15,0), with per-region square errors 25, 0, 22, 48, 0 (total 95). After the first and second GLA iterations the representatives move (e.g. (1,3) → (0,3) and (6,4) → (5,5)) and the per-region square errors become 13, 5, 22, 0, 0 (total 40).]

PCA-based splitting

1. Calculate the principal axis.
2. Select the dividing point P on the principal axis.
3. Partition according to the dividing hyperplane.
4. Calculate the centroids of the two subclusters.

[Figure: the dividing point and dividing hyperplane along the principal axis; the largest cluster is split repeatedly, with cluster sizes shown at each stage.]
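Steps 1-4 can be sketched with power iteration for the principal axis (pure Python; `pca_split` and the iteration budget `iters=50` are my own choices, not from the slides):

```python
def pca_split(points, iters=50):
    """PCA-based split: principal axis by power iteration on the covariance
    matrix, dividing point at the median projection, cut by the hyperplane."""
    n, k = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(k)]
    # K x K covariance matrix of the cluster.
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
            for b in range(k)] for a in range(k)]
    # Power iteration: converges to the dominant eigenvector (principal axis).
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    # Project onto the axis and cut at the median projection.
    projs = [sum((p[d] - mean[d]) * v[d] for d in range(k)) for p in points]
    cut = sorted(projs)[n // 2]
    left = [p for p, t in zip(points, projs) if t < cut]
    right = [p for p, t in zip(points, projs) if t >= cut]
    return left, right
```

The two subcluster centroids (step 4) are then simply the coordinate means of `left` and `right`.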

Time complexity of splitting

- Assume clusters of n vectors with K values (K = 3 for RGB).
- The principal axis is calculated in O(nK^2) time.
- The dividing point is selected in O(n log n) time.
- Assume that the largest cluster is always split into equal halves: n → n/2.
- The total number of vectors processed is:

  sum_i n_i = N + 2*(N/2) + 4*(N/4) + ... + (M/2)*(2N/M) = N * log2 M

Total time complexity is O(NK^2 log M) + O(N log N).

Splitting experiments

[Figure: results with partition refinement, and a quality-time comparison (MSE vs. time in seconds) of Random, Split-1, Split-2, S+GLA, SLR, GLA, SLR+GLA and R+GLA, with the new methods marked against the existing ones.]

Merge-based approach: PNN algorithm

PNN

Put each vector in its own cluster;
REPEAT
    (a, b) ← SearchClusterPair;
    MergeClusters(a, b);
UNTIL final cluster size reached;
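A minimal PNN sketch (an O(N^3) pair search for clarity; the merge cost n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2 is the standard PNN criterion, and the naming is mine):

```python
def pnn(X, m):
    """Pairwise nearest neighbour: start from singleton clusters and merge
    the pair whose merge increases distortion the least, until m remain."""
    clusters = [[list(x), 1] for x in X]  # each entry: [centroid, size]

    def merge_cost(a, b):
        d2 = sum((p - q) ** 2 for p, q in zip(a[0], b[0]))
        return a[1] * b[1] / (a[1] + b[1]) * d2  # distortion increase

    while len(clusters) > m:
        a, b = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        ca, cb = clusters[a], clusters[b]
        n = ca[1] + cb[1]
        # Merged centroid is the size-weighted mean of the two centroids.
        merged = [[(ca[1] * p + cb[1] * q) / n for p, q in zip(ca[0], cb[0])], n]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters
```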

[Figure: clusters S1..S5 before and after a cluster merge, showing the code vectors to be merged, the remaining code vectors, the training vectors of the clusters to be merged, and the other training vectors.]

Iterative shrinking

IS(X, M): returns S

FOR i := 1 TO N DO
    s_i ← {x_i};
REPEAT
    s_a ← SearchClusterToBeRemoved(S);
    RepartitionCluster(S, s_a);
UNTIL |S| = M;
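A toy version of iterative shrinking (using the distance to the nearest other code vector as a cheap surrogate for the true removal cost; both the names and the surrogate are my simplifications):

```python
def iterative_shrinking(X, m):
    """Shrink a codebook of all vectors down to m code vectors by
    repeatedly removing the 'most redundant' one, then repartition."""
    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    C = [list(x) for x in X]
    while len(C) > m:
        # Remove the code vector lying closest to another code vector
        # (a surrogate for SearchClusterToBeRemoved).
        i = min(range(len(C)),
                key=lambda i: min(d2(C[i], C[j])
                                  for j in range(len(C)) if j != i))
        del C[i]
    # RepartitionCluster: each vector goes to its nearest remaining code vector.
    P = [min(range(len(C)), key=lambda j: d2(x, C[j])) for x in X]
    return C, P
```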

[Figure: clusters S1..S5 before and after a cluster removal, showing the code vector to be removed, the remaining code vectors, the training vectors of the removed cluster, and the other training vectors.]

Results using merge-based approaches

[Figure: PNN merging (after the third and fourth merge) and iterative shrinking (IS); run time vs. MSE (160–180) for PNN (original), PNN (improved), GLA-PNN-GLA (improved) and GA-PNN (improved).]

Split and merge

Generate an initial codebook by any algorithm.
REPEAT
    Select a cluster to be split;
    Split the selected cluster;
    Select two clusters to be merged;
    Merge the selected clusters;
UNTIL no improvement achieved.

[Figure: Split-Merge vs. Merge-Split; the number of clusters oscillates between M-h and M+h around the target M.]

Comparison of Split and Merge

[Figure: MSE (160–180) vs. time in seconds (0–700) for GLA, Split, SLR, PNN, SGLA, SM and SMG.]

Structure of Local Search

Generate initial solution.

REPEAT

Generate a set of new solutions.

Evaluate the new solutions.

Select the best solution.

UNTIL stopping criterion met.

Neighborhood function using random swap:

c_j ← x_i,  where j = random(1, M), i = random(1, N)

Object rejection (the vectors of the replaced cluster j find a new cluster):

p_i ← arg min_{1 <= k <= M} d(x_i, c_k)^2,  for all i with p_i = j

Object attraction (any vector may move to the new cluster):

p_i ← arg min_{k in {p_i, j}} d(x_i, c_k)^2,  for 1 <= i <= N
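The swap and the two repartition rules can be sketched as follows (Python; `random_swap` works on a copy, `local_repartition` applies rejection and attraction for the swapped cluster j, and the names are mine):

```python
import random

def random_swap(X, C):
    """Replace one randomly chosen centroid with a random data vector."""
    C = [list(c) for c in C]
    j = random.randrange(len(C))
    C[j] = list(random.choice(X))
    return C, j

def local_repartition(X, P, C, j):
    """Object rejection and attraction after centroid j has been swapped."""
    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    for i, x in enumerate(X):
        if P[i] == j:
            # Rejection: orphans of the replaced cluster find a new home.
            P[i] = min(range(len(C)), key=lambda k: d2(x, C[k]))
        elif d2(x, C[j]) < d2(x, C[P[i]]):
            # Attraction: vectors closer to the new centroid move to it.
            P[i] = j
    return P
```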

Randomized local search

RLS algorithm 1:

C ← SelectRandomDataObjects(M).
P ← OptimalPartition(C).
REPEAT T times
    C_new ← RandomSwap(C).
    P_new ← LocalRepartition(P, C_new).
    C_new ← OptimalRepresentatives(P_new).
    IF f(P_new, C_new) < f(P, C) THEN
        (P, C) ← (P_new, C_new)

RLS algorithm 2:

C ← SelectRandomDataObjects(M).
P ← OptimalPartition(C).
REPEAT T times
    C_new ← RandomSwap(C).
    P_new ← LocalRepartition(P, C_new).
    K-means(P_new, C_new).
    IF f(P_new, C_new) < f(P, C) THEN
        (P, C) ← (P_new, C_new)

Random swap

[Figure: before the swap there is a missing cluster and an unnecessary cluster; after the swap a centroid is removed and a new centroid added; local fine-tuning (local refinement) makes the obsolete cluster disappear and a new cluster appear; after k-means a cluster moves down into place.]

Genetic algorithm

[Figure: MSE on Bridge for K-means (176.53), Random + RLS (163.93), K-means + RLS (163.63), Splitting + RLS (163.51) and Ward + RLS (163.08); convergence of RLS-1 and RLS-2 (MSE 160–190 vs. iterations 0–5000).]

Structure of Genetic Algorithm

Genetic algorithm:

Generate S initial solutions.
REPEAT T times
    Generate new solutions.
    Sort the solutions.
    Store the best solution.
END-REPEAT
Output the best solution found.

Generate new solutions:

REPEAT S times
    Select pair for crossover.
    Cross the selected solutions.
    Mutate the new solution.
    Fine-tune the new solution by GLA.
END-REPEAT

Pseudo code for the GA (1/2)

CrossSolutions(C1, P1, C2, P2) → (C_new, P_new)

C_new ← CombineCentroids(C1, C2)
P_new ← CombinePartitions(P1, P2)
C_new ← UpdateCentroids(C_new, P_new)
RemoveEmptyClusters(C_new, P_new)
PerformPNN(C_new, P_new)

CombineCentroids(C1, C2) → C_new

C_new ← C1 ∪ C2

CombinePartitions(C_new, P1, P2) → P_new

FOR i := 1 TO N DO
    IF || x_i - c1_{p1_i} ||^2 <= || x_i - c2_{p2_i} ||^2 THEN
        p_new_i ← p1_i
    ELSE
        p_new_i ← p2_i
END-FOR
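CombinePartitions reduces to a per-vector choice between the two parents. A sketch (parent-2 cluster indices are offset by |C1| so they index into the concatenated codebook; the naming is mine):

```python
def combine_partitions(X, C1, P1, C2, P2):
    """For each vector keep the parent assignment with the smaller
    squared distance; return the combined codebook and new partition."""
    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    C_new = list(C1) + list(C2)          # CombineCentroids: C1 followed by C2
    P_new = []
    for x, p1, p2 in zip(X, P1, P2):
        if d2(x, C1[p1]) <= d2(x, C2[p2]):
            P_new.append(p1)             # keep parent 1's cluster
        else:
            P_new.append(len(C1) + p2)   # parent 2's cluster, re-indexed
    return C_new, P_new
```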

Pseudo code for the GA (2/2)

UpdateCentroids(C1, C2) → C_new

FOR j := 1 TO |C_new| DO
    c_new_j ← CalculateCentroid(P_new, j)

PerformPNN(C_new, P_new)

FOR i := 1 TO |C_new| DO
    q_i ← FindNearestNeighbor(c_i)
WHILE |C_new| > M DO
    a ← FindMinimumDistance(Q)
    b ← q_a
    MergeClusters(c_a, p_a, c_b, p_b)
    UpdatePointers(Q)
END-WHILE

Combining existing solutions

Performance comparison of GA

[Figure: distortion (160–180) vs. number of iterations (0–50) on Bridge for Mutations + GLA, PNN crossover + GLA, Random crossover + GLA, and PNN crossover.]