Clustering Algorithms

Unsupervised Learning -- Clustering Algorithms
Ana Fred
Outline
Part 1: Basic Concepts of Data Clustering
- Unsupervised Learning and Clustering
  - Problem formulation
- Cluster analysis
  - Taxonomies of clustering techniques
  - Data types and proximity measures
  - Difficulties and open problems
Part 2: Clustering Algorithms
- Hierarchical methods
  - Single-link
  - Complete-link
- Clustering Based on Dissimilarity Increments Criteria

From Single Clustering to Ensemble Methods - April 2009
Hierarchical Clustering
- Uses an n x n proximity matrix
  - D(i,j): proximity (similarity or distance) between patterns i and j

[Figure: dendrogram over objects a, b, c, d, e. Agglomerative direction (Steps 0-4): {a,b} merge first, then {d,e}, then {c,d,e}, finally {a,b,c,d,e}; the divisive direction reads the same steps in reverse.]
Hierarchical Clustering: Agglomerative Methods
1. Start with n clusters, each containing one object
2. Find the most similar pair of clusters Ci and Cj in the proximity matrix and merge them into a single cluster
3. Update the proximity matrix (reduce its order by one by replacing the two merged clusters with the new cluster)
4. Repeat steps (2) and (3) until a single cluster is obtained (i.e. n-1 times)
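The four steps above can be sketched as follows. This is a minimal illustration (not code from the slides), using Euclidean distance and single-link (the minimum point-to-point distance) as the between-cluster measure:

```python
# Naive agglomerative clustering: start from singletons, repeatedly merge
# the closest pair of clusters, and record each merge distance.
import math

def agglomerate(points, linkage=min):
    # Step 1: start with n singleton clusters
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the most similar (closest) pair of clusters
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(math.dist(a, b)
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Steps 3-4: merge the pair, record the merge distance, repeat
        merges.append((d, clusters[i] + clusters[j]))
        clusters = ([c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [clusters[i] + clusters[j]])
    return merges
```

Passing `linkage=max` instead of `min` turns the same loop into complete-link; this brute-force version recomputes distances from points and is only meant to make the four steps concrete.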
Hierarchical Clustering: Agglomerative Methods
- Similarity measures between clusters:
  - Well-known measures can be written using the Lance-Williams formula, expressing the distance between cluster k and the cluster i+j obtained by merging clusters i and j:

    d(i+j, k) = a_i d(i,k) + a_j d(j,k) + b d(i,j) + c |d(i,k) - d(j,k)|

  - Single-link:    a_i = a_j = 0.5;  b = 0;  c = -0.5
                    => d(i+j, k) = min( d(i,k), d(j,k) )
  - Complete-link:  a_i = a_j = 0.5;  b = 0;  c = 0.5
                    => d(i+j, k) = max( d(i,k), d(j,k) )
  - Centroid:       a_i = n_i/(n_i+n_j);  a_j = n_j/(n_i+n_j);
                    b = -n_i n_j/(n_i+n_j)^2;  c = 0
  - Median:         a_i = a_j = 0.5;  b = -0.25;  c = 0
  - Average link:   a_i = n_i/(n_i+n_j);  a_j = n_j/(n_i+n_j);  b = 0;  c = 0;
                    d(C_i, C_j) = (1/(n_i n_j)) Σ_{a in C_i, b in C_j} d(a, b)
  - Ward's method (minimum variance):
                    a_i = (n_i+n_k)/(n_i+n_j+n_k);  a_j = (n_j+n_k)/(n_i+n_j+n_k);
                    b = -n_k/(n_i+n_j+n_k);  c = 0
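The Lance-Williams coefficients can be packed into one small update function. This is an illustrative sketch (not code from the slides), with cluster sizes n_i, n_j, n_k passed in for the size-dependent methods:

```python
# Lance-Williams update: given d(i,k), d(j,k), d(i,j) and cluster sizes,
# compute d(i+j, k) for the classic agglomerative methods.
def lance_williams(method, dik, djk, dij, ni=1, nj=1, nk=1):
    s = ni + nj
    coef = {
        #            a_i                  a_j                  b                 c
        "single":   (0.5,                 0.5,                 0.0,             -0.5),
        "complete": (0.5,                 0.5,                 0.0,              0.5),
        "centroid": (ni / s,              nj / s,             -ni * nj / s**2,   0.0),
        "median":   (0.5,                 0.5,                -0.25,             0.0),
        "average":  (ni / s,              nj / s,              0.0,              0.0),
        "ward":     ((ni + nk) / (s + nk), (nj + nk) / (s + nk), -nk / (s + nk), 0.0),
    }
    ai, aj, b, c = coef[method]
    return ai * dik + aj * djk + b * dij + c * abs(dik - djk)
```

With `c = -0.5` the update reduces exactly to `min(dik, djk)` and with `c = 0.5` to `max(dik, djk)`, which is why single-link and complete-link fit the same formula.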
Hierarchical Clustering: Agglomerative Methods

Single Link: Distance between two clusters is the distance between the closest points. Also called "neighbor joining."
Hierarchical Clustering: Agglomerative Methods

Complete Link: Distance between two clusters is the distance between the farthest pair of points.
Hierarchical Clustering: Agglomerative Methods

Centroid: Distance between clusters is the distance between their centroids.
Hierarchical Clustering: Agglomerative Methods

Average Link: Distance between clusters is the average distance between the cluster points.
Hierarchical Clustering: Agglomerative Methods

Ward's Link: Minimizes the sum-of-squares criterion (a measure of heterogeneity).
Single Linkage:

    d(C_i, C_j) = min_{a in C_i, b in C_j} d(a, b)

Data points (x, y):
    1: (4, 4)   2: (8, 4)   3: (15, 8)   4: (24, 4)   5: (24, 12)

Initial distance matrix:
          1      2      3      4      5
    1     -      4     11.7   20     21.5
    2     4      -      8.1   16     17.9
    3    11.7    8.1    -      9.8    9.8
    4    20     16      9.8    -      8
    5    21.5   17.9    9.8    8      -

Merge {1,2} at distance 4:
         1,2     3      4      5
    1,2   -      8.1   16     17.9
    3     8.1    -      9.8    9.8
    4    16      9.8    -      8
    5    17.9    9.8    8      -
Single Linkage (continued):

Merge {4,5} at distance 8:
         1,2     3     4,5
    1,2   -      8.1  16
    3     8.1    -     9.8
    4,5  16      9.8   -

Merge {1,2,3} at distance 8.1:
           1,2,3  4,5
    1,2,3   -     9.8
    4,5     9.8   -

The final merge, {1,2,3} with {4,5}, occurs at distance 9.8; the full sequence is summarized in the dendrogram (leaf order 1, 2, 3, 5, 4).
Complete-Link:

    d(C_i, C_j) = max_{a in C_i, b in C_j} d(a, b)

Initial distance matrix:
          1      2      3      4      5
    1     -      4     11.7   20     21.5
    2     4      -      8.1   16     17.9
    3    11.7    8.1    -      9.8    9.8
    4    20     16      9.8    -      8
    5    21.5   17.9    9.8    8      -

Merge {1,2} at distance 4:
         1,2     3      4      5
    1,2   -     11.7   20     21.5
    3    11.7    -      9.8    9.8
    4    20      9.8    -      8
    5    21.5    9.8    8      -

Merge {4,5} at distance 8:
         1,2     3     4,5
    1,2   -     11.7  21.5
    3    11.7    -     9.8
    4,5  21.5    9.8   -
Complete-Link (continued):

Merge {3} with {4,5} at distance 9.8:
           1,2   3,4,5
    1,2     -    21.5
    3,4,5  21.5   -

The final merge, {1,2} with {3,4,5}, occurs at distance 21.5; the full sequence is summarized in the dendrogram (leaf order 1, 2, 3, 5, 4).
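The complete-link merge sequence on this 5-point example can be checked with a short self-contained script (an illustration, not the author's code):

```python
# Reproduce the complete-link merge distances on the example points.
import math

points = {1: (4, 4), 2: (8, 4), 3: (15, 8), 4: (24, 4), 5: (24, 12)}
clusters = [{i} for i in points]

def d_complete(ca, cb):
    # complete link: distance between the farthest pair of points
    return max(math.dist(points[a], points[b]) for a in ca for b in cb)

heights = []
while len(clusters) > 1:
    h, i, j = min(
        (d_complete(clusters[i], clusters[j]), i, j)
        for i in range(len(clusters)) for j in range(i + 1, len(clusters))
    )
    heights.append(round(h, 1))
    merged = clusters[i] | clusters[j]
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(heights)  # merge distances: [4.0, 8.0, 9.8, 21.5]
```

Note that the third complete-link merge joins point 3 with {4,5} at 9.8; only the final merge, {1,2} with {3,4,5}, happens at 21.5.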
Single-link and Complete-Link:
- A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
- Single-link favours connectedness; complete-link favours compactness.

[Figure: single-link and complete-link dendrograms for the example, both with leaf order 1, 2, 3, 5, 4.]
Single-link and Complete-Link
- SL algorithm:
  - Favors connectedness
  - Equivalent to building a minimum spanning tree (MST) and cutting at weak links

[Figure: a 2-D point set, its MST, and the clusters obtained after cutting the weakest links.]
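The MST view of single-link can be sketched as follows (a minimal illustration, not code from the slides): build the MST with Prim's algorithm, drop every edge above a cut threshold, and read off the connected components as clusters.

```python
# Single-link clustering via the MST: cut weak (long) links, return components.
import math

def mst_clusters(points, cut_threshold):
    n = len(points)
    # Prim's algorithm: grow the MST from point 0, recording each edge added
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        w, u, v = min(
            (math.dist(points[u], points[v]), u, v)
            for u in in_tree for v in range(n) if v not in in_tree
        )
        in_tree.add(v)
        edges.append((w, u, v))
    # cut weak links: keep only edges shorter than the threshold
    kept = [(u, v) for w, u, v in edges if w < cut_threshold]
    # connected components of the remaining forest = clusters (union-find)
    label = list(range(n))
    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]  # path halving
            x = label[x]
        return x
    for u, v in kept:
        label[find(u)] = find(v)
    groups = {}
    for p in range(n):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())
```

On the 5-point example from the earlier slides, cutting at threshold 9 removes the single MST edge longer than 9 and leaves the two single-link clusters {1,2,3} and {4,5}.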
Single-link and Complete-Link
- SL algorithm:
  - Favors connectedness
  - Detects arbitrary-shaped clusters with even densities
  - Cannot handle clusters of distinct densities
  - Is sensitive to in-between patterns
  - Needs criteria to set the final number of clusters
- CL algorithm:
  - Favors compactness
  - Imposes spherical-shaped clusters on the data
  - Needs criteria to set the final number of clusters

[Figures across slides: single-link result with th=0.49; 2-D examples of distinct-density clusters and in-between patterns; examples of the spherical bias of complete-link.]
Hierarchical Clustering
- Weakness:
  - Does not scale well: time complexity of O(n^2), where n is the total number of objects
  - Can never undo what was done previously
- Integration of hierarchical with distance-based clustering:
  - BIRCH (Zhang, Ramakrishnan & Livny, 1996): uses a Clustering Feature (CF) tree and incrementally adjusts the quality of sub-clusters
  - CURE (Guha, Rastogi & Shim, 1998): selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction
  - CHAMELEON (G. Karypis, E.H. Han and V. Kumar, 1999): hierarchical clustering using dynamic modeling
    1. Use a graph-partitioning algorithm to cluster objects into a large number of relatively small sub-clusters
    2. Use an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining these sub-clusters
Clustering Based on Dissimilarity Increments Criteria
- Smoothness Hypothesis:
  - A cluster is a set of patterns sharing important characteristics in a given context
  - A dissimilarity measure encapsulates the notion of pattern resemblance
  - Higher-resemblance patterns are more likely to belong to the same cluster and should be associated first
  - Dissimilarity between neighboring patterns within a cluster should not change abruptly
  - The merging of well-separated clusters results in abrupt changes in dissimilarity values

A. Fred, J. Leitão, "A New Cluster Isolation Criterion Based on Dissimilarity Increments", IEEE PAMI, 2003.
Clustering Based on Dissimilarity Increments Criteria
- Dissimilarity Increments:
  Let (x_i, x_j, x_k) be a triplet of nearest neighbors:
    x_j: nearest neighbor of x_i,  x_j = argmin_l { d(x_i, x_l), l != i }
    x_k: nearest neighbor of x_j,  x_k = argmin_l { d(x_j, x_l), l != i, j }
  The dissimilarity increment is

    d_inc(x_i, x_j, x_k) = | d(x_i, x_j) - d(x_j, x_k) |
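A sketch of the increment computation for one starting pattern (an illustration assuming Euclidean dissimilarity; not code from the paper):

```python
# Dissimilarity increment for the nearest-neighbor triplet seeded at point i:
# x_j is the nearest neighbor of x_i, x_k the nearest neighbor of x_j
# excluding x_i and x_j; the increment is |d(x_i,x_j) - d(x_j,x_k)|.
import math

def dissimilarity_increment(points, i):
    j = min((l for l in range(len(points)) if l != i),
            key=lambda l: math.dist(points[i], points[l]))
    k = min((l for l in range(len(points)) if l not in (i, j)),
            key=lambda l: math.dist(points[j], points[l]))
    d_ij = math.dist(points[i], points[j])
    d_jk = math.dist(points[j], points[k])
    return abs(d_ij - d_jk)
```

For evenly spaced points the increment stays small; it jumps when the triplet straddles a gap between clusters, which is what the isolation criterion below exploits.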
Clustering Based on Dissimilarity Increments Criteria
- Distribution of Dissimilarity Increments:
  - Uniformly distributed data [figure]
Clustering Based on Dissimilarity Increments Criteria
- Distribution of Dissimilarity Increments:
  - 2-D Gaussian data [figure]
Clustering Based on Dissimilarity Increments Criteria
- Distribution of Dissimilarity Increments:
  - Ring-shaped data [figure]
Clustering Based on Dissimilarity Increments Criteria
- Distribution of Dissimilarity Increments:
  - Directionally expanding data [figure]
Clustering Based on Dissimilarity Increments Criteria
- Distribution of Dissimilarity Increments:
  - Exponential distribution:  p(x) = λ exp(-λx)
Clustering Based on Dissimilarity Increments Criteria
- Exponential distribution:
  - Higher-density patterns -> higher λ
  - Well-separated clusters -> d_inc falls on the tail of p(x)
Clustering Based on Dissimilarity Increments Criteria
- Gap between clusters:

    gap_i = d(C_i, C_j) - d_t(C_i)

[Figure: two clusters C_i and C_j, showing d_t(C_i), d_t(C_j), the between-cluster distance d(C_i, C_j), and the resulting gap_i and gap_j.]
Clustering Based on Dissimilarity Increments Criteria
- Gap between clusters:

    gap_i = d(C_i, C_j) - d_t(C_i)

[Figure: a second configuration of clusters C_i and C_j illustrating gap_i and gap_j.]
Clustering Based on Dissimilarity Increments Criteria
- Cluster Isolation Criterion:
  Let C_i, C_k be two clusters which are candidates for merging, and let μ_i, μ_k be the respective mean values of the dissimilarity increments in each cluster. Compute the increments for each cluster, gap_i and gap_k. If gap_i >= α μ_i (gap_k >= α μ_k), isolate cluster C_i (C_k) and proceed with the clustering strategy on the remaining patterns. If neither cluster exceeds the gap limit, merge them.
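The decision rule can be sketched as a small function; the names and the default α = 3 are illustrative only (the slides suggest α in {3, 5}):

```python
# Isolation test: a cluster is isolated when its gap exceeds alpha times the
# mean dissimilarity increment observed inside that cluster.
def isolation_decision(gap_i, mean_inc_i, gap_k, mean_inc_k, alpha=3.0):
    """Return 'isolate_i', 'isolate_k', or 'merge'."""
    if gap_i >= alpha * mean_inc_i:
        return "isolate_i"
    if gap_k >= alpha * mean_inc_k:
        return "isolate_k"
    return "merge"
```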
Clustering Based on Dissimilarity Increments Criteria
- Setting the Isolation Criterion Parameter α:
  - Result: the crossing with the x axis of the tangential line at a point that is a multiple α of the distribution mean value, 1/λ, is given by (α+1)/λ
  - α = 3, 5 cover the significant part of the distribution

[Figure: exponential distribution, λ = .05, with tangential lines at points that are multiples of the mean.]
Clustering Based on Dissimilarity Increments Criteria
- Hierarchical Clustering Algorithm:
  - A statistic of the dissimilarity increments within a cluster is maintained and updated during cluster merging
  - Clusters are obtained by comparing dissimilarity increments with a dynamic threshold, α μ_i, based on cluster statistics
Clustering Based on Dissimilarity Increments Criteria
- Results:
  - Ring-Shaped Clusters [figure]
Clustering Based on Dissimilarity Increments Criteria
- Results: Ring-Shaped Clusters
  - Single-link method, th=0.49 [figure]
Clustering Based on Dissimilarity Increments Criteria
- Results: Ring-Shaped Clusters
  - Dissimilarity Increments-based method [figures annotated with distances d1, d2, gaps g1, g2, and 3/λ thresholds]
Clustering Based on Dissimilarity Increments Criteria
- Results: 2-D Patterns with Complex Structure
  - Dissimilarity Increments-based method (2 < α < 9) vs. single-link (th=1.1) [figures]
Clustering of Contour Images
- The data set is composed of 634 contour images of 15 types of hardware tools: t1 to t15.
- When counting each pose as a distinct sub-class of the object type, we obtain a total of 24 classes.
Clustering of Contour Images
- Contour extraction:
  - The object boundary is sampled at 50 equally spaced points
  - The angle between consecutive segments is quantized into 8 levels
  - Pipeline: contour extraction (based on a thresholding method) -> string contour description (8-directional differential chain code)
Clustering of Contour Images
Clustering of Contour Images

[Figure: clusters for tool classes t1-t5, comparing single-link with the Dissimilarity Increments-based method (string-edit distance).]
Clustering Based on Dissimilarity Increments Criteria
- Description:
  - Hierarchical agglomerative algorithm adopting a cluster isolation criterion based on dissimilarity increments
- Strength:
  - The method is not tied to a particular dissimilarity measure (the examples used Euclidean and string-edit distances)
  - Ability to identify clusters of arbitrary shape and size
  - The number of clusters is found intrinsically
- Weakness:
  - Sensitive to in-between points connecting touching clusters
Outline
- Partitional Methods
  - K-Means
  - Spectral Clustering
  - EM-based Gaussian Mixture Decomposition
Part 3: Validation of Clustering Solutions
- Cluster Validity Measures
Part 4: Ensemble Methods
- Evidence Accumulation Clustering
Partitional Methods
- K-Means
  - Minimizes the squared-error cost function (the sum, over all clusters, of the squared distances of the objects to their cluster centroid)
  - Algorithm:
    Input: k, the number of clusters; the data set
    1. Randomly select k seed points from the data set and take them as initial centroids
    2. Partition the data into k clusters by assigning each object to the cluster with the nearest centroid
    3. Compute the centroids of the clusters of the current partition; the centroid is the center (mean point) of the cluster
    4. Go back to step 2; stop when there are no new assignments
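The four steps can be sketched as follows (a minimal illustration, not code from the slides; the random seed selection of step 1 is replaced by a deterministic choice for reproducibility):

```python
# Minimal k-means: assign points to nearest centroid, recompute centroids,
# stop when the assignment no longer changes.
import math

def kmeans(points, k, max_iter=100):
    # step 1: take the first k points as initial centroids (stand-in for random seeds)
    centroids = [points[i] for i in range(k)]
    assign = None
    for _ in range(max_iter):
        # step 2: assign each object to the nearest centroid
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:   # step 4: stop when no assignment changes
            break
        assign = new_assign
        # step 3: recompute each centroid as the mean of its cluster
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign, centroids
```

Because the result depends on the seeds (a weakness noted on the next slides), practical implementations rerun the loop from several random initializations and keep the lowest-cost solution.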
Partitional Methods: K-Means
- Favors compactness
- Strength:
  - Fast algorithm: O(tkn), where t is the number of iterations; normally k, t << n
  - Scalability
  - Robust to noise
  - Often terminates at a local optimum
- Weakness:
  - Imposes spherical-shaped clusters
  - Is sensitive to the number of objects in clusters
  - Dependence on initialization
  - Needs criteria to set the final number of clusters
  - Applicable only when the mean is defined (what about categorical data?)

[Figures across slides: k-means clustering of uniform data (k=4); 2-D examples of sensitivity to cluster sizes and to initialization.]
Variations of the K-Means Method
- K-Means (MacQueen '67): each cluster is represented by the center of the cluster
- A few variants of k-means differ in:
  - Selection of the initial k means
  - Dissimilarity calculations (Mahalanobis distance -> elliptical clusters)
  - Strategies to calculate cluster means
    - Medoid: each cluster is represented by one of the objects in the cluster
  - Fuzzy version: Fuzzy K-Means
- Handling categorical data: k-modes (Huang '98)
  - Replaces the means of clusters with modes
  - Uses new dissimilarity measures to deal with categorical objects
  - Uses a frequency-based method to update the modes of clusters
Spectral Clustering
- For a given data set, X, spectral clustering finds a set of data clusters on the basis of spectral analysis of a similarity graph
- The clustering problem is defined in terms of a complete graph, G, with vertices V = {1, ..., N} corresponding to the data points in the data set; each edge between two vertices is weighted by the similarity between them
- The weight matrix is also called the affinity matrix or the similarity matrix
  Gaussian kernel:  A_ij = exp( -||x_i - x_j||^2 / (2 σ^2) )
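A minimal sketch of building this Gaussian-kernel affinity matrix (illustrative, not code from the slides):

```python
# Affinity matrix A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
import math

def affinity_matrix(points, sigma):
    n = len(points)
    return [[math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]
```

A is symmetric with a unit diagonal; a small σ makes the matrix nearly block-diagonal for well-separated clusters, which is what the later parameter-sensitivity slides explore.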
Spectral Clustering
- Cutting edges of G, we obtain disjoint subgraphs of G as the clusters of X
- The goal of clustering is to organize the dataset into disjoint subsets with high intra-cluster similarity and low inter-cluster similarity
- The resulting clusters should be as compact and isolated as possible
Spectral Clustering
- Graph partitioning for data clustering can be interpreted as the minimization of an objective function in which compactness and isolation are quantified by subset sums of edge weights
- Common objective functions:
  - Ratio cut (Rcut):       Rcut(C_1, ..., C_k) = Σ_{l=1..k} cut(C_l, X\C_l) / card(C_l)
  - Normalised cut (Ncut):  Ncut(C_1, ..., C_k) = Σ_{l=1..k} cut(C_l, X\C_l) / cut(C_l, X)
  - Min-max cut (Mcut):     Mcut(C_1, ..., C_k) = Σ_{l=1..k} cut(C_l, X\C_l) / cut(C_l, C_l)
- cut(A, B) is the sum of the edge weights between p ∈ A and p ∈ B
- X\C_l is the complement of C_l in X; card(C_l) denotes the number of points in C_l
Spectral Clustering
- The solution of the minimization problem of any of the previous objective functions is obtained from the matrix of the first k eigenvectors of a matrix derived from the affinity matrix (the Laplacian matrix)
- The eigenvectors for Ncut and Mcut are identical, and obtained from the symmetric Laplacian:

    L_sym := D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}

  where D is a diagonal matrix whose ii-th entry is the sum of the i-th row of A
- Another common choice is the random-walk Laplacian:

    L_rw := D^{-1} L = I - D^{-1} A

- Distinct algorithms differ in how they produce and use the eigenvectors and in how they derive clusters from them:
  - Some use each eigenvector one at a time
  - Others use the top k eigenvectors simultaneously
- Closely related to spectral graph partitioning, in which the second eigenvector of a graph's Laplacian is used to define a semi-optimal cut; the second eigenvector solves a relaxation of an NP-hard discrete graph partitioning problem, giving an approximation to the optimal cut
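The Laplacians above can be sketched with NumPy (illustrative; assumes L = D - A, the unnormalized Laplacian, and a graph with no isolated vertices so all degrees are positive):

```python
# Build L = D - A, L_sym = D^{-1/2} L D^{-1/2}, and L_rw = D^{-1} L
# from an affinity matrix A.
import numpy as np

def laplacians(A):
    d = A.sum(axis=1)                    # degree of each vertex (row sums of A)
    L = np.diag(d) - A                   # unnormalized Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # scale rows and columns: D^{-1/2} L D^{-1/2}
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    L_rw = L / d[:, None]                # D^{-1} L
    return L, L_sym, L_rw
```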
Spectral Clustering (Ng et al., 2001)
- Maps the feature space into a new space, Y, based on the eigenvectors of a matrix derived from an affinity matrix associated with the data set
- The data partition is obtained by applying the K-means algorithm in the new space
  Gaussian kernel:  A_ij = exp( -||x_i - x_j||^2 / (2 σ^2) )

[Figure: original feature space vs. eigenvector feature space.]

A. Y. Ng, M. I. Jordan and Y. Weiss, "On Spectral Clustering: Analysis and an algorithm", NIPS 2001.
Spectral Clustering (Ng et al., 2001)
- Algorithm: [figure listing the algorithm steps]
Spectral Clustering (Ng et al., 2001) [figure]
Spectral Clustering (Ng et al., 2001)
- Results strongly depend on the parameter values k and σ

[Figure: K=2, σ=0.1 vs. K=2, σ=0.4]
Spectral Clustering (Ng et al., 2001)
- Results strongly depend on the parameter values k and σ

[Figure: K=3, σ=0.1 vs. K=3, σ=0.45]
Spectral Clustering
- Selection of parameter values, using:
  - MSE in the eigenvector space:  MSE = (1/N) Σ_{i=1..k} Σ_{y_j ∈ Y_i} ||y_j - m_i||^2
  - Eigengap
  - Rcut
Selection of Parameters: Global Results on Selecting σ and K
- None of the studied methods is suitable for the automatic selection of the spectral clustering parameters
- A majority-voting decision did not significantly improve the results

[Figure: percentage of correct classification vs. σ]
Spectral Clustering
- Strength:
  - Detects arbitrary-shaped clusters
  - By using an adequate similarity measure between patterns, it can be applied to all types of data
- Weakness:
  - Computationally heavy
  - Needs criteria to set the final number of clusters and the scaling factor
Model-Based Clustering: Finite Mixtures
 k random sources, with probability density functions f_i(x), i = 1,…,k: a source i is chosen at random with probability α_i, and the random variable X is then drawn from it
: Conditional: f(x | source i) = f_i(x)
: Joint: f(x and source i) = α_i f_i(x)
: Unconditional:

$$f(x) = \sum_{\text{all sources}} f(x \text{ and source } i) = \sum_{i=1}^{k} \alpha_i\, f_i(x)$$
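The generative story above can be simulated directly (the mixture parameters below are made-up illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D mixture: two Gaussian sources with mixing probabilities alpha
alphas = np.array([0.3, 0.7])
mus = np.array([-2.0, 3.0])
sigmas = np.array([0.5, 1.0])

# Pick source i with probability alpha_i, then draw X from f_i
z = rng.choice(len(alphas), size=100_000, p=alphas)   # hidden source labels
x = rng.normal(mus[z], sigmas[z])

# The unconditional mean is sum_i alpha_i * mu_i = 0.3*(-2) + 0.7*3 = 1.5
mean = x.mean()
```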
Model-Based Clustering: Finite Mixtures
 Each component models one cluster
 Clustering = mixture fitting

$$f(x \mid \Theta) = \sum_{i=1}^{k} \alpha_i\, f(x \mid \theta_i)$$

[Figure: three Gaussian components f_1(x), f_2(x), f_3(x), one per cluster]
Gaussian Mixture Decomposition
 Mixture model:

$$f(x \mid \Theta) = \sum_{i=1}^{k} \alpha_i\, f(x \mid \theta_i)$$

: Component densities: f(x | θ_i)
: Mixing probabilities: α_i ≥ 0 and Σ_{i=1}^{k} α_i = 1
 Gaussian components
: Arbitrary covariances:

$$f(x \mid \theta_i) = N(x \mid \mu_i, C_i), \qquad \Theta = \{\mu_1, \ldots, \mu_k,\; C_1, \ldots, C_k,\; \alpha_1, \ldots, \alpha_k\}$$

: Common covariance:

$$f(x \mid \theta_i) = N(x \mid \mu_i, C), \qquad \Theta = \{\mu_1, \ldots, \mu_k,\; C,\; \alpha_1, \ldots, \alpha_k\}$$
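A small numeric check of the mixture density above (the component parameters are made-up illustrative values):

```python
import numpy as np

def gaussian_pdf(x, mu, C):
    """Multivariate normal density N(x | mu, C)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff) / norm

def mixture_pdf(x, alphas, mus, Cs):
    """f(x | Theta) = sum_i alpha_i N(x | mu_i, C_i)."""
    return sum(a * gaussian_pdf(x, m, C) for a, m, C in zip(alphas, mus, Cs))

# Two equal-weight unit-covariance components centered at (0,0) and (4,0)
alphas = [0.5, 0.5]
mus = [np.zeros(2), np.array([4.0, 0.0])]
Cs = [np.eye(2), np.eye(2)]
p = mixture_pdf(np.zeros(2), alphas, mus, Cs)   # dominated by the first component
```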
Mixture Model Fitting
 n independent observations: x = {x^{(1)}, x^{(2)}, …, x^{(n)}}
 Mixture density model:

$$f(x \mid \Theta) = \sum_{i=1}^{k} \alpha_i\, f(x \mid \theta_i)$$

 Estimate Θ that maximizes the (log-)likelihood (ML estimate of Θ):

$$\hat{\Theta} = \arg\max_{\Theta} L(\mathbf{x}, \Theta), \qquad L(\mathbf{x}, \Theta) = \sum_{j=1}^{n} \log f(x^{(j)} \mid \Theta) = \sum_{j=1}^{n} \log \sum_{i=1}^{k} \alpha_i\, f(x^{(j)} \mid \theta_i)$$
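The log-likelihood above computes directly; a 1-D sketch using the log-sum-exp trick for numerical stability (an implementation detail the slides do not discuss):

```python
import numpy as np

def gmm_loglik(x, alphas, mus, sigmas):
    """L(x, Theta) = sum_j log sum_i alpha_i N(x_j | mu_i, sigma_i^2),
    for 1-D data, computed stably with log-sum-exp."""
    x = np.asarray(x)[:, None]                       # shape (n, 1)
    log_comp = (np.log(alphas)
                - 0.5 * np.log(2 * np.pi * sigmas**2)
                - (x - mus) ** 2 / (2 * sigmas**2))  # shape (n, k)
    m = log_comp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).sum())

# Single standard normal component at x = 0: L = -0.5 * log(2*pi)
val = gmm_loglik([0.0], np.array([1.0]), np.array([0.0]), np.array([1.0]))
```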
Gaussian Mixture Model Fitting
 Problem: the likelihood function is unbounded as det(C_i) → 0
. There is no global maximum
. Unusual goal: a "good" local maximum
 Example: a 2-component Gaussian mixture over some data points x = {x_1, x_2, …, x_n}:

$$f(x \mid \mu_1, \mu_2, \sigma_1^2, \sigma_2^2) = \frac{1}{2}\,\frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + \frac{1}{2}\,\frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}$$

Setting μ_1 = x_1:

$$L(\mathbf{x}, \Theta) = \log\!\left( \frac{1}{2\sqrt{2\pi}\,\sigma_1} + \frac{1}{2\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_1-\mu_2)^2}{2\sigma_2^2}} \right) + \sum_{j=2}^{n} \log(\ldots) \;\longrightarrow\; \infty, \quad \text{as } \sigma_1 \to 0$$
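The divergence can be checked numerically: pin μ_1 on the first data point and shrink σ_1, and the log-likelihood grows without bound (the data points and μ_2 below are arbitrary illustrative values):

```python
import numpy as np

def loglik(x, mus, sigmas):
    """Log-likelihood of an equal-weight 2-component 1-D Gaussian mixture."""
    x = np.asarray(x)[:, None]
    comp = np.exp(-(x - mus) ** 2 / (2 * sigmas**2)) / (np.sqrt(2 * np.pi) * sigmas)
    return float(np.log(0.5 * comp.sum(axis=1)).sum())

x = np.array([0.0, 1.0, 2.0, 3.0])
mus = np.array([x[0], 1.5])          # pin mu_1 on the first data point
# Shrinking sigma_1 makes the likelihood diverge
L = [loglik(x, mus, np.array([s, 1.0])) for s in (1.0, 0.1, 0.01, 1e-4)]
```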
Mixture Model Fitting
 ML estimate has no closed-form solution
 Standard alternative: expectation-maximization (EM) algorithm:
: Missing data problem:
. Observed data: x = {x^{(1)}, x^{(2)}, …, x^{(n)}}
. Missing data: the labels ("colors") z = {z^{(1)}, z^{(2)}, …, z^{(n)}}, where z^{(j)} = [z_1^{(j)}, z_2^{(j)}, …, z_k^{(j)}]^T = [0 … 0 1 0 … 0], with the "1" at position i meaning x^{(j)} was generated by component i
. Complete log-likelihood function:

$$L_c(\mathbf{x}, \mathbf{z}, \Theta) = \sum_{j=1}^{n} \log f(x^{(j)}, z^{(j)} \mid \Theta) = \sum_{j=1}^{n} \sum_{i=1}^{k} z_i^{(j)} \log\!\left[ \alpha_i\, f(x^{(j)} \mid \theta_i) \right]$$
The EM Algorithm
 The E-step: compute the expected value of L_c(x, z, Θ):

$$Q(\Theta, \hat{\Theta}^{(t)}) = E\!\left[ L_c(\mathbf{x}, \mathbf{z}, \Theta) \mid \mathbf{x}, \hat{\Theta}^{(t)} \right]$$

 The M-step: update the parameter estimates:

$$\hat{\Theta}^{(t+1)} = \arg\max_{\Theta} Q(\Theta, \hat{\Theta}^{(t)})$$
EM Algorithm for Mixtures of Gaussians
 Iterative procedure: Θ̂^{(0)}, Θ̂^{(1)}, …, Θ̂^{(t)}, Θ̂^{(t+1)}, …
 The E-step: estimate, at iteration t, the probability that x^{(j)} was produced by component i:

$$w_i^{(j,t)} = \frac{\hat{\alpha}_i^{(t)}\, f(x^{(j)} \mid \hat{\theta}_i^{(t)})}{\sum_{m=1}^{k} \hat{\alpha}_m^{(t)}\, f(x^{(j)} \mid \hat{\theta}_m^{(t)})}$$

 The M-step:

$$\hat{\alpha}_i^{(t+1)} = \frac{1}{n} \sum_{j=1}^{n} w_i^{(j,t)}, \qquad \hat{\mu}_i^{(t+1)} = \frac{\sum_{j=1}^{n} w_i^{(j,t)}\, x^{(j)}}{\sum_{j=1}^{n} w_i^{(j,t)}}, \qquad \hat{C}_i^{(t+1)} = \frac{\sum_{j=1}^{n} w_i^{(j,t)}\, (x^{(j)} - \hat{\mu}_i^{(t+1)}) (x^{(j)} - \hat{\mu}_i^{(t+1)})^T}{\sum_{j=1}^{n} w_i^{(j,t)}}$$
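The E- and M-steps above translate directly into code; a 1-D sketch (the quantile-based initialization is my own choice for determinism, not prescribed by the slides):

```python
import numpy as np

def em_gmm_1d(x, k, iters=100):
    """EM for a 1-D Gaussian mixture, implementing the E- and M-steps above."""
    n = len(x)
    alphas = np.full(k, 1.0 / k)
    mus = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    sigmas = np.full(k, x.std())
    for _ in range(iters):
        # E-step: w[j, i] = posterior probability that x_j came from component i
        dens = np.exp(-((x[:, None] - mus) ** 2) / (2 * sigmas**2)) \
               / (np.sqrt(2 * np.pi) * sigmas)
        w = alphas * dens
        w = w / w.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and standard deviations
        Nk = w.sum(axis=0)
        alphas = Nk / n
        mus = (w * x[:, None]).sum(axis=0) / Nk
        sigmas = np.sqrt((w * (x[:, None] - mus) ** 2).sum(axis=0) / Nk)
    return alphas, mus, sigmas

# Two well-separated 1-D clusters (illustrative data)
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 0.5, 300), rng.normal(6.0, 0.5, 300)])
alphas, mus, sigmas = em_gmm_1d(x, k=2)
```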
Mixture Gaussian Decomposition: Model Selection
 How many components?
: The maximized likelihood never decreases when k increases
: Usually:

$$\hat{k} = \arg\min_{k} \mathcal{C}(\hat{\Theta}^{(k)}, k), \quad k = k_{\min}, \ldots, k_{\max}, \qquad \mathcal{C}(\hat{\Theta}^{(k)}, k) = -L(\mathbf{x}, \hat{\Theta}^{(k)}) + P(\hat{\Theta}^{(k)})$$

: Criteria in this category:
. Minimum description length (MDL), Rissanen and Ristad, 1992
. Akaike's information criterion (AIC), Windham and Cutler, 1992
. Schwarz's Bayesian inference criterion (BIC), Fraley and Raftery, 1998
: Resampling-based techniques:
. Bootstrap for clustering, Jain and Moreau, 1987
. Bootstrap for Gaussian mixtures, McLachlan, 1987
. Cross-validation, Smyth, 1998
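A small arithmetic sketch of this selection rule, using made-up maximized log-likelihood values and a BIC/MDL-style penalty P = (number of parameters / 2) · log n:

```python
import numpy as np

# Hypothetical maximized log-likelihoods for k = 1..4 on n 1-D points
# (made-up numbers: L keeps growing with k, but with diminishing returns)
n = 500
logL = {1: -1450.0, 2: -1210.0, 3: -1205.0, 4: -1202.0}

def n_params(k, d=1):
    """Free parameters of a k-component Gaussian mixture with arbitrary
    covariances: (k-1) mixing weights + k*(d + d(d+1)/2) component params."""
    return (k - 1) + k * (d + d * (d + 1) // 2)

# Cost C(k) = -L + (#params / 2) * log(n); pick the minimum
cost = {k: -L + 0.5 * n_params(k) * np.log(n) for k, L in logL.items()}
best_k = min(cost, key=cost.get)
```

The penalty grows linearly with k, so the small likelihood gains at k = 3 and k = 4 no longer pay for the extra parameters.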
Mixture Gaussian Decomposition: Model Selection
 Given Θ^{(k)}, the shortest code-length for x (Shannon's):

$$L(\mathbf{x} \mid \Theta^{(k)}) = -\log f(\mathbf{x} \mid \Theta^{(k)})$$

 MDL criterion: account also for the parameter code-length
: Total code-length (two-part code):

$$L(\mathbf{x}, \Theta^{(k)}) = -\log f(\mathbf{x} \mid \Theta^{(k)}) + \underbrace{L(\Theta^{(k)})}_{\text{parameter code-length}}$$

: MDL criterion:

$$\hat{\Theta}^{(k)} = \arg\min_{\Theta^{(k)}} \left\{ -\log f(\mathbf{x} \mid \Theta^{(k)}) + L(\Theta^{(k)}) \right\}$$

: L(each component of Θ^{(k)}) = ½ log(n′), where n′ is the amount of data from which the parameter is estimated
Mixture Gaussian Decomposition: Model Selection
 Classical MDL: n′ = n

$$\hat{\Theta}^{(k)} = \arg\min_{\Theta^{(k)}} \left\{ -\log f(\mathbf{x} \mid \Theta^{(k)}) + \frac{k\,(N_p + 1)}{2} \log(n) \right\}$$

 Mixtures MDL (MMDL) (Figueiredo, 2002)

$$\hat{\Theta}^{(k)} = \arg\min_{\Theta^{(k)}} \left\{ -\log f(\mathbf{x} \mid \Theta^{(k)}) + \frac{k\,(N_p + 1)}{2} \log(n) + \frac{N_p}{2} \sum_{m=1}^{k} \log(\alpha_m) \right\}$$

N_p is the number of parameters of each component:
: Gaussian, arbitrary covariances: N_p = d + d(d+1)/2
: Gaussian, common covariance: N_p = d
: Using EM and redefining the M-step:

$$\hat{\alpha}_i^{(t+1)} = \frac{\left( \sum_{j=1}^{n} w_i^{(j,t)} - \frac{N_p}{2} \right)_{\!+}}{\sum_{m=1}^{k} \left( \sum_{j=1}^{n} w_m^{(j,t)} - \frac{N_p}{2} \right)_{\!+}}$$

: This M-step may annihilate components
M. Figueiredo and A. K. Jain, Unsupervised Learning of Finite Mixture Models, IEEE TPAMI, 2002
Gaussian Mixture Decomposition
 EM with MMDL: examples
[Figure: Gaussian mixture decomposition results obtained with EM and the MMDL criterion]
Gaussian Mixture Decomposition
 Strength
: Model-based approach
: Good for Gaussian data
: Handles touching clusters
 Weakness
: Unable to detect arbitrary-shaped clusters
: Dependence on initialization
Gaussian Mixture Decomposition
It is a local (greedy) algorithm (the likelihood never decreases) => initialization dependent
[Figure: EM results from two different initializations (74 iterations; 270 iterations)]
References
A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
A. K. Jain, M. N. Murty and P. J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
A. Y. Ng, M. I. Jordan and Y. Weiss, "On Spectral Clustering: Analysis and an algorithm", in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker and Z. Ghahramani (eds.), MIT Press, 2002.
M. Figueiredo and A. K. Jain, "Unsupervised Learning of Finite Mixture Models", IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002.
A. L. N. Fred and J. M. N. Leitão, "A New Cluster Isolation Criterion Based on Dissimilarity Increments", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, 2003.