Discerning Linkage-Based Algorithms Among Hierarchical Clustering Methods
Margareta Ackerman and Shai Ben-David
IJCAI 2011
Clustering is one of the most widely used tools for exploratory data analysis.
Social Sciences, Biology, Astronomy, Computer Science, …
All apply clustering to gain a first understanding of the structure of large data sets.
The Theory-Practice Gap
Both statements still apply today.
Bridging the Theory-Practice Gap: Previous work
• Axioms of clustering [(Kleinberg, NIPS 02), (Ackerman & Ben-David, NIPS 08), (Meila, NIPS 08)]
• Clusterability [(Balcan, Blum, and Vempala, STOC 08), (Ackerman & Ben-David, AISTATS 09)]
There is a wide variety of clustering algorithms, which often produce very different clusterings.
How should a user decide which algorithm to use for a given application?
M. Ackerman, S. Ben-David, and D. Loker
Bridging the Theory-Practice Gap: Clustering algorithm selection
We propose a framework that lets a user utilize prior knowledge to select an algorithm.
• Identify properties that distinguish between the input-output behaviour of different clustering algorithms.
• The properties should be:
1) Intuitive and “user-friendly”
2) Useful for classifying clustering algorithms
Previous Work in Property-Based Framework
Our approach for clustering algorithm selection:
• A property-based classification of partitional clustering algorithms (Ackerman, Ben-David, and Loker, NIPS ’10)
• A characterization of single linkage with the k-stopping criterion (Zadeh and Ben-David, UAI ’09)
• A characterization of linkage-based clustering with the k-stopping criterion (Ackerman, Ben-David, and Loker, COLT ’10)
Our contributions
• Extend the above property-based framework to the hierarchical clustering setting
• Propose two intuitive properties that uniquely identify hierarchical linkage-based clustering algorithms
• Show that common hierarchical algorithms, including bisecting k-means, cannot be simulated by any linkage-based algorithm
Outline
• Define Linkage-Based clustering
• Introduce two new properties of hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are not linkage-based
• Conclusions
Formal Setup: Dendrograms and clusterings
A set C_i is a cluster in a dendrogram D if there exists a node in the dendrogram so that C_i is the set of its leaf descendants.
C = {C_1, …, C_k} is a clustering in a dendrogram D if
– C_i is a cluster in D for all 1 ≤ i ≤ k, and
– the clusters are disjoint: C_i ∩ C_j = Ø for all 1 ≤ i < j ≤ k.
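The definitions above can be sketched in code. A minimal sketch, assuming a hypothetical nested-tuple representation of dendrograms (a leaf is a point; an internal node is a pair of subtrees); the names are illustrative, not from the paper:

```python
def leaves(node):
    """Set of leaf descendants of a dendrogram node."""
    if not isinstance(node, tuple):
        return frozenset([node])
    left, right = node
    return leaves(left) | leaves(right)

def clusters(dendrogram):
    """All clusters of D: the set of leaf descendants of each node."""
    result = {leaves(dendrogram)}
    if isinstance(dendrogram, tuple):
        for child in dendrogram:
            result |= clusters(child)
    return result

def is_clustering(C, dendrogram):
    """C = {C_1, ..., C_k} is a clustering in D iff every C_i is a
    cluster in D and the C_i are pairwise disjoint."""
    all_clusters = clusters(dendrogram)
    disjoint = all(a.isdisjoint(b) for a in C for b in C if a is not b)
    return all(c in all_clusters for c in C) and disjoint

D = (("x1", "x2"), ("x3", "x4"))
print(is_clustering({frozenset({"x1", "x2"}), frozenset({"x3", "x4"})}, D))  # True
print(is_clustering({frozenset({"x1", "x3"})}, D))  # False
```

The second call fails because {x1, x3} is not the leaf set of any node of D.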
Formal Setup: Hierarchical clustering algorithm
A Hierarchical Clustering Algorithm A maps
Input: a data set X with a distance function d, denoted (X,d)
to
Output: a dendrogram of X.
Linkage-Based Algorithm
An algorithm A is Linkage-Based if there exists a linkage function
l : {(X_1, X_2, d) : d over X_1 ∪ X_2} → R^+
such that for any (X,d), A(X,d) can be constructed as follows:
• Create a single-node tree for every element of X.
• Repeat the following until a single tree remains: merge the pair of trees whose element sets are closest according to l.
Ex. single linkage, average linkage, complete linkage.
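The generic procedure above can be sketched directly. The following is an illustrative implementation, not the paper's formal construction: d is stored as a dict over unordered point pairs, and the three classic linkage functions are shown.

```python
def linkage_based(X, d, l):
    """Build a dendrogram (nested tuples) over (X, d) using linkage function l."""
    # Each tree is a (node, leaf_set) pair; start with single-node trees.
    trees = [(x, frozenset([x])) for x in X]
    while len(trees) > 1:
        # Merge the pair of trees whose element sets are closest according to l.
        i, j = min(
            ((i, j) for i in range(len(trees)) for j in range(i + 1, len(trees))),
            key=lambda p: l(trees[p[0]][1], trees[p[1]][1], d),
        )
        (ti, si), (tj, sj) = trees[i], trees[j]
        trees = [t for k, t in enumerate(trees) if k not in (i, j)]
        trees.append(((ti, tj), si | sj))
    return trees[0][0]

# The three classic linkage functions.
def single(A, B, d):
    return min(d[frozenset({a, b})] for a in A for b in B)

def complete(A, B, d):
    return max(d[frozenset({a, b})] for a in A for b in B)

def average(A, B, d):
    return sum(d[frozenset({a, b})] for a in A for b in B) / (len(A) * len(B))

X = [0, 1, 10, 11]
d = {frozenset({a, b}): abs(a - b) for a in X for b in X if a != b}
print(linkage_based(X, d, single))  # ((0, 1), (10, 11))
```

On this well-separated example all three linkage functions produce the same dendrogram; they differ once clusters have ambiguous separations.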
Outline
• Define Linkage-Based clustering
• Introduce two new properties of hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are not linkage-based
• Conclusions
Locality
Informal definition: If we select a set of disjoint clusters from a dendrogram, and run the algorithm on the union of these clusters, we obtain a result that is consistent with the original dendrogram.
[Figure: D = A(X,d) and D’ = A(X’,d), where X’ = {x_1, …, x_6}]
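A small self-contained illustration of locality, using single linkage over 1-D points (an assumed example, not the paper's figure): running the algorithm on the union of two selected disjoint clusters reproduces the sub-dendrograms they root in the original output.

```python
def single_linkage(X):
    """Single linkage over 1-D points; dendrogram as nested tuples."""
    trees = [(x, frozenset([x])) for x in X]
    while len(trees) > 1:
        pairs = [(i, j) for i in range(len(trees)) for j in range(i + 1, len(trees))]
        i, j = min(pairs, key=lambda p: min(
            abs(a - b) for a in trees[p[0]][1] for b in trees[p[1]][1]))
        node = (trees[i][0], trees[j][0])
        s = trees[i][1] | trees[j][1]
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [(node, s)]
    return trees[0][0]

# Original run, then a run on X' = {0, 1} u {10, 11}, the union of two
# disjoint clusters taken from the first dendrogram.
D = single_linkage([0, 1, 10, 11, 50, 51])   # ((50, 51), ((0, 1), (10, 11)))
D_prime = single_linkage([0, 1, 10, 11])     # ((0, 1), (10, 11))
# The clusters {0, 1} and {10, 11} root the same subtrees, (0, 1) and
# (10, 11), in both dendrograms, as locality requires.
```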
Outer Consistency
• An outer-consistent change makes the clustering C more prominent.
• If A is outer-consistent, then A(X,d’) will also include the clustering C.
[Figure: C on dataset (X,d), and C on dataset (X,d’) after increasing pairwise between-cluster distances]
Outline
• Define Linkage-Based clustering
• Introduce two new properties of hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are not linkage-based
• Conclusions
Our Main Result
Theorem: A hierarchical clustering function is Linkage-Based if and only if it is Local and Outer-Consistent.
Brief Sketch of Proof
Recall direction: If A satisfies Outer-Consistency and Locality, then A is Linkage-Based.
Goal: Define a linkage function l so that the linkage-based clustering based on l outputs A(X,d) (for every X and d).
• Define an operator <_A: (X,Y,d_1) <_A (Z,W,d_2) if, when we run A on (X ∪ Y ∪ Z ∪ W, d), where d extends d_1 and d_2, X and Y are merged before Z and W.
[Figure: A(X,d), a dendrogram over subtrees X, Y, Z, W]
• Prove that <_A can be extended to a partial ordering by proving that it is cycle-free.
• This implies that there exists an order-preserving function l that maps pairs of data sets to R^+.
Outline
• Define Linkage-Based clustering
• Introduce two new properties of hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are not linkage-based
• Conclusions
Hierarchical but Not Linkage-Based
• P-Divisive algorithms construct dendrograms top-down, using a partitional 2-clustering algorithm P to determine how to split nodes.
• Many natural partitional 2-clustering algorithms satisfy the following property:
A partitional 2-clustering algorithm P is Context Sensitive if there exist d ⊂ d’ so that
P({x,y,z}, d) = {{x}, {y,z}} and P({x,y,z,w}, d’) = {{x,y}, {z,w}}.
Ex. k-means, min-sum, min-diameter, and furthest-centroids.
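The P-divisive scheme can be sketched as a short recursion. The 2-clustering P below (seed with the farthest pair, assign each point to the nearer seed) is an illustrative stand-in for algorithms like 2-means, not the paper's exact procedure.

```python
def p_divisive(X, d, P):
    """Build a dendrogram top-down by recursively splitting X with P."""
    if len(X) == 1:
        return X[0]
    A, B = P(X, d)
    return (p_divisive(A, d, P), p_divisive(B, d, P))

def two_clustering(X, d):
    """A simple partitional 2-clustering: seed with the farthest pair,
    then assign every point to the nearer seed."""
    a, b = max(((x, y) for x in X for y in X if x != y),
               key=lambda p: d[frozenset(p)])
    A = [x for x in X
         if x == a or (x != b and d[frozenset({x, a})] <= d[frozenset({x, b})])]
    B = [x for x in X if x not in A]
    return A, B

X = [0, 1, 10, 11]
d = {frozenset({x, y}): abs(x - y) for x in X for y in X if x != y}
print(p_divisive(X, d, two_clustering))  # ((0, 1), (10, 11))
```

On this well-separated example the divisive construction happens to agree with the linkage-based ones; the point of the theorem below is that for a context-sensitive P there are inputs on which no linkage function can reproduce the P-divisive output.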
Hierarchical but
Not
Linkage

Based
•
The
input

output
behaviour
of some natural divisive
algorithms is distinct from that of all linkage

based
algorithms.
•
The bisecting
k

means algorithm, and other natural
divisive algorithms, cannot be simulated by
any
linkage

based algorithm.
Theorem:
If
P
is context

sensitive, then the
P
–
divisive algorithm fails the
locality property.
Outline
• Define Linkage-Based clustering
• Introduce two new properties of hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are not linkage-based
• Conclusions
Conclusions
• We characterize hierarchical Linkage-Based clustering in terms of two intuitive properties.
• We show that some natural hierarchical algorithms have different input-output behaviour from any linkage-based algorithm.
Locality
For any clustering C = {C_1, …, C_k} in D = A(X,d):
• C is also a clustering in D’ = A(X’, d), where X’ = ∪ C_i
• C_i roots the same sub-dendrogram in both D and D’
• For all x, y in X’, x occurs below y in D iff the same holds in D’.
[Figure: D = A(X,d) and D’ = A(X’,d), where X’ = {x_1, …, x_6}]