Discerning Linkage-Based Algorithms

Nov 25, 2013

Discerning Linkage-Based Algorithms Among Hierarchical Clustering Methods

Margareta Ackerman and Shai Ben-David

IJCAI 2011




Clustering is one of the most widely used tools for exploratory data analysis.

Social Sciences
Biology
Astronomy
Computer Science
…

All apply clustering to gain a first understanding of the structure of large data sets.


The Theory-Practice Gap

Both statements still apply today.


Bridging the Theory-Practice Gap: Previous work

Axioms of clustering
[(Kleinberg, NIPS 02), (Ackerman & Ben-David, NIPS 08), (Meila, NIPS 08)]

Clusterability
[(Balcan, Blum, and Vempala, STOC 08), (Ackerman & Ben-David, AISTATS 09)]



There are a wide variety of clustering algorithms, which often produce very different clusterings.

How should a user decide which algorithm to use for a given application?

M. Ackerman, S. Ben-David, and D. Loker

Bridging the Theory-Practice Gap: Clustering algorithm selection

We propose a framework that lets a user utilize prior knowledge to select an algorithm.

Identify properties that distinguish between the input-output behaviour of different clustering algorithms.

The properties should be:

1) Intuitive and “user-friendly”
2) Useful for classifying clustering algorithms

Our approach for clustering algorithm selection




A property-based classification of partitional clustering algorithms
(Ackerman, Ben-David, and Loker, NIPS ‘10)

A characterization of single-linkage with the k-stopping criterion
(Zadeh and Ben-David, UAI 09)

A characterization of linkage-based clustering with the k-stopping criterion
(Ackerman, Ben-David, and Loker, COLT ‘10)

Previous Work in Property-Based Framework



Extend the above property-based framework to the hierarchical clustering setting.

Propose two intuitive properties that uniquely identify hierarchical linkage-based clustering algorithms.

Show that common hierarchical algorithms, including bisecting k-means, cannot be simulated by any linkage-based algorithm.

Our contributions

Outline

Define Linkage-Based clustering
Introduce two new properties of hierarchical clustering algorithms
Main result
Hierarchical clustering paradigms that are not linkage-based
Conclusions




A set C_i is a cluster in a dendrogram D if there exists a node in the dendrogram such that C_i is the set of its leaf descendants.

Formal Setup: Dendrograms and clusterings

[Figure: a dendrogram]


C = {C_1, …, C_k} is a clustering in a dendrogram D if

  C_i is a cluster in D for all 1 ≤ i ≤ k, and
  clusters are disjoint: C_i ∩ C_j = Ø for all 1 ≤ i < j ≤ k.
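The two definitions above can be checked mechanically. The following is a minimal sketch, assuming a dendrogram encoded as nested tuples with strings as leaves; the encoding and the names `leaves`, `clusters`, and `is_clustering` are illustrative choices, not part of the talk.

```python
from itertools import combinations

# Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
def leaves(node):
    """Set of leaf descendants of a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clusters(dendrogram):
    """All clusters of the dendrogram: one leaf set per node."""
    found = {leaves(dendrogram)}
    if not isinstance(dendrogram, str):
        for child in dendrogram:
            found |= clusters(child)
    return found

def is_clustering(candidate, dendrogram):
    """True iff every C_i is a cluster in D and the C_i are pairwise disjoint."""
    sets = [frozenset(c) for c in candidate]
    all_clusters = clusters(dendrogram)
    every_set_is_a_cluster = all(c in all_clusters for c in sets)
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    return every_set_is_a_cluster and pairwise_disjoint

# Toy dendrogram over five points.
D = ((("a", "b"), "c"), ("d", "e"))
print(is_clustering([{"a", "b"}, {"d", "e"}], D))
print(is_clustering([{"a", "c"}], D))
```

Note that, as in the definition, a clustering in D does not have to cover every point of the data set.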





Formal Setup: Hierarchical clustering algorithm

A Hierarchical Clustering Algorithm A maps

Input: A data set X with a distance function d, denoted (X,d)

to

Output: A dendrogram of X


An algorithm A is Linkage-Based if there exists a linkage function

  l : {(X_1, X_2, d) : d over X_1 ∪ X_2} → R+

such that for any (X,d), A(X,d) can be constructed as follows:

  Create a single-node tree for every element of X.
  Repeat the following until a single tree remains:
    Merge the pair of trees whose element sets are closest according to l.

Linkage-Based Algorithm

Ex. Single-linkage, average-linkage, complete-linkage
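The construction above can be written down directly. This is a rough sketch, assuming the linkage function is passed as a Python callable `linkage(X1, X2, d)` and that ties are broken by taking the first closest pair (a choice the slides do not specify); the function names and the toy data are illustrative.

```python
from itertools import combinations

# Three standard linkage functions, each mapping (X1, X2, d) to a non-negative real.
def single_linkage(a, b, d):
    return min(d(x, y) for x in a for y in b)

def complete_linkage(a, b, d):
    return max(d(x, y) for x in a for y in b)

def average_linkage(a, b, d):
    return sum(d(x, y) for x in a for y in b) / (len(a) * len(b))

def linkage_based(points, d, linkage):
    """Build a dendrogram (nested tuples, points as leaves) by repeated merging."""
    # Each tree is stored as (element set, nested-tuple tree).
    forest = [(frozenset([p]), p) for p in points]
    while len(forest) > 1:
        # Pick the pair of trees whose element sets are closest according to the linkage.
        i, j = min(
            combinations(range(len(forest)), 2),
            key=lambda ij: linkage(forest[ij[0]][0], forest[ij[1]][0], d),
        )
        (set_i, tree_i), (set_j, tree_j) = forest[i], forest[j]
        merged = (set_i | set_j, (tree_i, tree_j))
        forest = [t for k, t in enumerate(forest) if k not in (i, j)] + [merged]
    return forest[0][1]

def d(x, y):
    """Toy distance: points on a line."""
    return abs(x - y)

points = [0, 1, 3, 7]
print(linkage_based(points, d, single_linkage))
```

Swapping `single_linkage` for `complete_linkage` or `average_linkage` changes only the linkage function; the merging loop stays the same, which is the point of the definition.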




Locality

Informal Definition

If we select a set of disjoint clusters from a dendrogram, and run the algorithm on the union of these clusters, we obtain a result that is consistent with the original dendrogram.

[Figure: D = A(X,d) and D’ = A(X’,d), with X’ = {x_1, …, x_6}]

Outer Consistency

[Figure: the clustering C in A(X,d), shown on dataset (X,d) and on dataset (X,d’), where d’ increases pairwise between-cluster distances]

The outer-consistent change makes the clustering C more prominent.

If A is outer-consistent, then A(X,d’) will also include the clustering C.
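To make the notion of an outer-consistent change concrete, here is a small sketch of a checker. It assumes the usual reading that d’ leaves within-cluster distances unchanged and never decreases between-cluster distances; the slide itself only states that between-cluster distances increase. The names `is_outer_consistent_change` and `line_metric`, and the toy data, are hypothetical.

```python
from itertools import combinations

def is_outer_consistent_change(points, clustering, d, d_new):
    """Check that d_new only stretches distances between different clusters."""
    # Assumes every point belongs to exactly one cluster of `clustering`.
    cluster_of = {x: i for i, cluster in enumerate(clustering) for x in cluster}
    for x, y in combinations(points, 2):
        if cluster_of[x] == cluster_of[y]:
            if d_new(x, y) != d(x, y):   # assumed: within-cluster distances stay fixed
                return False
        elif d_new(x, y) < d(x, y):      # between-cluster distances may only grow
            return False
    return True

def line_metric(position):
    """Distance function induced by positions on a line (toy data)."""
    return lambda x, y: abs(position[x] - position[y])

points = ["a", "b", "c", "e"]
C = [{"a", "b"}, {"c", "e"}]
d = line_metric({"a": 0, "b": 1, "c": 5, "e": 6})
d_stretched = line_metric({"a": 0, "b": 1, "c": 8, "e": 9})  # clusters pushed apart
d_squeezed = line_metric({"a": 0, "b": 1, "c": 3, "e": 4})   # clusters pulled together

print(is_outer_consistent_change(points, C, d, d_stretched))
print(is_outer_consistent_change(points, C, d, d_squeezed))
```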





Theorem:

A hierarchical clustering function is Linkage-Based if and only if it is Local and Outer-Consistent.

Our Main Result




Recall direction:

If A satisfies Outer-Consistency and Locality, then A is Linkage-Based.

Goal:

Define a linkage function l so that the linkage-based clustering based on l outputs A(X,d) (for every X and d).

Brief Sketch of Proof


Define an operator <_A:

(X, Y, d_1) <_A (Z, W, d_2) if, when we run A on (X ∪ Y ∪ Z ∪ W, d), where d extends d_1 and d_2, X and Y are merged before Z and W.

Brief Sketch of Proof

[Figure: A(X,d), with subtrees Z, W, X, Y]

Prove that <_A can be extended to a partial ordering by proving that it is cycle-free.

This implies that there exists an order-preserving function l that maps pairs of data sets to R+.
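The last step can be illustrated on a finite toy case: a cycle-free relation can be laid out in a topological order, and positions in that order give an order-preserving map into the positive reals. The sketch below only conveys this intuition; the actual proof has to handle the infinite collection of all pairs of data sets. The function name `order_preserving_values` and the string labels standing for pairs of data sets are hypothetical.

```python
from graphlib import TopologicalSorter

def order_preserving_values(items, precedes):
    """Assign a positive real to each item so that a <_A b implies l(a) < l(b).

    `precedes` is a set of (a, b) pairs meaning a <_A b.
    """
    graph = {item: set() for item in items}
    for a, b in precedes:
        graph[b].add(a)  # a must be placed before b
    # static_order raises graphlib.CycleError if the relation contains a cycle.
    order = TopologicalSorter(graph).static_order()
    return {item: float(rank + 1) for rank, item in enumerate(order)}

# Toy relation: the pair (X, Y) merges before (Z, W), which merges before (U, V).
items = ["XY", "ZW", "UV"]
precedes = {("XY", "ZW"), ("ZW", "UV")}
l = order_preserving_values(items, precedes)
print(l["XY"] < l["ZW"] < l["UV"])
```

If the relation had a cycle, no such function could exist, which is why cycle-freeness is the property that needs proving.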






Hierarchical but Not Linkage-Based

P-Divisive algorithms construct dendrograms top-down using a partitional 2-clustering algorithm P to determine how to split nodes.

Many natural partitional 2-clustering algorithms satisfy the following property:

A partitional 2-clustering algorithm P is Context Sensitive if there exist d ⊂ d’ so that

  P({x,y,z}, d) = {{x}, {y,z}} and P({x,y,z,w}, d’) = {{x,y}, {z,w}}.

Ex. k-means, min-sum, min-diameter, and further-centroids.
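A small numeric example shows context sensitivity for the 2-means objective. This is a sketch under stated assumptions: the four points sit on a line at hand-picked positions, P is a brute-force optimal 2-means (not a practical k-means implementation), and the names `two_means_cost` and `two_means` are illustrative.

```python
from itertools import combinations

def two_means_cost(cluster, d):
    # Sum of squared distances to the centroid, written using pairwise distances only.
    return sum(d(a, b) ** 2 for a, b in combinations(cluster, 2)) / len(cluster)

def two_means(points, d):
    """Brute-force optimal 2-means: try every split into two non-empty parts."""
    points = sorted(points)
    first, rest = points[0], points[1:]
    best = None
    for size in range(len(rest)):  # how many other points join `first`
        for extra in combinations(rest, size):
            left = frozenset((first, *extra))
            right = frozenset(points) - left
            cost = two_means_cost(left, d) + two_means_cost(right, d)
            if best is None or cost < best[0]:
                best = (cost, {left, right})
    return best[1]

# Assumed toy data: positions on a line; d' extends d by adding the point w.
position = {"x": 0, "y": 2, "z": 3, "w": 5}

def d(a, b):
    return abs(position[a] - position[b])

print(two_means(["x", "y", "z"], d))       # y is grouped with z
print(two_means(["x", "y", "z", "w"], d))  # adding w moves y over to x
```

On three points the split is {{x}, {y,z}}; once w is added, the optimal split becomes {{x,y}, {z,w}}, so the grouping of y depends on context.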

Hierarchical but Not Linkage-Based

The input-output behaviour of some natural divisive algorithms is distinct from that of all linkage-based algorithms.

The bisecting k-means algorithm, and other natural divisive algorithms, cannot be simulated by any linkage-based algorithm.

Theorem:

If P is context-sensitive, then the P-divisive algorithm fails the locality property.
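The theorem can be seen on a four-point toy instance. The sketch below assumes a brute-force optimal 2-means as P and points on a line at hand-picked positions; it builds the clusters of the P-divisive dendrogram on the full data set and on the union of a clustering taken from it. All names and data are illustrative, not taken from the paper.

```python
from itertools import combinations

def cost(cluster, d):
    """2-means cost of one cluster, from pairwise distances."""
    return sum(d(a, b) ** 2 for a, b in combinations(cluster, 2)) / len(cluster)

def two_means(points, d):
    """Brute-force optimal split of `points` into two non-empty parts."""
    points = sorted(points)
    splits = []
    for size in range(1, len(points)):
        for left in combinations(points, size):
            right = [p for p in points if p not in left]
            splits.append((cost(left, d) + cost(right, d), frozenset(left), frozenset(right)))
    best = min(splits, key=lambda s: s[0])
    return best[1], best[2]

def p_divisive_clusters(points, d, split):
    """Clusters (leaf sets of all nodes) of the dendrogram built top-down with `split`."""
    node = frozenset(points)
    found = {node}
    if len(node) > 1:
        for part in split(node, d):
            found |= p_divisive_clusters(part, d, split)
    return found

position = {"x": 0, "y": 2, "z": 3, "w": 5}  # assumed toy data

def d(a, b):
    return abs(position[a] - position[b])

D = p_divisive_clusters("xyzw", d, two_means)
# C = {{x,y}, {z}} is a clustering in D; rerun the algorithm on its union X' = {x,y,z}.
D_prime = p_divisive_clusters("xyz", d, two_means)
print(frozenset("xy") in D, frozenset("xy") in D_prime)
```

Here {x,y} is a cluster in D but not in D’, so the clustering {{x,y}, {z}} from D does not survive in D’, which is exactly a violation of locality.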




Conclusions

We characterize hierarchical Linkage-Based clustering in terms of two intuitive properties.

Show that some natural hierarchical algorithms have different input-output behavior than any linkage-based algorithm.

Locality

For any clustering C = {C_1, …, C_k} in D = A(X,d),

  C is also a clustering in D’ = A(X’ = ∪ C_i, d)
  C_i roots the same sub-dendrogram in both D and D’
  For all x, y in X’, x occurs below y in D iff the same holds in D’.

[Figure: D = A(X,d) and D’ = A(X’,d), with X’ = {x_1, …, x_6}]