SMART: Novel Self Splitting-Merging Clustering Algorithm

naivenorth · Artificial Intelligence and Robotics

8 Nov 2013

Rui Fa (a), Asoke K Nandi (a, b)

(a) The University of Liverpool, UK
(b) The University of Jyväskylä, Finland

The project (Ref. NIHR-RP-PG-0310-1004-AN) is supported by the National Institute for Health Research (NIHR), UK.


Outline:

- Introduction
- SMART Algorithm
- Datasets
- Numerical Results
- Conclusions

Introduction

- Clustering
  - Long history and wide range of applications
- Microarray: high throughput

ORFs        S1     S2     S3     S4     S5     S6     S7     S8     S9
YDL179w   -0.75  -0.90  -0.98  -0.73  -0.67  -0.12  -0.95  -1.01   0.79
YLR079w   -0.48  -0.70  -0.47  -0.65  -0.45  -0.47  -0.71  -1.02   0.24
YER111c   -0.42   0.23   1.84  -0.02  -0.61  -0.65  -0.79  -0.39  -0.09
YBR200w    0.09   0.55  -0.89  -1.19  -1.11  -0.76   0.09   2.16   1.46
YJL194w   -1.29   1.71  -0.52  -1.11  -0.63  -0.02  -0.36  -0.76   1.44
YLR274w   -0.40   0.15  -0.17  -1.13  -1.05  -0.87  -0.67   0.25   1.79

Introduction

- Clustering
  - Issues
    - The number of clusters is unknown a priori
    - The inconsistency from random initialization
  - Possible Solution
    - Self-splitting and merging clustering algorithm
    - Existing algorithms: OPTOC [2], SSMCL

However …

SMART Algorithm: Overview

- Splitting-Merging AwaReness Tactics (SMART)
  - Starts with one cluster
  - Integrates four techniques
  - Splitting while merging
  - Terminates when the stopping criterion is met
  - Otherwise, continues splitting

Flowchart:

1. K = 1
2. Find the highest local density center (Technique 1)
3. Split one of the clusters into two (Technique 2)
4. If the merging criterion is met (Technique 3): Yes, merge the two clusters which meet the criterion; No, skip merging
5. If the stopping criterion is met (Technique 4): Yes, terminate; No, continue splitting
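
The flow above can be sketched as a small control loop. This is only an illustration of the split-while-merge idea, not the authors' implementation: the splitter and the merge test are passed in as callables, `toy_split` and `toy_find_merge` are hypothetical stand-ins for Techniques 2 and 3, Technique 1 is omitted, and `max_iters` is a safety cap that does not appear in the slides. The stopping rule used here is the maximum number of merges.

```python
from typing import Callable, List, Optional, Tuple

Cluster = List[float]   # toy 1-D points; real data would be expression vectors
Clusters = List[Cluster]


def smart_loop(
    data: Cluster,
    split: Callable[[Clusters], Clusters],
    find_merge: Callable[[Clusters], Optional[Tuple[int, int]]],
    max_merges: int,
    max_iters: int = 1000,  # safety cap, an assumption not taken from the slides
) -> Clusters:
    """Start with K = 1, keep splitting, merge when asked, stop after max_merges merges."""
    clusters: Clusters = [list(data)]
    merges = 0
    for _ in range(max_iters):
        if merges >= max_merges:      # stopping criterion (Technique 4)
            break
        clusters = split(clusters)    # stand-in for Technique 2
        pair = find_merge(clusters)   # stand-in for Technique 3
        if pair is not None:
            i, j = pair
            merged = clusters[i] + clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
            merges += 1
    return clusters


def toy_split(clusters: Clusters) -> Clusters:
    """Hypothetical splitter: halve the first cluster that has more than one point."""
    for k, c in enumerate(clusters):
        if len(c) > 1:
            c = sorted(c)
            mid = len(c) // 2
            return clusters[:k] + [c[:mid], c[mid:]] + clusters[k + 1:]
    return clusters


def toy_find_merge(clusters: Clusters) -> Optional[Tuple[int, int]]:
    """Hypothetical merge test: first pair of clusters whose means differ by less than 1.0."""
    means = [sum(c) / len(c) for c in clusters]
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            if abs(means[i] - means[j]) < 1.0:
                return i, j
    return None


result = smart_loop([0.0, 0.1, 5.0, 5.1], toy_split, toy_find_merge, max_merges=2)
print(sorted(sorted(c) for c in result))  # prints [[0.0, 0.1], [5.0, 5.1]]
```

In this toy run the loop over-splits each of the two groups once and merges the halves back, which is the behaviour the merge counter is meant to detect.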
SMART Algorithm: Technical Details

- Technique 1:
  - Finds the highest local density center
- Technique 2:
  - Combines the OPTOC paradigm with the Kaufman approach
  - Produces a new centroid
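
The slides do not define local density, so the following is only a minimal sketch of Technique 1 under one common assumption: the density of a point is the number of data points within a fixed radius of it. The function name and the `radius` parameter are hypothetical and not taken from the presentation.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def highest_density_point(points: List[Point], radius: float) -> Point:
    """Return the data point with the most neighbours within `radius` (itself included)."""
    if not points:
        raise ValueError("points must not be empty")

    def density(p: Point) -> int:
        # Assumed density measure: neighbour count inside a ball of the given radius.
        return sum(1 for q in points if math.dist(p, q) <= radius)

    return max(points, key=density)


data = [(0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (5.0, 5.0)]
center = highest_density_point(data, radius=0.4)
print(center)  # prints (0.0, 0.0)
```

The scan is quadratic in the number of points, which is acceptable for a few hundred expression profiles but would need a spatial index for larger data.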


SMART Algorithm: Technical Details

STEP 1: Find the new centroid by the Kaufman approach [1].

STEP 2: Calculate the minimum distance between the new centroid and the other centroids. Initialize OPTOC with the new centroid and the distance.

STEP 3: Start competitive learning [2].

STEP 4: Update the new centroid with the learning result.
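
Steps 2 to 4 can be illustrated with two small helpers. This is a sketch under stated assumptions: Euclidean distance is assumed for Step 2, and the update shown for Steps 3 and 4 is a plain winner-take-all move toward a sample, swapped in because the slides do not give the OPTOC update rule. The function names and the learning rate `lr` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def min_distance_to_others(new_centroid: Point, others: List[Point]) -> float:
    """Step 2: smallest Euclidean distance from the new centroid to any other centroid."""
    if not others:
        raise ValueError("at least one other centroid is required")
    return min(math.dist(new_centroid, c) for c in others)


def competitive_update(centroid: Point, sample: Point, lr: float) -> Tuple[float, ...]:
    """Generic competitive-learning move (not OPTOC itself): shift the centroid toward the sample."""
    return tuple(c + lr * (x - c) for c, x in zip(centroid, sample))


d_min = min_distance_to_others((0.0, 0.0), [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)])
moved = competitive_update((0.0, 0.0), (2.0, 4.0), lr=0.5)
print(d_min, moved)  # prints 1.0 (1.0, 2.0)
```

In the slides the minimum distance is passed to OPTOC together with the new centroid; how OPTOC uses it is described in reference [2] rather than here.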

SMART Algorithm: Technical Details

- Technique 3:
  - Cohesion, a similarity measure, is used for the merging criterion [3]
  - Assume that the objects in each cluster follow a multivariate normal distribution
  - Merging criterion: if the maximum of the new cohesions is T_chs times larger than that of the old ones, the two clusters with maximal cohesion should be merged.
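
The merging criterion can be written as a short check. This sketch takes the cohesion values as given inputs, since the cohesion formula itself is defined in [3] and not reproduced in the slides. It reads "old" and "new" as the pairwise cohesions before and after the latest split, which is an interpretation, and it uses a strict comparison. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]  # indices of two clusters


def pair_to_merge(
    new_cohesions: Dict[Pair, float],
    old_cohesions: Dict[Pair, float],
    t_chs: float = 5.0,
) -> Optional[Pair]:
    """Return the pair with maximal new cohesion if it exceeds t_chs times the old maximum."""
    if not new_cohesions or not old_cohesions:
        return None
    best_pair = max(new_cohesions, key=lambda pair: new_cohesions[pair])
    if new_cohesions[best_pair] > t_chs * max(old_cohesions.values()):
        return best_pair
    return None


old = {(0, 1): 0.01}
new = {(0, 1): 0.01, (0, 2): 0.02, (1, 2): 0.2}
print(pair_to_merge(new, old))  # prints (1, 2)
```

A sudden jump in the largest cohesion suggests that the latest split cut through one natural cluster, so the two halves are merged back.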



SMART Algorithm: Technical Details

- Technique 4:
  - Defines a stopping criterion to terminate SMART: N_m, the maximum number of merges.
  - Obtain the minimum spanning tree (MST) for these K centroids and get an array containing the (K-1) branches of the MST
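
The MST step can be sketched with Prim's algorithm over the K centroids. Euclidean distance and the function name are assumptions, and the slide does not say how the resulting branches enter the stopping criterion, so this sketch only produces the array of (K-1) branches.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Branch = Tuple[int, int, float]  # (centroid index, centroid index, branch length)


def mst_branches(centroids: List[Point]) -> List[Branch]:
    """Prim's algorithm on the complete graph of centroids; returns K-1 branches."""
    k = len(centroids)
    if k < 2:
        return []
    in_tree = [False] * k
    best = [math.inf] * k   # cheapest known distance from each node to the tree
    parent = [-1] * k       # tree node that realises that distance
    best[0] = 0.0
    branches: List[Branch] = []
    for _ in range(k):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(k) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            branches.append((parent[u], u, best[u]))
        # Relax distances of the remaining nodes through the new tree node.
        for v in range(k):
            if not in_tree[v]:
                d = math.dist(centroids[u], centroids[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return branches


tree = mst_branches([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 1.0)])
print(tree)  # prints [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 4.0)]
```

For the small K that arises here, the quadratic cost of this dense Prim variant is negligible.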

SMART Algorithm: Example

[Figure: sixteen scatter-plot panels, (a) to (p); both axes run from -2 to 2.]

Datasets:

- A synthetic dataset that models gene expression data with cyclic behavior
- A subset of 384 periodic gene expression profiles in the Stanford Yeast dataset
- A subset of the 500 most periodic gene expression profiles in the α-38 Yeast dataset

Numerical Results:

- Synthetic dataset results

[Figure: six panels, (a) to (f); x-axis from 1 to 24, y-axis from -2 to 2.]

- T_chs is 5
- Maximum number of merges is 10

Numerical Results:

- Stanford Yeast data results

[Figure: six panels of Gene Expression (y-axis, -2 to 2) against Samples (x-axis, 1 to 17):
(a) Early G1 phase, 72 genes
(b) Late G1 phase, 141 genes
(c) S phase, 38 genes
(d) G2 phase, 51 genes
(e) M phase, 72 genes
(f) Q phase, 10 genes]

- T_chs is 5
- Maximum number of merges is 10

Numerical Results:

- α-38 Yeast data results

[Figure: four panels of Gene Expression (Normalized) (y-axis) against Samples (x-axis, 5 to 25):
Cluster 1, 199 genes
Cluster 2, 82 genes
Cluster 3, 92 genes
Cluster 4, 127 genes]

- T_chs is 5
- Maximum number of merges is 10

Conclusion:

- Splitting-Merging AwaReness Tactics (SMART) has been proposed.
- A framework, integrating many techniques, starting with one cluster and employing a splitting-while-merging process.
- Built-in self-awareness to split and merge the clusters automatically in iterations.
- "Parameter-free".

References:

[1] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, 1990.

[2] Ya-Jun Zhang and Zhi-Qiang Liu, "Self-splitting competitive learning: a new on-line clustering paradigm," IEEE Trans. Neural Networks, vol. 13, no. 2, pp. 369-380, 2002.

[3] Cheng-Ru Lin and Ming-Syan Chen, "Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging," IEEE Trans. Know. and Data Eng., vol. 17, no. 2, pp. 145-159, 2005.

Q&A

Thank you