SMART: Novel Self-Splitting-Merging Clustering Algorithm
Rui Fa (a), Asoke K. Nandi (a, b)
(a) The University of Liverpool, UK
(b) The University of Jyväskylä, Finland
The project (Ref. NIHR-RP-PG-0310-1004-AN) is supported by the National Institute for Health Research (NIHR), UK.
Outline:
Introduction
SMART Algorithm
Datasets
Numerical Results
Conclusions
Introduction
Clustering
Long history and a wide range of applications
Microarray: high-throughput gene expression data, for example:

ORFs       S1     S2     S3     S4     S5     S6     S7     S8     S9
YDL179w   -0.75  -0.90  -0.98  -0.73  -0.67  -0.12  -0.95  -1.01   0.79
YLR079w   -0.48  -0.70  -0.47  -0.65  -0.45  -0.47  -0.71  -1.02   0.24
YER111c   -0.42   0.23   1.84  -0.02  -0.61  -0.65  -0.79  -0.39  -0.09
YBR200w    0.09   0.55  -0.89  -1.19  -1.11  -0.76   0.09   2.16   1.46
YJL194w   -1.29   1.71  -0.52  -1.11  -0.63  -0.02  -0.36  -0.76   1.44
YLR274w   -0.40   0.15  -0.17  -1.13  -1.05  -0.87  -0.67   0.25   1.79
Introduction
Clustering
Issues:
The number of clusters is unknown a priori
Inconsistency arising from random initialization
Possible solution:
Self-splitting and merging clustering algorithms
Existing algorithms: OPTOC [2], SSMCL
However ...
SMART Algorithm: Overview
Splitting-Merging AwaReness Tactics (SMART)
Starts with one cluster
Integrates four techniques
Splitting while merging
Terminates when the stopping criterion is met
Otherwise, continues splitting

Flowchart:
K = 1
Technique 1: Find the highest local density center
Technique 2: Split one of the clusters into two
Technique 3: If the merging criterion is met, merge the two clusters which meet the criterion
Technique 4: If the stopping criterion is met, stop; otherwise, return to splitting
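The splitting-while-merging control flow above can be sketched as a short program. This is a minimal toy, not the paper's method: the split rule, merge test, and thresholds (`merge_tol`, `max_k`) below are simplistic stand-ins for Techniques 1-4, chosen only to make the loop runnable.

```python
import math

def _mean(pts):
    """Centroid (component-wise mean) of a list of points."""
    n = len(pts)
    return tuple(sum(x) / n for x in zip(*pts))

def smart_loop(points, max_merges=10, max_k=5, merge_tol=0.5):
    """Toy sketch of SMART's splitting-while-merging control flow."""
    centroids = [_mean(points)]          # K = 1: start with one cluster
    merges = 0
    while True:
        # "Split": promote the point farthest from every current
        # centroid to a new centroid (stand-in for Techniques 1-2).
        new_c = max(points,
                    key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(new_c))
        # "Merge": fuse the two closest centroids when they are nearly
        # coincident (stand-in for the cohesion test, Technique 3).
        i, j = min(((a, b) for a in range(len(centroids))
                    for b in range(a + 1, len(centroids))),
                   key=lambda ab: math.dist(centroids[ab[0]], centroids[ab[1]]))
        if math.dist(centroids[i], centroids[j]) < merge_tol:
            merged = _mean([centroids[i], centroids[j]])
            centroids = [c for k, c in enumerate(centroids)
                         if k not in (i, j)] + [merged]
            merges += 1
        # Stop once the merge budget is spent or K is large enough
        # (stand-in for the MST-based criterion, Technique 4).
        if merges >= max_merges or len(centroids) >= max_k:
            return centroids
```

On two well-separated blobs, `smart_loop(pts, max_k=2)` returns two centroids, one near each blob.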
SMART Algorithm: Technical Details
Technique 1: Finds the highest local density center
Technique 2: Combines the OPTOC paradigm with the Kaufman approach; produces a new centroid
SMART Algorithm: Technical Details
STEP 1: Find the new centroid by the Kaufman approach [1].
STEP 2: Calculate the minimum distance between the new centroid and the other centroids; initialize OPTOC with the new centroid and that distance.
STEP 3: Start competitive learning [2].
STEP 4: Update the new centroid with the learning result.
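STEP 1 can be sketched in code. This follows the commonly cited form of the Kaufman seeding rule from [1]; the function name and the exact scoring form here are a sketch under that assumption, not taken from the slides.

```python
import math

def kaufman_next_centroid(points, centroids):
    """STEP 1 sketch: choose the next seed by the Kaufman approach [1].
    Each candidate point c is scored by the total reduction
    max(D_j - d(j, c), 0) it would bring to every other point j, where
    D_j is j's distance to its nearest existing centroid; the candidate
    with the largest total reduction becomes the new centroid."""
    best, best_gain = None, -1.0
    for c in points:
        if c in centroids:
            continue
        gain = sum(
            max(min(math.dist(j, m) for m in centroids) - math.dist(j, c), 0.0)
            for j in points if j != c)
        if gain > best_gain:
            best, best_gain = c, gain
    return best
```

STEP 2 would then compute `r = min(math.dist(new_c, m) for m in centroids)` and hand `new_c` and `r` to OPTOC.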
SMART Algorithm: Technical Details
Technique 3: Cohesion, a similarity measure, is used as the merging criterion [3]
Assume that the objects in each cluster follow a multivariate normal distribution
Merging criterion: if the maximum of the new cohesions is T_chs times larger than that of the old ones, the two clusters with maximal cohesion are merged.
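The T_chs rule can be sketched as follows. Note the `cohesion` function below is a deliberately simple stand-in (inverse centroid distance), not the multivariate-normal cohesion measure of [3]; only the thresholding logic mirrors the slide.

```python
import math

def cohesion(a, b):
    """Toy cohesion between two clusters of points: a similarity that
    grows as the clusters lie closer together (a stand-in for the
    measure of [3], whose exact definition differs)."""
    ca = tuple(sum(x) / len(a) for x in zip(*a))
    cb = tuple(sum(x) / len(b) for x in zip(*b))
    return 1.0 / (1.0 + math.dist(ca, cb))

def merge_candidate(clusters, old_max, t_chs=5.0):
    """Technique 3 sketch: find the most cohesive pair of clusters and
    return its indices only if its cohesion is at least t_chs times the
    previous maximum; otherwise return None (no merge)."""
    pairs = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    i, j = max(pairs, key=lambda p: cohesion(clusters[p[0]], clusters[p[1]]))
    if cohesion(clusters[i], clusters[j]) >= t_chs * old_max:
        return (i, j)
    return None
```

The same pair can pass or fail depending on the previous maximum, which is what makes the criterion adaptive rather than a fixed threshold.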
SMART Algorithm: Technical Details
Technique 4: Defines a stopping criterion to terminate SMART:
N_m: the maximum number of merges.
Obtain the minimum spanning tree (MST) for the K centroids and get an array containing the (K - 1) branches of the MST.
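The MST step above can be sketched with Prim's algorithm. How SMART then inspects the branch array is not detailed on this slide, so the function below stops at producing the (K - 1) branch lengths.

```python
import math

def mst_branches(centroids):
    """Technique 4 sketch: build the minimum spanning tree over the K
    centroids with Prim's algorithm and return its K - 1 branch
    lengths, the array the stopping criterion inspects."""
    k = len(centroids)
    in_tree = {0}
    branches = []
    while len(in_tree) < k:
        # Cheapest edge connecting the tree to a centroid outside it.
        u, v = min(((u, v) for u in in_tree
                    for v in range(k) if v not in in_tree),
                   key=lambda e: math.dist(centroids[e[0]], centroids[e[1]]))
        branches.append(math.dist(centroids[u], centroids[v]))
        in_tree.add(v)
    return branches
```

For K = 3 centroids at (0,0), (0,3), and (4,0), the MST consists of the two axis-aligned edges of lengths 3 and 4.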
SMART Algorithm: Example
[Figure: sixteen panels (a)-(p) tracing SMART on a two-dimensional example dataset; both axes span -2 to 2.]
Datasets:
A synthetic dataset modelling gene expression data with cyclic behavior
A subset of 384 periodic gene expression profiles from the Stanford Yeast dataset
A subset of the 500 most periodic gene expression profiles from the α-38 Yeast dataset
Numerical Results:
Synthetic dataset results
[Figure: six panels (a)-(f) of clustered synthetic profiles over samples 1-24; expression values range from -2 to 2.]
T_chs is 5; the maximum number of merges is 10.
Numerical Results:
Stanford Yeast data results
[Figure: six clusters of gene expression profiles over 17 samples, expression from -2 to 2:
(a) Early G1 phase, 72 genes; (b) Late G1 phase, 141 genes; (c) S phase, 38 genes; (d) G2 phase, 51 genes; (e) M phase, 72 genes; (f) Q phase, 10 genes.]
T_chs is 5; the maximum number of merges is 10.
Numerical Results:
α-38 Yeast data results
[Figure: four clusters of normalized gene expression profiles over 25 samples:
Cluster 1, 199 genes; Cluster 2, 82 genes; Cluster 3, 92 genes; Cluster 4, 127 genes.]
T_chs is 5; the maximum number of merges is 10.
Conclusion:
Splitting-Merging AwaReness Tactics (SMART) has been proposed.
A framework integrating many techniques, starting with one cluster and employing a splitting-while-merging process.
Built-in self-awareness to split and merge the clusters automatically in iterations.
"Parameter-free".
References:
[1] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, 1990.
[2] Ya-Jun Zhang and Zhi-Qiang Liu, "Self-splitting competitive learning: a new on-line clustering paradigm," IEEE Trans. Neural Networks, vol. 13, no. 2, pp. 369-380, 2002.
[3] Cheng-Ru Lin and Ming-Syan Chen, "Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging," IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 2, pp. 145-159, 2005.
Q&A
Thank you