Clustering Sequential Data:

Research Paper Review

Presented by Glynis Hawley

April 28, 2003

On the Optimal Clustering of Sequential Data

by Cheng-Ru Lin and Ming-Syan Chen,
Electrical Engineering Department
National Taiwan University, Taipei, Taiwan

Second SIAM International Conference on Data Mining
April 11-13, 2002

http://www.siam.org/meetings/sdm02/proceedings/sdm02-09.pdf

Agenda

Introduction: What is sequential clustering?
Problem definition for algorithm design
Optimal Algorithm: SC_OPT
Greedy Algorithm: SC_GD
Conclusion

Sequential Clustering Problem

Attributes and sequence of objects are both important.
Objects within a cluster form a continuous region.
An object within one cluster may be closer to the centroid of a different cluster than it is to its own centroid.
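
The last point can be checked on a tiny example. The sketch below uses invented one-dimensional values and an arbitrarily chosen separator position (both are assumptions for illustration, not data from the paper) to show an object that sits nearer to the neighboring cluster's centroid than to its own.

```python
# Hypothetical 1-D objects, listed in sequence order (values invented here).
values = [1.0, 2.0, 9.0, 3.0, 10.0, 11.0]

# A sequential 2-partition: a single separator after the fourth object.
clusters = [values[:4], values[4:]]
centroids = [sum(c) / len(c) for c in clusters]

# The third object (9.0) belongs to the first cluster because of its position.
obj = values[2]
dist_own = abs(obj - centroids[0])    # distance to its own centroid
dist_other = abs(obj - centroids[1])  # distance to the other centroid

print(centroids, dist_own, dist_other)
```

A conventional distance-based assignment would place this object in the second cluster; a sequential clustering cannot do so without breaking the continuous region.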

Conventional Clustering vs. Sequential Clustering

[Figure: two plots of objects numbered 1-15 on X and Y axes, titled "Conventional Clustering" and "Sequential Clustering".]
Application Areas

Analysis of motion patterns of objects.
    Cellular phones.
Analysis of status logs of running machines.

Problem Definition

Partitioning problem
    n sequential objects into k clusters
Dissimilarity measurement
    Squared Euclidean distance
Cluster quality
    Cost measurement: penalizes clusters for amount of dissimilarity of objects
    Best solution minimizes the sum of the costs of all clusters




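Cost Definition

Cost of a cluster: summation over all m objects of the squared Euclidean distance of the object from the cluster centroid.

$$\mathrm{Cost}(Cl) = \sum_{i=1}^{m} D_E(o_i, c)^2$$

Here o_1, ..., o_m are the objects of cluster Cl, c is its centroid, and D_E is the Euclidean distance.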
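
The cost definition translates directly into a few lines of code. The following is a minimal sketch; the function names `centroid`, `cluster_cost`, and `partition_cost` and the sample points are illustrative assumptions, not names or data from the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(cluster: List[Point]) -> List[float]:
    """Component-wise mean of the objects in a cluster."""
    m = len(cluster)
    return [sum(coords) / m for coords in zip(*cluster)]


def cluster_cost(cluster: List[Point]) -> float:
    """Sum over all objects of the squared Euclidean distance to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - cx) ** 2 for x, cx in zip(obj, c)) for obj in cluster)


def partition_cost(clusters: List[List[Point]]) -> float:
    """Quality of a whole clustering: the sum of the costs of all clusters."""
    return sum(cluster_cost(cl) for cl in clusters)


# Hypothetical 2-D objects; their centroid is (1, 1).
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
print(cluster_cost(cluster))
```

A cluster with a single object has cost zero, so without the fixed number of clusters k the cost alone would always favor one object per cluster.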

Sequential Clustering Algorithms

Optimal Sequential Clustering Algorithm
    SC_OPT
Greedy Sequential Clustering Algorithm
    SC_GD

Algorithm SC_OPT

Determines optimal k-partition of a set of sequential objects.
Uses the property of optimal substructure.
Systematically solves all possible sub-problems.
Stores results to be used in later steps.
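
The bullets above describe a dynamic-programming approach. The sketch below is a generic dynamic program for optimal contiguous partitioning, written under stated assumptions and not taken from the paper's pseudocode: objects are one-dimensional for brevity, and the function name `optimal_sequential_clusters` and the sample data are hypothetical. Its triple loop and its k-by-n tables are in line with the O(kn^2) time and O(kn) space stated for SC_OPT.

```python
from typing import List, Tuple


def optimal_sequential_clusters(values: List[float], k: int) -> Tuple[float, List[int]]:
    """Return (minimum total cost, start index of each cluster) for 1-D data."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")

    # Prefix sums of values and squares give any segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def seg_cost(a: int, b: int) -> float:
        """Cost of values[a:b]: sum of squared distances to the segment mean."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[j][i]: minimum cost of splitting the first i objects into j clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for a in range(j - 1, i):  # the last cluster is values[a:i]
                cand = best[j - 1][a] + seg_cost(a, i)
                if cand < best[j][i]:
                    best[j][i] = cand
                    cut[j][i] = a

    # Walk the stored cut points backwards to recover the cluster boundaries.
    starts = []
    i = n
    for j in range(k, 0, -1):
        i = cut[j][i]
        starts.append(i)
    return best[k][n], starts[::-1]


# Hypothetical sequence with three obvious runs.
cost, starts = optimal_sequential_clusters([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
print(cost, starts)
```

Each table entry reuses the stored results for one fewer cluster, which is where the optimal-substructure property is applied.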


Complexity of Algorithm SC_OPT

Time: O(kn^2)
Space: O(kn)

Algorithm SC_GD

Initially, arbitrarily insert separators to divide the n objects into k clusters.

    1 2 3 | 4 5 6 | 7 8 9

Reposition the separators by “moves” and “jumps” to reduce the cost of the clusters.

The best possible move or jump is determined by calculating the cost reductions of all possible moves and jumps.
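
A rough sketch of this greedy loop follows. The slides do not define the two operations precisely, so the code assumes that a "move" shifts a separator to an adjacent position and a "jump" relocates it to any other free position; both are treated as relocations of a single separator. It recomputes the full cost for every candidate, so it illustrates the idea only and does not reproduce the paper's cost-reduction bookkeeping or its complexity. The names `total_cost` and `greedy_sequential_clusters` and the one-dimensional data are hypothetical.

```python
from typing import List


def total_cost(values: List[float], seps: List[int]) -> float:
    """Sum over clusters of squared distances to the cluster mean.

    seps holds separator positions; separator p sits just before values[p].
    """
    bounds = [0] + sorted(seps) + [len(values)]
    cost = 0.0
    for a, b in zip(bounds, bounds[1:]):
        seg = values[a:b]
        mean = sum(seg) / len(seg)
        cost += sum((v - mean) ** 2 for v in seg)
    return cost


def greedy_sequential_clusters(values: List[float], seps: List[int]) -> List[int]:
    """Apply the best cost-reducing relocation until none remains."""
    n = len(values)
    seps = sorted(seps)
    current = total_cost(values, seps)
    while True:
        best_cost, best_seps = current, None
        for idx, p in enumerate(seps):
            others = seps[:idx] + seps[idx + 1:]
            # Free positions 1..n-1: neighbours of p act as "moves" and
            # all other positions as "jumps" (assumed definitions).
            for q in range(1, n):
                if q == p or q in others:
                    continue
                cand = sorted(others + [q])
                c = total_cost(values, cand)
                if c < best_cost - 1e-12:
                    best_cost, best_seps = c, cand
        if best_seps is None:
            return seps  # no move or jump reduces the cost any further
        seps, current = best_seps, best_cost


# Hypothetical data with an arbitrary initial placement of two separators.
data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
initial = [1, 2]
result = greedy_sequential_clusters(data, initial)
print(result, total_cost(data, result))
```

Because every accepted step strictly lowers the total cost, the loop terminates, but it may stop at a local minimum rather than the optimal partition.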

Algorithm SC_GD (Cont.)

[Figure: illustrations of separator "move" and "jump" operations.]

Algorithm SC_GD (Cont.)

Continue repositioning separators until no further cost reductions are possible.

Complexity
    Time: O(nl/k + n), linear
    Space: O(k)

Quality of clusters increases with n and with average cluster size.

Conclusion

Sequential clustering requires that the sequence of data points be considered as well as the similarity of attributes.

Algorithms: SC_OPT and SC_GD

SC_GD approaches SC_OPT in terms of quality of clusters when average cluster sizes are large.