# Clustering Sequential Data: Research Paper Review

Presented by Glynis Hawley

April 28, 2003

On the Optimal Clustering of Sequential Data

by Cheng-Ru Lin and Ming-Syan Chen,

Electrical Engineering Department

National Taiwan University, Taipei, Taiwan

Second SIAM International Conference on Data Mining

April 11-13, 2002

http://www.siam.org/meetings/sdm02/proceedings/sdm02-09.pdf

Agenda

Introduction: What is sequential clustering?

Problem definition for algorithm design

Optimal Algorithm: SC_OPT

Greedy Algorithm: SC_GD

Conclusion

Sequential Clustering Problem

Attributes and sequence of objects are both important.

Objects within a cluster form a
continuous region.

An object within one cluster may be
closer to the centroid of a different
cluster than it is to its own centroid.
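The last point can be illustrated with a small sketch. The 1-D values and the two contiguous index ranges below are hypothetical toy data, not taken from the paper; the helper names `centroid` and `misplaced_by_distance` are likewise made up for illustration.

```python
# Toy 1-D sequence (hypothetical values), split into two contiguous clusters.
values = [0.0, 0.0, 10.0, 0.0, 10.0, 10.0, 10.0]
clusters = [(0, 4), (4, 7)]  # half-open index ranges [start, end)


def centroid(start, end):
    """Mean of values[start:end]."""
    segment = values[start:end]
    return sum(segment) / len(segment)


centroids = [centroid(s, e) for s, e in clusters]


def misplaced_by_distance():
    """Indices of objects whose nearest centroid is not their own cluster's."""
    out = []
    for own, (s, e) in enumerate(clusters):
        for i in range(s, e):
            nearest = min(
                range(len(centroids)),
                key=lambda c: (values[i] - centroids[c]) ** 2,
            )
            if nearest != own:
                out.append(i)
    return out
```

Here the object at index 2 sits inside the first contiguous cluster even though its value equals the second cluster's centroid, which a conventional distance-only assignment would not allow.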

Conventional Clustering vs.
Sequential Clustering

[Figure: two panels, "Conventional Clustering" and "Sequential Clustering", each showing objects numbered 1 to 15 plotted on X and Y axes.]

Application Areas

Analysis of motion patterns of objects.

Cellular phones.

Analysis of status logs of running
machines.

Problem Definition

Partitioning problem

n sequential objects into k clusters

Dissimilarity measurement

Squared Euclidean distance

Cluster quality

Cost measurement: penalizes a cluster according to
the dissimilarity of its objects

Best solution minimizes the sum of the
costs of all clusters

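Cost Definition

$$\mathrm{Cost}(Cl) = \sum_{i=1}^{m} D_E(o_i, c)^2$$

Cost of a cluster: summation over all m objects of the squared Euclidean distance
of the object from the cluster centroid. Here o_i is the i-th object of cluster Cl,
c is the cluster centroid, and D_E is the Euclidean distance.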
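As a minimal sketch of this cost, assuming each object is a tuple of numeric attributes and the centroid is the per-attribute mean (the function name `cluster_cost` is made up for illustration):

```python
def cluster_cost(objects):
    """Sum of squared Euclidean distances from each object to the centroid.

    objects: non-empty list of equal-length numeric tuples.
    """
    m = len(objects)
    dims = len(objects[0])
    # Centroid = per-attribute mean of the objects in the cluster.
    centroid = [sum(o[d] for o in objects) / m for d in range(dims)]
    return sum((o[d] - centroid[d]) ** 2 for o in objects for d in range(dims))
```

For example, the two objects (0, 0) and (2, 0) have centroid (1, 0), so each contributes a squared distance of 1 to the cost.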

Sequential Clustering Algorithms

Optimal Sequential Clustering Algorithm: SC_OPT

Greedy Sequential Clustering Algorithm: SC_GD

Algorithm SC_OPT

Determines optimal k-partition of a set of sequential objects.

Uses the property of optimal substructure.

Systematically solves all possible sub-problems.

Stores results to be used in later steps.
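One way to write the optimal-substructure property as a recurrence is sketched here. The symbol F(j, i), the minimum total cost of splitting the first i objects into j clusters, is notation assumed for this sketch and may differ from the paper's.

$$F(1, i) = \mathrm{Cost}(\{o_1, \dots, o_i\})$$

$$F(j, i) = \min_{j-1 \le t < i} \Big[ F(j-1, t) + \mathrm{Cost}(\{o_{t+1}, \dots, o_i\}) \Big], \quad 2 \le j \le i$$

The optimal k-partition of all n objects then has cost F(k, n). The bounds on t keep every cluster non-empty: the first t objects must hold j-1 clusters, and the last cluster must contain at least one object.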

Complexity of Algorithm SC_OPT

Time: O(kn^2)

Space: O(kn)
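A dynamic-programming sketch consistent with these bounds is shown next. It is not the paper's pseudocode: the function name `sc_opt`, the prefix-sum trick for segment costs, and the return format are all assumptions made for illustration. With the number of attributes treated as a constant, the three nested loops give O(kn^2) time and the two tables give O(kn) space.

```python
def sc_opt(objects, k):
    """Optimal split of a sequence into k contiguous clusters (DP sketch).

    objects: list of equal-length numeric tuples, in sequence order.
    Returns (total_cost, boundaries); boundaries holds the k-1 indices
    at which a new cluster starts.
    """
    n = len(objects)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    dims = len(objects[0])

    # Prefix sums of coordinates and of squared norms.
    pre = [[0.0] * dims]
    pre_sq = [0.0]
    for o in objects:
        pre.append([p + x for p, x in zip(pre[-1], o)])
        pre_sq.append(pre_sq[-1] + sum(x * x for x in o))

    def cost(a, b):
        """Cost of objects[a:b], using sum ||o||^2 - ||sum o||^2 / m."""
        m = b - a
        s_sq = sum((pre[b][d] - pre[a][d]) ** 2 for d in range(dims))
        # Clamp tiny negative values caused by floating-point rounding.
        return max(0.0, (pre_sq[b] - pre_sq[a]) - s_sq / m)

    inf = float("inf")
    # best[j][i]: minimum cost of splitting the first i objects into j clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for t in range(j - 1, i):
                c = best[j - 1][t] + cost(t, i)
                if c < best[j][i]:
                    best[j][i] = c
                    cut[j][i] = t

    # Walk the stored cut points backwards to recover the boundaries.
    boundaries = []
    i = n
    for j in range(k, 1, -1):
        i = cut[j][i]
        boundaries.append(i)
    boundaries.reverse()
    return best[k][n], boundaries
```

Calling `sc_opt([(1,), (2,), (3,), (10,), (11,), (12,)], 2)` places the single boundary at index 3, splitting the sequence into its two tight runs.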

Algorithm SC_GD

Initially, arbitrarily insert separators to
divide the n objects into k clusters.

1 2 3 | 4 5 6 | 7 8 9

Reposition the separators by “moves” and
“jumps” to reduce the cost of the clusters.

[Figure: objects 1 to 9 in sequence, shown ungrouped and grouped as 1 2 3, 4 5 6, 7 8 9.]

The best possible move or jump is
determined by calculating the cost reductions
of all possible moves and jumps.

Algorithm SC_GD (Cont.)

[Figure: separator repositioning, with panels labeled "move" and "jump".]

Algorithm SC_GD (Cont.)

Continue repositioning separators until
no further cost reductions are possible.
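The greedy loop can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the slides do not define "move" and "jump" precisely, so the sketch treats relocating a separator to an adjacent free position as a move and to any other free position as a jump. It also recomputes costs naively, so it does not reproduce the complexity reported for SC_GD. Objects are 1-D numbers for brevity, and all function names are made up.

```python
def segment_cost(values, a, b):
    """Sum of squared deviations from the mean for values[a:b]."""
    seg = values[a:b]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def total_cost(values, seps):
    """Total cost when a new cluster starts at each separator position."""
    bounds = [0] + sorted(seps) + [len(values)]
    return sum(segment_cost(values, a, b) for a, b in zip(bounds, bounds[1:]))


def sc_gd_sketch(values, seps):
    """Reposition separators greedily until no candidate lowers the cost.

    seps: distinct initial separator positions, each in 1..n-1.
    Returns (cost, separators) for the final, possibly only locally
    optimal, partition.
    """
    n = len(values)
    seps = sorted(seps)
    current = total_cost(values, seps)
    while True:
        best_cost, best_seps = current, None
        for idx, p in enumerate(seps):
            others = seps[:idx] + seps[idx + 1:]
            # Try every free position for this separator: adjacent ones
            # play the role of moves, the rest the role of jumps.
            for q in range(1, n):
                if q == p or q in others:
                    continue
                cand = sorted(others + [q])
                c = total_cost(values, cand)
                if c < best_cost - 1e-12:
                    best_cost, best_seps = c, cand
        if best_seps is None:
            return current, seps
        current, seps = best_cost, best_seps
```

Each accepted step strictly lowers the total cost, so the loop terminates. Starting from a poor initial placement such as `sc_gd_sketch([1, 2, 3, 10, 11, 12, 20, 21, 22], [1, 2])`, the separators end up at positions 3 and 6 on this toy input.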

Complexity

Time: O(nl/k + n), linear

Space: O(k)

Quality of clusters increases with n and
with average cluster size.

Conclusion

Sequential clustering requires that the
sequence of data points be considered
as well as the similarity of attributes.

Algorithms: SC_OPT and SC_GD

SC_GD approaches SC_OPT in terms of quality
of clusters when average cluster sizes are
large.