cluster

savagelizard, Artificial Intelligence and Robotics

25 Nov 2013

2013/04/13

Clustering Analysis

Outline


What is Clustering Analysis?


Partitioning Methods


K-Means


Hierarchical Methods


Agglomerative and Divisive hierarchical clustering


Density-Based Methods


DBSCAN



What is Clustering Analysis?


Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters).


Example: groups of friends on Facebook.



Clustering is useful in that it can lead to the discovery of previously unknown groups within the data.

Partitioning Methods


Given a data set D of n objects and k, the number of clusters to form, a partitioning algorithm organizes the objects into k partitions (k ≤ n), where each partition represents a cluster.

K-Means


Input:

1. X = {x_1, x_2, …, x_n}: a data set in d-dimensional space.

2. k: number of clusters.

Output:

Cluster centers c_j, 1 ≤ j ≤ k.

Requirement:

The output should minimize the objective function.

Objective function

\[ e_j = \sum_{x_i \in G_j} \| x_i - c_j \|^2 \]

\[ E = \sum_{j=1}^{k} e_j = \sum_{j=1}^{k} \sum_{x_i \in G_j} \| x_i - c_j \|^2 \]

G_j: the j-th group; c_j: the cluster center of G_j; k: number of clusters.


Goal:

Decide how the objects are divided into groups, and the associated cluster centers, so that the value of E is minimized.
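
As a concrete reading of the objective function, the sketch below evaluates E for a given grouping. The function name `sse`, the toy points, and the assignment `labels` are assumptions made for illustration; they do not come from the slides.

```python
def sse(points, centers, labels):
    """Return E: the sum, over all objects, of the squared Euclidean
    distance between the object and the center of its group."""
    total = 0.0
    for x, j in zip(points, labels):
        c = centers[j]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total

# Hypothetical toy data: two groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]                  # index j of the group G_j of each x_i
centers = [(0.0, 1.0), (4.0, 1.0)]     # c_j = mean of the points in G_j

E = sse(points, centers, labels)
print(E)
```

Every point here lies at distance 1 from its center, so E is the sum of four unit terms. Moving a center away from its group mean, or assigning a point to the farther center, can only increase E.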

Algorithm: k-means. The k-means algorithm for partitioning, where each cluster’s center is represented by the mean value of the objects in the cluster.

Input:

k: the number of clusters,

D: a data set containing n objects.

Output: A set of k clusters.

Method:

(1) arbitrarily choose k objects from D as the initial cluster centers;

(2) repeat

(3) (re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;

(4) update the cluster means, that is, calculate the mean value of the objects for each cluster;

(5) until no change;
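
The method above can be sketched in plain Python as follows. This is a minimal illustration, not a tuned implementation: the names `kmeans` and `sq_dist`, the `seed` and `max_iter` parameters, the choice of squared Euclidean distance as the similarity measure, and the rule that an empty cluster keeps its previous center are all assumptions added here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # (1) k objects as initial centers
    labels = None
    for _ in range(max_iter):            # (2) repeat
        # (3) (re)assign each object to the nearest cluster center
        new_labels = [
            min(range(k), key=lambda j: sq_dist(x, centers[j]))
            for x in points
        ]
        if new_labels == labels:         # (5) until no change
            break
        labels = new_labels
        # (4) update each center to the mean of its assigned objects
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:                  # assumption: empty cluster keeps its center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Because step (1) is arbitrary, different seeds can lead to different results on harder data; the procedure only guarantees a local minimum of E.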

Example

[Figure: step (3) assign, followed by step (4) update]

Example

[Figure: further k-means iterations, alternating step (3) reassign and step (4) update]

Hierarchical Methods


A hierarchical clustering method works by grouping data objects into a hierarchy, or “tree”, of clusters.

Representing data objects in the form of a hierarchy is useful for data summarization and visualization.

Hierarchical methods suffer from the fact that once a step (merge or split) is done, it can never be undone.

Agglomerative versus Divisive hierarchical clustering

Agglomerative methods start with individual objects as clusters, which are iteratively merged to form larger clusters.

Divisive methods initially let all the given objects form one cluster, which they iteratively split into smaller clusters.



Agglomerative clustering      Divisive clustering
Bottom-up                     Top-down
Individual objects            Placing all objects in one cluster
Merge                         Split

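Example (Divisive hierarchical clustering)

[Figure: nine objects, numbered 1 to 9, split top-down. Annotations in the figure: d(G_s) = 1, d(G_t) = 0.6, d(G_t) = 0.8, d_min = 0.75. Object order shown in the second panel: 7, 8, 9, 1, 2, 6, 3, 5, 4.]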

Example

(Agglomerative hierarchical clustering)


Five objects: (1,2) (2.5,4.5) (2,2) (4,1.5) (4,2.5)

Example

D = [matrix shown as an image in the original slides]

Example

[Figure: dendrogram]
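
To make the agglomerative example concrete, the sketch below takes the five objects listed in the example, builds their pairwise distance matrix, and merges clusters bottom-up. Two things are assumptions here, since the slides do not state them: the distance is Euclidean, and the distance between clusters is the single-linkage (minimum) distance. The function names are also illustrative.

```python
import math
from itertools import combinations

# The five objects from the example.
points = [(1, 2), (2.5, 4.5), (2, 2), (4, 1.5), (4, 2.5)]

def distance_matrix(pts):
    """Pairwise Euclidean distances (assumed metric)."""
    return [[math.dist(a, b) for b in pts] for a in pts]

def cluster_gap(d, ca, cb):
    """Single-linkage distance: closest pair across two clusters (assumed)."""
    return min(d[i][j] for i in ca for j in cb)

def single_linkage(pts):
    d = distance_matrix(pts)
    clusters = [[i] for i in range(len(pts))]   # start: one object per cluster
    merges = []                                 # (cluster A, cluster B, height)
    while len(clusters) > 1:
        # pick the closest pair of clusters and merge them
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_gap(d, clusters[ab[0]], clusters[ab[1]]),
        )
        height = cluster_gap(d, clusters[a], clusters[b])
        merges.append((clusters[a], clusters[b], height))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

for left, right, height in single_linkage(points):
    print(left, right, round(height, 3))
```

The printed merge heights are what a dendrogram would draw on its vertical axis. Indices are 0-based, so index 0 is the object (1,2). Other linkage rules (complete, average) would give different heights and possibly a different merge order.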

Density-Based Methods


Partitioning and hierarchical methods are designed to find spherical-shaped clusters.

Density-based clustering methods can discover clusters of nonspherical shape.




Density-Based Spatial Clustering of Applications with Noise (DBSCAN)


DBSCAN finds core objects, that is, objects that have dense neighborhoods. It connects core objects and their neighborhoods to form dense regions as clusters.



An object is a core object if the ε-neighborhood of the object contains at least MinPts objects.

1. ε-neighborhood: the space within a radius ε centered at the object.

2. MinPts: the minimum number of points an ε-neighborhood must contain to form a cluster.
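
The core-object test can be written in a few lines. In this sketch the names `eps_neighborhood` and `is_core` are illustrative, the distance is assumed to be Euclidean, and the neighborhood is assumed to include the object itself and points at distance exactly ε; the slides do not fix these details.

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices of all objects within distance eps of points[i]
    (the object itself is included by assumption)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """True if the eps-neighborhood of points[i] holds at least min_pts objects."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

# Hypothetical toy data.
points = [(0, 0), (0, 1), (1, 0), (5, 5)]
print([is_core(points, i, eps=1.0, min_pts=3) for i in range(len(points))])
```

With these settings only the first point is a core object: its neighborhood holds itself and its two neighbors at distance 1, while the others see at most two objects.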

Algorithm: DBSCAN: a density-based clustering algorithm.

Input:

D: a data set containing n objects;
ε: the radius parameter;
MinPts: the neighborhood density threshold.

Output: A set of density-based clusters.

Method:

(1) mark all objects as unvisited;
(2) do
(3)     randomly select an unvisited object p;
(4)     mark p as visited;
(5)     if the ε-neighborhood of p has at least MinPts objects
(6)         create a new cluster C, and add p to C;
(7)         let N be the set of objects in the ε-neighborhood of p;
(8)         for each point p’ in N
(9)             if p’ is unvisited
(10)                mark p’ as visited;
(11)                if the ε-neighborhood of p’ has at least MinPts points, add those points to N;
(12)            if p’ is not yet a member of any cluster, add p’ to C;
(13)        end for
(14)        output C;
(15)    else mark p as noise;
(16) until no object is unvisited;
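
A direct, unoptimized translation of the DBSCAN method into Python might look like the sketch below. The function name `dbscan`, the `seed` parameter, the label convention (`-1` for noise, `0, 1, …` for clusters), Euclidean distance, and the brute-force neighbor search are assumptions for illustration. Step numbers in the comments refer to the method above.

```python
import math
import random

NOISE = -1

def dbscan(points, eps, min_pts, seed=0):
    n = len(points)

    def neighbors(i):
        # brute-force eps-neighborhood, includes i itself (assumption)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n                  # None: not yet in any cluster
    visited = [False] * n                # (1) mark all objects as unvisited
    order = list(range(n))
    random.Random(seed).shuffle(order)   # (3) random visiting order
    cluster_id = 0
    for p in order:                      # (2)/(16) until none is unvisited
        if visited[p]:
            continue
        visited[p] = True                # (4)
        nbrs = neighbors(p)
        if len(nbrs) >= min_pts:         # (5) p is a core object
            labels[p] = cluster_id       # (6) new cluster C containing p
            queue = list(nbrs)           # (7) N
            k = 0
            while k < len(queue):        # (8) for each p' in N (N may grow)
                q = queue[k]
                k += 1
                if not visited[q]:       # (9)
                    visited[q] = True    # (10)
                    q_nbrs = neighbors(q)
                    if len(q_nbrs) >= min_pts:
                        queue.extend(q_nbrs)   # (11)
                if labels[q] is None or labels[q] == NOISE:
                    labels[q] = cluster_id     # (12)
            cluster_id += 1              # (14) cluster C is complete
        else:
            labels[p] = NOISE            # (15)
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

A point first marked as noise can later be absorbed into a cluster as a border point in step (12), which is why noise is treated as "not yet a member of any cluster". Which cluster receives which id depends on the random visiting order.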

Example

Let ε be given by the radius of the circles, and let MinPts = 3.