Determining the number of clusters using information entropy for mixed data

boorishadamant, Artificial Intelligence and Robotics

29 Oct 2013 (3 years and 7 months ago)

Intelligent Database Systems Lab

Presenter: Chuang, Kai-Ting

Authors: Jiye Liang, Xingwang Zhao, Deyu Li, Fuyuan Cao, Chuangyin Dang

2011, PR


Outline

Motivation
Objectives
Methodology
Experiments
Conclusions
Comments

Motivation


In cluster analysis, one of the most challenging problems is determining the number of clusters in a data set.

Existing algorithms for this task are not very effective on mixed data, that is, data with both numerical and categorical attributes.

Objectives

By introducing a new dissimilarity measure into the k-prototypes algorithm, we develop an algorithm to determine the number of clusters in a mixed data set.
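
The slides do not reproduce the new dissimilarity measure itself. For orientation, the sketch below shows the classical k-prototypes dissimilarity (Huang's formulation) that such a measure would take the place of: squared Euclidean distance on the numerical attributes plus a weighted mismatch count on the categorical ones. The function name, the argument layout, and the default weight `gamma` are illustrative assumptions, not taken from the presentation.

```python
from typing import Sequence


def kprototypes_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Classical k-prototypes dissimilarity between an object and a prototype.

    This is the baseline measure, NOT the new measure proposed in the talk.
    """
    # Numerical part: squared Euclidean distance to the prototype.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: number of attributes whose values differ.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    # gamma balances the two parts (hypothetical default of 1.0).
    return numeric_part + gamma * categorical_part


example = kprototypes_dissimilarity(
    [1.0, 2.0], ["red", "small"], [0.0, 2.0], ["red", "large"], gamma=0.5
)
print(example)  # 1.5
```

The choice of `gamma` controls how much a categorical mismatch counts relative to numerical distance, which is one reason a single fixed measure can behave poorly on mixed data.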

Methodology - Framework

Input
Apply the modified k-prototypes
Identify the worst cluster and reassign
Cluster validity index
Output
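
The framework steps can be read as a loop, assuming they are repeated while the number of clusters shrinks by one each round and the best-scoring partition is reported at the end. The sketch below only illustrates that control flow on 1-D numerical toy data. Every helper in it (`nearest_mean_refine`, `smallest_cluster`, `reassign_to_nearest`, `toy_validity`) is a hypothetical stand-in, not the modified k-prototypes, worst-cluster criterion, or validity index from the talk, and scoring each partition before dissolving a cluster is also an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Labels = List[int]


def cluster_means(data: Sequence[float], labels: Labels) -> Dict[int, float]:
    groups: Dict[int, List[float]] = {}
    for x, c in zip(data, labels):
        groups.setdefault(c, []).append(x)
    return {c: sum(v) / len(v) for c, v in groups.items()}


def nearest_mean_refine(data: Sequence[float], labels: Labels) -> Labels:
    """Toy stand-in for the clustering step: one pass to the nearest mean."""
    means = cluster_means(data, labels)
    return [min(means, key=lambda c: (x - means[c]) ** 2) for x in data]


def smallest_cluster(data: Sequence[float], labels: Labels) -> int:
    """Toy stand-in for 'identify the worst cluster': pick the smallest one."""
    return min(sorted(set(labels)), key=labels.count)


def reassign_to_nearest(data: Sequence[float], labels: Labels, worst: int) -> Labels:
    """Dissolve the worst cluster and move its objects to the nearest other mean."""
    means = cluster_means(data, labels)
    means.pop(worst)
    return [c if c != worst else min(means, key=lambda m: (x - means[m]) ** 2)
            for x, c in zip(data, labels)]


def toy_validity(data: Sequence[float], labels: Labels) -> float:
    """Toy stand-in for the validity index: compactness with a penalty per cluster."""
    means = cluster_means(data, labels)
    wss = sum((x - means[c]) ** 2 for x, c in zip(data, labels))
    return -wss - 0.5 * len(means)


def choose_k(
    data: Sequence[float],
    labels: Labels,
    k_min: int,
    refine: Callable[[Sequence[float], Labels], Labels],
    worst_cluster: Callable[[Sequence[float], Labels], int],
    reassign: Callable[[Sequence[float], Labels, int], Labels],
    validity: Callable[[Sequence[float], Labels], float],
) -> Tuple[int, Labels, Dict[int, float]]:
    """Shrink the partition one cluster at a time and keep the best-scoring k."""
    scores: Dict[int, float] = {}
    best_k, best_labels = 0, list(labels)
    while True:
        labels = refine(data, labels)            # clustering step
        k = len(set(labels))
        scores[k] = validity(data, labels)       # validity index for this k
        if best_k == 0 or scores[k] > scores[best_k]:
            best_k, best_labels = k, list(labels)
        if k <= k_min:
            return best_k, best_labels, scores   # output
        worst = worst_cluster(data, labels)      # identify the worst cluster
        labels = reassign(data, labels, worst)   # reassign its objects


data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
start = [0, 0, 1, 2, 3, 3]  # deliberately too many clusters (k = 4)
best_k, best_labels, scores = choose_k(
    data, start, 1, nearest_mean_refine, smallest_cluster,
    reassign_to_nearest, toy_validity,
)
print(best_k, best_labels)  # 2 [0, 0, 0, 3, 3, 3]
```

Passing the four steps in as callables keeps the driver independent of how each step is defined, so the toy helpers could be swapped for entropy-based ones without touching the loop.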

Methodology - Input

Methodology - Numerical data

Methodology - Categorical data

Methodology - Mixed data

Methodology - Example

Methodology - Utility function

Methodology - Modified k-prototypes

Experiments

Experiments - Synthetic data sets

Experiments - Real data sets

Conclusions


The proposed algorithm can detect the number of clusters in a mixed data set.

It also obtains better clustering results.

Comments


Advantages
The approach is helpful for determining the number of clusters in mixed data.

Applications
Clustering of mixed data.