Visualized Clustering and Classification of Mixed Data via Extended Structure Adaptive Self-Organizing Map



Chung-Chian Hsu*, Zih-Hui Lin, Kuo-Min Wang, Wei-Shen Tai

National Yunlin University of Science and Technology

*E-mail: hsucc@yuntech.edu.tw


Abstract

In data mining applications, the Self-Organizing Map (SOM) is regarded as an effective visualized clustering technique for preserving the topological relation in the input data. SOM's classification capability has not been much addressed, and a direct application of SOM to classification yields poor accuracy. A variant of SOM, Structure Adaptive SOM (SASOM), was recently proposed for classifying multidimensional data by incorporating a dynamic node-splitting scheme. However, SASOM cannot be appropriately applied to categorical or mixed (i.e., categorical and numeric) data due to its model and the adopted Euclidean distance function, which is suitable only for numerical data. Moreover, it becomes difficult to cluster data on a trained SASOM due to its dynamic splitting scheme. In this paper, we propose an extended SASOM (ESASOM), integrating the features of the Generalized SOM (GSOM), which handles mixed-type data, for manipulating both numeric and categorical data in classification applications. ESASOM possesses the ability of both dynamic splitting for improving classification performance and measuring the distance between mixed data. Experimental results demonstrate that the proposed method provides better classification and visualized results on mixed-type data than other SOM variants. A clustering procedure on a trained ESASOM is also proposed.

Keywords: data mining, Self-Organizing Map (SOM), mixed data, clustering, classification

1. Introduction

The Self-Organizing Map (SOM), proposed by Kohonen, is regarded as an effective data visualization technique in data mining applications, especially in the field of data clustering [1, 2, 3]. It can map high-dimensional data into a low-dimensional space and preserve the topological relationship between input data through data projection [4]. Nevertheless, conventional SOM methods must predefine the map structure prior to training. When the map size is not large enough, the map may fail to appropriately reflect the topology of the input data. On the contrary, a too large map size may cause similar data to disperse to excess clusters.

Structure Adaptive SOM (SASOM) [5] provides a feasible solution for improving the foregoing problem of conventional SOM methods. It dynamically adjusts the SOM map structure to raise classification accuracy by a node-splitting scheme. However, neither SOM nor SASOM can manipulate both numerical and categorical data because the Euclidean distance used in their models is inappropriate for categorical data. Hsu [6] proposed a Generalized SOM (GSOM) which can properly measure the distance between categorical data by the use of a distance hierarchy.

However, when the number of data is tremendous, GSOM may fail to preserve the topological order because of its predefined fixed map size. Moreover, when used for classification problems, GSOM cannot achieve good performance for the same reason of a fixed map size.

In this paper, we propose an extended SASOM (ESASOM), integrating SASOM with GSOM, to manipulate mixed data and improve classification accuracy as well. This paper is structured as follows. In Section 2, several related SOM models are reviewed and compared. In Section 3, the distance hierarchy and ESASOM are elaborated. In Section 4, we present several experimental results on mixed data. Finally, conclusions are stated in Section 5.

2. SOM methods

To establish background knowledge related to the proposed ESASOM, several SOM models are reviewed and compared in this section.

2.1. SOM

Due to its projection capability and topology preservation property, SOM has become a popular tool for visualized clustering of multidimensional data. The SOM training algorithm consists of two essential steps: identification of the best matching unit (BMU) for an input datum, and adjustment of the BMU and its neighborhood [4] so that they more closely resemble the input datum. Conventional SOM handles only numeric data since those two training steps rely on a Euclidean distance function. When categorical values are encountered, conventional SOM methods usually resort to data transformation, which converts a categorical value into a set of binary codes so that the traditional training algorithm can be applied. However, this approach suffers from a serious problem: the ontological similarity between categorical values cannot be appropriately represented by measuring the distance between the binary codes. For example, Pepsi is intuitively more similar to Coke than to coffee. Nevertheless, they possess the same similarity degree according to the geometric distance computed with the Euclidean distance function after the transformation [6].

Another problem is the difficulty of appropriately predefining a fixed map size. An inappropriate size will lead to poor results in clustering and classification applications.

2.2. Growing SOM models

Incremental growing SOM models were proposed to overcome the constraint of a fixed network structure. Generally speaking, growing SOMs can be roughly divided into single-layer and multiple-layer types. The former inserts new neurons in between old ones on the same map. The latter applies a hierarchical structure of multiple layers where each layer consists of a number of neurons or an SOM.

Single-layer growing SOM models, such as Growing Grid [7], Incremental Growing Grid [8], Growing SOM [9] and Growing Cell Structures [10], can grow on a fixed map and insert neurons according to different schemes. Therefore, they are regarded as a feasible solution for providing a more flexible network structure via the insertion. Nevertheless, they cannot directly reflect the hierarchical relationship of data which might be inherent in massive data.

The hierarchical relationship between data can easily be preserved in the hierarchical structure

of multilayer growing SOMs. They possess the ability of dimension reduction that a traditional SOM owns, i.e., projecting high-dimensional data to a low-dimensional space. Additionally, they are able to better handle massive data due to their multilayer structure. For instance, TreeGCS [11], one of the popular Growing Hierarchical SOMs (GHSOMs), applies a dendroid structure to maintain the clustered data in each node. In practical applications, GHSOM has been applied to legal documents and a newspaper dataset [12].

2.3. Structure Adaptive SOM

Conventional SOM models are usually applied to clustering problems in which the class attribute does not participate in the clustering process. Because of the fixed network structure of SOM, data with different class labels may be assigned to the same cluster, so SOM does not perform well when used for classification problems. To address the problem, Structure Adaptive SOM (SASOM) [5, 13] was proposed to improve classification capability by increasing the class consistence of each node via a dynamic node-splitting scheme. However, like other conventional SOM methods, SASOM processes merely numeric data. Transformation is needed for categorical data, and the same problem of failing to reflect the similarity of categorical values also exists.

2.4. Generalized SOM

As encountered in training a conventional SOM, measuring the distance between categorical data is regarded as a non-trivial problem. Various schemes have been proposed in the literature, including binary encoding, simple matching, Jaccard's coefficient [14] and an entropy-based measure [15]. Unfortunately, these schemes do not take into consideration the different extents of similarity embedded between categorical values, such as Coke being more similar to Pepsi than to coffee. In [6] we proposed a distance representation scheme, the distance hierarchy, which tackles this issue.

Generalized SOM (GSOM) [6] was proposed for projecting mixed, categorical and numeric, data. To better reflect the topology of mixed-type data on a trained map, GSOM processes categorical data by the use of distance hierarchies, which consider the similarity embedded in categorical values. However, GSOM uses a fixed map size. In addition, GSOM does not treat the class attribute separately and is hence not suitable for classification problems.

3. Extended Structure Adaptive SOM

An extended Structure Adaptive SOM (ESASOM) is proposed to not only improve classification accuracy but also provide the capability of directly processing mixed data.

3.1. Distance hierarchy for categorical data

The distance hierarchy was proposed in [6] for representing the relationship between categorical values and measuring the distance between them. A distance hierarchy, composed of concept nodes, links and link weights, represents the ontological relationship between concepts. In this data structure, the upper-level nodes represent more general concepts while the lower-level nodes represent more specific concepts. For example, Coke and Pepsi, represented by leaf nodes, belong to carbonated drinks, represented by the parent node of the two, as shown in Fig. 1. Juice,
IICM




第四期

民國九十

年十二月

-
4
-

coffee and carbonated drinks all belong to the root node "Any".

To illustrate the difference between the distance hierarchy and other popular distance schemes, the distances between Coke, Pepsi and Mocca are measured through the distance hierarchy, simple matching and binary encoding, as shown in Table 1. In the distance hierarchy (cf. Figure 1), the weight of each link is assumed to be a constant, say 1, to represent the distance between a node and its parent node. Neither simple matching nor binary encoding can distinguish the difference between those three drinks. In other words, the three drinks have the same distance/similarity according to the foregoing two methods. In contrast, via the distance hierarchy Coke is measured to be more similar to Pepsi than to Mocca. In fact, the distance hierarchy is a general scheme in which both the simple matching and binary encoding schemes can be modeled as special cases. Even the subtraction scheme for numeric values can be modeled by a degenerated distance hierarchy with two nodes and a link weighted by the difference between the maximum and the minimum value [6].


Figure 1. Distance hierarchies for (a) a categorical and (b) a numeric attribute.

Table 1. Distance comparison between different methods.

value pair   | Distance hierarchy | Simple matching | Binary encoding
Coke, Mocca  | 4                  | 1               | 1.414
Coke, Pepsi  | 2                  | 1               | 1.414
Mocca, Pepsi | 4                  | 1               | 1.414


A point can be at any position of a distance hierarchy. A point, say X, is denoted by an anchor (a leaf node) N_X and its positive offset d_X, as X = (N_X, d_X), where d_X represents the distance from the root to X. The distance between points X and Y can be calculated as follows.



d(X, Y) = d_X + d_Y - 2 d_{LCP(X,Y)}    (1)


where d_X and d_Y are the distances from the root to X and Y, respectively. d_{LCP(X,Y)} is the distance from the root to the least common point (LCP), which is defined as one of the following three cases: 1) either X or Y if they are at the same position; 2) Y if Y is an ancestor of X; otherwise, 3) LCA(X, Y).

LCA(X, Y) is the least common ancestor of X and Y, i.e., the deepest node which is an ancestor of both X and Y. For the example of Fig. 1, assume X = (Coke, 2.0) and M = (Mocca, 1.7). Then LCP(X, M) = Any and the distance d(X, M) = 2.0 + 1.7 - 2*0 = 3.7.
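The computation of Eq. (1) can be sketched in a few lines of code. The following is a minimal Python sketch, assuming the hierarchy of Fig. 1(a) with unit link weights; the child-to-parent map and all function names are ours, not the paper's.

```python
# Minimal sketch of the distance-hierarchy distance of Eq. (1), assuming the
# hierarchy of Fig. 1(a) with unit link weights (names are ours).
parent_of = {
    "Juice": "Any", "Coffee": "Any", "Carbonated drinks": "Any",
    "Orange": "Juice", "Apple": "Juice",
    "Latte": "Coffee", "Mocca": "Coffee",
    "Coke": "Carbonated drinks", "Pepsi": "Carbonated drinks",
}

def path_to_root(node):
    """Ancestors of a node from itself up to the root 'Any'."""
    p = [node]
    while node != "Any":
        node = parent_of[node]
        p.append(node)
    return p

def lcp_depth(x, y):
    """Depth of the least common point of two points (anchor, offset)."""
    (nx, dx), (ny, dy) = x, y
    if nx == ny:                        # same anchor: the shallower point
        return min(dx, dy)
    px = path_to_root(nx)
    lca = next(a for a in path_to_root(ny) if a in px)
    lca_depth = len(path_to_root(lca)) - 1
    return min(lca_depth, dx, dy)       # handles ancestor cases

def hierarchy_distance(x, y):
    """Eq. (1): d(X, Y) = d_X + d_Y - 2 * d_LCP(X,Y)."""
    return x[1] + y[1] - 2 * lcp_depth(x, y)

# Worked example from the text: X = (Coke, 2.0), M = (Mocca, 1.7) -> 3.7
print(hierarchy_distance(("Coke", 2.0), ("Mocca", 1.7)))
```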

3.2. Distance between a data pattern and a neuron

The distance between a training pattern and a map neuron is measured by mapping them to distance hierarchies and calculating their distance in the hierarchies. Specifically, the components of a pattern and a neuron are mapped to their associated hierarchies, and the distances of the corresponding mapping points in the individual hierarchies are aggregated as the total distance.

Suppose x, m and dh represent a training pattern, a map neuron, and a set of distance hierarchies, respectively. Then the distance between x and m is defined as



d(x, m) = \left( \sum_{i=1}^{n} | dh_i(x_i) - dh_i(m_i) |^2 \right)^{1/2} = \left( \sum_{i=1}^{n} | X_i - M_i |^2 \right)^{1/2}    (2)


where X_i and M_i are the mapping points of x_i and m_i, respectively, in dh_i, and n is the number of attributes.

We use an example to illustrate the process. Assume a two-dimensional pattern x = (Coke, 7) with Dom(x_2) = [0, 10], and distance hierarchies dh_1 and dh_2 as shown in Fig. 1(a) and Fig. 1(b). x_1 = Coke is mapped to X_1 = (Coke, 2) in dh_1, and x_2 = 7 is mapped to X_2 = (MAX, 7) in dh_2.

For an ESASOM associated with the training dataset, each component of a map neuron is associated with the same distance hierarchy as its corresponding attribute of the data. Similarly, each component of a neuron can be mapped to a point in the hierarchy.

For example, suppose a map neuron m = [(Mocca, 1.7), (MAX, 3)], and the hierarchies dh_1 and dh_2 are as shown in Fig. 1(a) and Fig. 1(b). m_1 = (Mocca, 1.7) is mapped to the point M_1 = (Mocca, 1.7) in dh_1, and m_2 = (MAX, 3) is mapped to the point M_2 = (MAX, 3) in dh_2.

The distance between x and m is measured by aggregating the differences between the corresponding mapping points of x and m in the hierarchies. That is, |(Coke, 2) - (Mocca, 1.7)| = 3.7, |(MAX, 7) - (MAX, 3)| = 4, and then d(x, m) = (3.7^2 + 4^2)^{1/2} = 5.45.

In data preprocessing, max-min normalization can be performed on each attribute so as to avoid bias due to the various domain ranges of the attributes.
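The aggregation of Eq. (2) and the worked example above can be illustrated with the small sketch below. The helper names and the simplified categorical distance (with the LCP depth supplied by the caller) are ours, not the paper's.

```python
import math

# Minimal sketch of Eq. (2): aggregate per-attribute distances between a
# pattern x and a neuron m.  Names and simplifications are ours.
def categorical_diff(x_point, m_point, lcp_depth):
    """|X_i - M_i| in a distance hierarchy, Eq. (1), with the LCP depth
    supplied by the caller (0 when the anchors only share the root)."""
    return x_point[1] + m_point[1] - 2 * lcp_depth

def numeric_diff(x_val, m_val):
    """|X_i - M_i| for a numeric attribute (degenerated hierarchy)."""
    return abs(x_val - m_val)

def mixed_distance(diffs):
    """Eq. (2): Euclidean aggregation of the per-attribute differences."""
    return math.sqrt(sum(d * d for d in diffs))

# Worked example from the text: x = (Coke, 7), m = [(Mocca, 1.7), (MAX, 3)]
d1 = categorical_diff(("Coke", 2.0), ("Mocca", 1.7), lcp_depth=0)  # 3.7
d2 = numeric_diff(7, 3)                                            # 4.0
print(round(mixed_distance([d1, d2]), 2))                          # 5.45
```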

3.3. Process of ESASOM

We first briefly introduce the training of a traditional SOM and then propose a procedure for training an extended SASOM.

Figure 2 depicts the projection of the training data onto an SOM. First, the data are iteratively drawn to train the map by identifying the best matching unit of each datum and adjusting its neighborhood. Finally, each datum is projected onto the trained map by being assigned to its best matching unit. The training algorithm is outlined in Figure 3 and described as follows. Step 1 initializes the SOM by assigning random small values to the neurons. For an input pattern x, step 2.1 identifies the neuron, referred to as the Best Matching Unit or BMU, which has the minimum distance to x. Step 2.2 adjusts

the weights of the BMU and its neighbor neurons such that the adjusted neurons become more similar to x. The adjustment is controlled by the learning rate α and the neighborhood function h. The formulas are shown in Eq. (3) and (4), where v, w, t and M represent the BMU, the weights of a neuron, the training step and the number of map neurons, respectively. The process is repeated until the stop criterion is met. A popular criterion is to predefine the number of training steps.



v = \arg\min_{i \in \{1, \ldots, M\}} \| x(t) - w_i(t) \|    (3)


w_i(t+1) = w_i(t) + \alpha(t) \, h_{vi}(t) \, [ x(t) - w_i(t) ]    (4)



Figure 2. Training an SOM and projecting the data onto the trained map (e.g., training data x1=[Coke, 7] and x2=[Pepsi, 2]; in the final projection, x1 is assigned to its BMU m_v=[(Coke, 1.1), (MAX, 8)]).


1. Initialize SOM
2. For each input data
   2.1 Identify its Best Matching Unit (BMU)
   2.2 Adjust BMU and its neighborhood
3. Repeat Step 2 until the stop criterion is met

Figure 3. SOM training algorithm.
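For concreteness, the following is a minimal numeric-only sketch of the training loop of Figure 3 with the update rules of Eqs. (3) and (4). The linear schedules follow the settings given later in Section 4.1; the grid layout and the Gaussian neighborhood are common simplifications of ours rather than the paper's exact implementation.

```python
import numpy as np

# Minimal numeric-only SOM sketch of Figure 3 and Eqs. (3)-(4).
def train_som(data, rows=4, cols=4, T=1000, alpha0=0.9, seed=0):
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((rows * cols, dim)) * 0.01       # step 1: initialize
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    r0 = max(rows, cols)                                   # initial radius
    for t in range(T):
        x = data[rng.integers(len(data))]                  # draw a pattern
        # Eq. (3): BMU = neuron with minimum distance to x
        v = np.argmin(np.linalg.norm(weights - x, axis=1))
        alpha = alpha0 * (1.0 - t / T)                     # learning rate
        radius = 1.0 + (r0 - 1) * (1.0 - t / T)            # neighborhood radius
        grid_d2 = np.sum((coords - coords[v]) ** 2, axis=1)
        h = np.exp(-grid_d2 / (2 * radius ** 2))           # Gaussian neighborhood
        # Eq. (4): move the BMU and its neighbors toward x
        weights += alpha * h[:, None] * (x - weights)
    return weights

# Usage: project each datum to its BMU on the trained map
data = np.random.default_rng(1).random((200, 3))
w = train_som(data)
bmus = np.argmin(np.linalg.norm(data[:, None, :] - w[None, :, :], axis=2), axis=1)
```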

3.3.1. Data training

The process of training an ESASOM (shown in Fig. 4) can be divided into a data training stage and a dynamic node-splitting stage, elaborated as follows.

In the data training stage, the GSOM training algorithm is applied to project the multi-dimensional mixed data onto a two-dimensional map. The initial map size is set to 4*4 and random weights are initially assigned to the nodes (or neurons). During training, nodes which do not satisfy any of the stop conditions are identified for splitting. The stop conditions are: (i) the number of data in the node is less than a user-predefined threshold (say, 2% of the total number of data); (ii) the class consistence of the node has reached a defined threshold; and (iii) the variance of the data in the node is less than a defined threshold. The class consistence and variance of a node are defined as follows.



Class\ Consistence = \max_j n_{c_j} / n_{Total}    (5)








Variance = \frac{1}{n_{Total}} \sum_{i=1}^{n_{Total}} ( x_i - cv )^2    (6)


where n_{c_j} is the number of data belonging to class C_j, x_i is an input datum assigned to the node, cv is the weight vector of the node, and n_{Total} is the total number of data in the node.
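A sketch of how the two node statistics of Eqs. (5) and (6) and the three stop conditions could be checked is given below; the function names are ours, and the 0.95 and 0.9 thresholds are the values reported later in Section 4.1.

```python
from collections import Counter
import math

# Sketch of the node statistics of Eqs. (5)-(6) and the stop conditions of
# Section 3.3.1.  Names and the pluggable distance function are ours.
def class_consistence(labels):
    """Eq. (5): fraction of the node's data held by the majority class."""
    return max(Counter(labels).values()) / len(labels)

def node_variance(points, cv, dist):
    """Eq. (6): mean squared distance of the node's data to its weight vector cv."""
    return sum(dist(x, cv) ** 2 for x in points) / len(points)

def needs_split(points, labels, cv, dist, n_all,
                min_frac=0.02, cons_th=0.95, var_th=0.9):
    """A node is split only if none of the stop conditions holds."""
    if len(points) < min_frac * n_all:            # (i) too few data
        return False
    if class_consistence(labels) >= cons_th:      # (ii) consistent enough
        return False
    if node_variance(points, cv, dist) < var_th:  # (iii) compact enough
        return False
    return True

# Usage with plain numeric vectors and Euclidean distance
euclid = lambda a, b: math.dist(a, b)
print(needs_split([(0.1, 0.2), (0.9, 0.8)], ["A", "B"], (0.5, 0.5), euclid, n_all=100))
```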

3.3.2. Dynamic node-splitting

A node which needs splitting is expanded into a 2*2 sub-map (or four child nodes), as shown in Fig. 5. The sub-map is then iteratively trained like a regular GSOM. Before the training, the initial weight of each child node is assigned to the mean of its parent node and neighbor nodes. Specifically, for mixed-type training data, the initial weights for the numeric and categorical attributes are calculated separately as follows.

The initial weight of a numeric child-node attribute is assigned by the following formula.


















w_c = \frac{ 2 w_p + \sum_{k=1}^{n_c} w_k }{ n_c + 2 }    (7)


where w_c represents the initial weight, w_p is the weight of the corresponding parent-node attribute, w_k represents the weight of the corresponding attribute of a neighbor node, and n_c is the number of neighbor nodes.

For the instance of Fig. 5, the nodes involved in the weight calculation of child c_0 include the two neighbor nodes p_0 and p_1, and the parent node p_4.
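A one-line sketch of Eq. (7) follows, assuming the reconstruction above in which the parent weight is counted twice; the function name and the example weights are ours.

```python
# Sketch of Eq. (7): initial weight of a numeric child-node attribute,
# assuming the parent weight w_p is counted twice (our reconstruction).
def child_initial_weight(w_parent, neighbor_weights):
    n_c = len(neighbor_weights)
    return (2 * w_parent + sum(neighbor_weights)) / (n_c + 2)

# Child c0 in Fig. 5: parent p4 and neighbors p0, p1 (illustrative weights)
print(child_initial_weight(0.6, [0.2, 0.4]))  # 0.45
```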

Regarding a categorical attribute, the mean taken as the initial value is represented by the centroid of the points projected by the parent node and neighbor nodes onto its distance hierarchy (as shown in Fig. 6). In other words, the centroid p_c is the point that gives the shortest total distance from the centroid to the parent-node point and each neighbor-node point. That is,





p_c = \arg\min_{p_k} \sum_i d( p_k, p_i )^2    (8)


where p_k is a point in the hierarchy, and p_i represents a mapping point of the parent node or a neighbor node (e.g., P_0, P_1 or P_4 in Fig. 6).




Figure 4. Training process of ESASOM (initialize a 4*4 map; apply the GSOM training algorithm; identify nodes which need to be split; split them into 2*2 sub-maps and train the split nodes like GSOM until the stop condition is satisfied; then remove nodes with no projected data and visualize the training results).


Figure 5. Node P4 is split into four child nodes.


Figure 6. The centroid p_c represents the mean of p_0, p_1 and p_4.


The identification of the centroid can be restricted to the area between the involved points. For example, any point on the bold lines in Fig. 6 is a candidate for the centroid. Specifically, we determine the local centroid of each of the involved bold lines and then identify the global centroid from the local centroids. The local centroid of a link, i.e., the point on the link giving the minimum total distance to the involved points, can be determined as follows.



d^{ed}_{p_{r,l}} = \frac{ \sum_{p_i \in S} d_{p_i} - \sum_{p_i \in D} \left( d_{p_i} - 2 d_{LCP(p_{r,l},\, p_i)} \right) }{ n_s + n_d }    (9)


where d^{ed}_{p_{r,l}} is the estimated distance from the root to p_{r,l}, p_{r,l} is the local centroid of the l-th link in the r-th branch, S and D denote the involved points p_i lying on the same link as p_{r,l} and on different links, respectively, and n_s and n_d are the numbers of points in S and D.

Since d^{ed}_{p_{r,l}} is the distance from the root to the local centroid on the l-th link, it is expected to be in the range l-1 to l. The value is set to the corresponding end value of the link in case the calculated result is out of that range. In other words, the offset of the local centroid, d_{p_{r,l}}, is defined as follows.













d_{p_{r,l}} = \begin{cases} l, & \text{if } d^{ed}_{p_{r,l}} > l \\ l - 1, & \text{if } d^{ed}_{p_{r,l}} < l - 1 \\ d^{ed}_{p_{r,l}}, & \text{otherwise} \end{cases}    (10)


Then, the centroid of a child node is acquired from the local centroids of the involved links as follows.


p_c = \arg\min_{p_{r,l} \in \text{local centroids}} \sum_i d( p_{r,l}, p_i )^2    (11)
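As a simplified illustration of Eqs. (8)-(11), the sketch below brute-forces the centroid over a discrete set of candidate points on the hierarchy instead of deriving each link's local centroid analytically. All names are ours, and the involved points are illustrative rather than the paper's exact example.

```python
# Simplified sketch of the categorical-centroid search of Eqs. (8)-(11):
# brute-force over sampled candidate points on the hierarchy (names are ours).
parent_of = {"Juice": "Any", "Coffee": "Any", "Carbonated drinks": "Any",
             "Latte": "Coffee", "Mocca": "Coffee",
             "Coke": "Carbonated drinks", "Pepsi": "Carbonated drinks"}

def path(n):
    p = [n]
    while n != "Any":
        n = parent_of[n]
        p.append(n)
    return p

def dist(x, y):
    """Eq. (1) with unit link weights; points are (anchor, offset) pairs."""
    if x[0] == y[0]:
        lcp = min(x[1], y[1])
    else:
        px = path(x[0])
        lca = next(a for a in path(y[0]) if a in px)
        lcp = min(len(path(lca)) - 1, x[1], y[1])
    return x[1] + y[1] - 2 * lcp

def centroid(candidates, points):
    """Eqs. (8)/(11): the candidate minimizing the total squared distance."""
    return min(candidates, key=lambda p: sum(dist(p, q) ** 2 for q in points))

# Candidates: offsets in steps of 0.1 along every leaf-anchored path
candidates = [(leaf, round(k * 0.1, 1))
              for leaf in ("Mocca", "Latte", "Coke", "Pepsi") for k in range(21)]
# Illustrative parent/neighbor points (not the paper's exact example)
involved = [("Coke", 2.0), ("Pepsi", 2.0), ("Mocca", 1.7)]
print(centroid(candidates, involved))   # ('Coke', 0.8): a point on the Any-Carbonated link
```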


3.4. Clustering data via the trained map

To cluster data via a trained map, we propose a mixed bottom-up and top-down hierarchical approach. The hierarchical clustering process and results can best be depicted by a dendrogram, as shown in Fig. 7.

The bottom-up hierarchical clustering is applied to the trained map in which node splitting has not yet been performed. Each node with projected data is initially treated as a single cluster, since its data are more similar to one another than to those projected in other nodes. The weight vectors of the map nodes are used as the values of these initial clusters. Then, standard bottom-up hierarchical clustering [16] is performed iteratively to merge clusters until one cluster is formed.

We adopt the single-link scheme to measure the distance between two clusters in this research. Other methods such as complete link and average link are possible alternatives [16]. Single link takes the minimum distance between two points belonging to the two clusters as the distance of the two clusters. The formula is defined by Eq. (12), where c_i and c_j are two clusters, and p_i and p_j represent two data points, which are two node vectors in this research context, in c_i and c_j respectively.




d( c_i, c_j ) = \min d( p_i, p_j ), \quad p_i \in c_i, \; p_j \in c_j    (12)
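One way to realize this bottom-up stage is to feed the node weight vectors to an off-the-shelf single-link agglomerative clustering routine, e.g. SciPy's hierarchical clustering. The minimal sketch below clusters plain numeric node weight vectors, which is a simplification of the mixed-type case; the example vectors are ours.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Minimal sketch of the bottom-up stage of Section 3.4 using SciPy's
# single-link agglomerative clustering (Eq. (12)) on numeric node weights.
node_weights = np.array([
    [0.10, 0.20], [0.15, 0.25],   # two nodes expected to merge early
    [0.80, 0.90], [0.85, 0.95],
    [0.50, 0.10],
])

Z = linkage(node_weights, method="single")          # single-link merge tree
labels = fcluster(Z, t=3, criterion="maxclust")     # cut into at most 3 clusters
print(labels)                                       # one cluster id per map node
```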

To acquire more clusters, a top-down, divisive approach is taken. We treat a node splitting that occurred during the training of an ESASOM as a cluster splitting. Therefore, a node splitting naturally corresponds to creating a new layer in the dendrogram.

As shown in Fig. 7, before node splitting there are seven clusters (nodes), each of which represents the data projected onto the node. The bottom-up clustering allows users to obtain fewer clusters. After the node-splitting phase of training, the node labeled 3 was divided into four sub-nodes, two of which have projected data. Equivalently, cluster 3 was divided into two sub-clusters, as shown in the dendrogram.


Figure 7. Mixed bottom-up and top-down hierarchical clustering via a trained map.


4. Experiments

We developed a prototype using C++. Experimental results on a synthetic and a real mixed dataset are presented to compare ESASOM with other SOM models.

4.1. Parameter setting

The initial map size of ESASOM was 4*4. The learning rate was a linear function α(t) = α(0) * (1.0 - t/T) with the initial value α(0) = 0.9. A Gaussian function was used as the neighborhood function, with radius r(t) = 1.0 + (r(0) - 1) * (1.0 - t/T) and the initial value r(0) set to the length of the map. The class consistence and variance thresholds were set to 0.95 and 0.9, respectively.

4.2. Synthetic mixed dataset

The synthetic mixed dataset consists of nine classes with two categorical attributes (Department and Drink) and one numerical attribute (Amount), as shown in Table 2. The Amount values were randomly generated according to a normal distribution with the specified mean and deviation. Each class possesses certain characteristics. Two distance hierarchies were built to represent the ontological relationship between the categorical values, as shown in Fig. 8. Each link weight is assumed to be 1.

Training results of 1,000 iterations of SASOM and ESASOM are shown in Fig. 9. Fig. 9(a) shows that the data belonging to the same class in Table 2 are projected into the same node, which effectively remedies the drawback of conventional SOMs that a node may contain data from different classes. The size of a node indicates the splitting level of the neuron. For instance, a node labeled 2 is one level lower than one labeled 1. A problem shown in Fig. 9(a) is

that similar classes are not projected nearby, such as classes 7, 8 and 9, since SASOM was trained with the categorical values transformed to 0-1 binary codes, which do not consider the similarity embedded in the categorical values.

Fig. 9(b) shows the results of the proposed ESASOM method. As in SASOM, the data in the same class were projected to the same node. Two main differences from Fig. 9(a) can be identified. First, similar classes were projected nearby, e.g., classes 7, 8 and 9 in the upper middle of the map, which better reflects the structure of the data in Table 2. Second, we improved the visualization of ESASOM so as to help users gain more insight into the training results. Specifically, the size of a dark dot represents the amount of data projected to the node. A single color in a dark dot implies a uniform class of the projected data, while a multi-colored dark dot indicates multiple classes in the node (see Fig. 12(d)). The size of the shadow surrounding a dark dot indicates the splitting level of the neuron, like that in SASOM.

Table 2. A synthetic mixed dataset.

Class | Dept. | Drink  | Amount (μ, σ) | Data Count | Characteristics
1     | MIS   | Coke   | (500, 25)     | 60         | Management with carbonated drinks
2     | MBA   | Pepsi  | (400, 20)     | 30         |
3     | MBA   | Pepsi  | (300, 15)     | 30         |
4     | EE    | Latte  | (500, 25)     | 60         | Engineering with coffee
5     | CE    | Mocca  | (400, 20)     | 30         |
6     | CE    | Mocca  | (300, 15)     | 30         |
7     | SD    | Apple  | (500, 25)     | 60         | Design with juice
8     | VC    | Orange | (400, 20)     | 30         |
9     | VC    | Orange | (300, 15)     | 30         |



Figure 8. Distance hierarchies for the categorical attributes (Drink and Department).



Figure 9. Visualized training results of synthetic data: (a) SASOM, (b) ESASOM.



4.3. Real mixed dataset

The real mixed dataset Adult from the UCI repository [17] was used, which has 14 feature attributes and one class attribute, Salary. An attribute selection technique based on information gain [18] was applied to determine the correlation between the feature attributes and the class attribute. Seven relevant feature attributes were chosen, including three categorical attributes (relationship, marital-status and education) and four numerical attributes (capital-gain, capital-loss, age and hours-per-week). We randomly sampled 10,000 tuples and divided them into two sets of 6,666 and 3,334 tuples for training and testing. Three distance hierarchies were built to represent the ontological relationship of the categorical values, as shown in Fig. 10, and each link weight was set to 1. The initial map size was 4*4 for SASOM and ESASOM and 15*15 for SOM and GSOM, and the number of training iterations was 20,000.

As the results in Fig. 11 show, the stability of classification accuracy for ESASOM is better than that of the other SOMs according to the results obtained with various numbers of training iterations. In addition, as shown in Fig. 12, SASOM does not reflect the cluster structure of the data since the neuron size does not reflect the number of data in the neuron. In contrast, ESASOM offers more information on the cluster structure. Furthermore, the number and distribution ratio of each class in a neuron can be identified through the pie chart on the map or via a pop-up window, which helps users to acquire more detailed information within a node (as shown in Fig. 12(d)).


Figure 10. Distance hierarchies for the selected categorical attributes of the Adult dataset: (a) Relationship, (b) Marital-status, (c) Education.


Figure 11. Classification accuracy with various training iterations.








Figure 12. Visualized training results of the Adult dataset: (a) SOM, (b) GSOM, (c) SASOM, (d) ESASOM.


4.3.1. Analysis of various map sizes

Various initial map sizes from 16 to 400 were used to analyze the impact on classification accuracy. As shown in Fig. 13, the accuracy lies mainly between 80% and 82% and most of the time around 81%, indicating that classification accuracy is relatively stable with our method regardless of the initial map size. In other words, a small initial map size does not degrade classification accuracy. This is important since a small initial map size makes the hierarchical clustering easier in the later stage of cluster analysis.


Figure 13. Classification accuracy of ESASOM with various map sizes.

4.4. Analysis of visualized clustering and classification based on a trained ESASOM

We applied the mixed bottom-up and top-down hierarchical clustering to cluster the trained ESASOM and then compared classification accuracy with respect to various clustering results.

We first cluster the trained ESASOM by using the mixed bottom-up and top-down hierarchical clustering described in Section 3.4. The clustering results are represented by the dendrogram shown in Fig. 14(c)-(d), constructed in accordance with the two trained maps (Fig. 14(a) and (b)) obtained prior to and after node splitting. The lower-right number of a node in Fig. 14(a) represents the node number, while the upper-left number indicates the number of projected data. An advantage of using a dendrogram is that it is easy to obtain an intended number of clusters. For example, the dotted line in Fig. 14(c) represents dividing the data into three clusters.

Table 3 shows the salary distribution of four clustering results. For instance, if we group the trained map into 3 clusters, the distribution in cluster 1 is 62.75% of <=50K and 37.25% of >50K. The experimental results indicate that a larger number of clusters produces better performance in terms of

separating the salary classes. This is supported by the total entropy calculated on the salary attribute, which is reduced from 0.6357 to 0.6157 when the number of clusters increases from 3 to 15.
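The "total entropy" reported in Table 3 is, in the usual formulation, the size-weighted average of the per-cluster class entropies; a small sketch of that computation is given below. This formulation and all names are ours, since the paper does not spell out its exact definition.

```python
import math

# Sketch of a size-weighted total entropy over a clustering (the usual
# formulation; the paper does not spell out its exact definition).
def cluster_entropy(class_counts):
    n = sum(class_counts)
    return -sum((c / n) * math.log2(c / n) for c in class_counts if c > 0)

def total_entropy(clusters):
    """clusters: list of per-cluster class-count lists."""
    n_all = sum(sum(c) for c in clusters)
    return sum(sum(c) / n_all * cluster_entropy(c) for c in clusters)

# Toy usage: two equally sized clusters with (<=50K, >50K) counts
print(round(total_entropy([[90, 10], [30, 70]]), 4))   # 0.6751
```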



Figure 14. Dendrogram constructed from a trained ESASOM. (a) The training results with an initial size of 16 nodes. (b) The training results after node splitting. (c) The upper part of the dendrogram obtained by applying bottom-up hierarchical clustering to (a). (d) The lower part of the dendrogram obtained according to the top-down node splitting from (a) to (b).


Table 3. Distribution of <=50K and >50K in each cluster and total entropy (* percentage).

Clusters | Class | C1     | C2    | C3    | C4    | C5    | C6    | C7    | C8    | C9    | C10   | C11   | C12   | C13   | C14   | C15   | ALL   | Total Entropy
3        | <=50K | 62.75* | 47.29 | 93.39 | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | 75.73 | 0.6357
3        | >50K  | 37.25  | 52.71 | 6.61  | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | 24.27 |
7        | <=50K | 62.75  | 47.06 | 87.50 | 95.48 | 61.54 | 92.55 | 93.8  | -     | -     | -     | -     | -     | -     | -     | -     | 75.73 | 0.6345
7        | >50K  | 37.25  | 52.94 | 12.50 | 4.52  | 38.46 | 7.45  | 6.20  | -     | -     | -     | -     | -     | -     | -     | -     | 24.27 |
11       | <=50K | 62.75  | 44.50 | 87.50 | 95.48 | 61.54 | 55.68 | 93.96 | 89.10 | 97.8  | 91.15 | 94.87 | -     | -     | -     | -     | 75.73 | 0.6305
11       | >50K  | 37.25  | 55.50 | 12.50 | 4.52  | 38.46 | 44.32 | 6.04  | 10.90 | 2.20  | 8.85  | 5.13  | -     | -     | -     | -     | 24.27 |
15       | <=50K | 62.75  | 44.50 | 87.50 | 95.48 | 61.54 | 55.68 | 80.60 | 99.08 | 87.50 | 80.00 | 90.13 | 97.80 | 88.42 | 92.38 | 94.87 | 75.73 | 0.6157
15       | >50K  | 37.25  | 55.50 | 12.50 | 4.52  | 38.46 | 44.32 | 19.40 | 0.92  | 12.50 | 20.00 | 9.87  | 2.20  | 11.58 | 7.62  | 5.13  | 24.27 |


4.4.1. Analysis of classification accuracy

We further compared the classification accuracy of various clustering results obtained according to the resultant dendrogram of Fig. 14(c)-(d) and of GSOM. Classification is performed by a majority vote of the data in the cluster to which the unknown input datum is projected on the trained map. For example, when dividing the data into three clusters as shown in Fig. 14(c), if an input datum is projected to node 6, which belongs to cluster 2, the input will be classified to the class >50K, since the majority of cluster 2 is >50K, as shown in row 3 of Table 3.
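The majority-vote rule can be sketched as follows; the node-to-cluster mapping and the training labels per node are assumed to be available from the trained map, and all names are ours.

```python
from collections import Counter

# Sketch of the majority-vote classification of Section 4.4.1.
def majority_labels(cluster_of_node, train_nodes, train_labels):
    """Majority class per cluster, from the training data's BMU nodes."""
    votes = {}
    for node, label in zip(train_nodes, train_labels):
        votes.setdefault(cluster_of_node[node], Counter())[label] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

def classify(test_nodes, cluster_of_node, majority):
    """Label of a test datum = majority class of its BMU's cluster."""
    return [majority[cluster_of_node[n]] for n in test_nodes]

# Toy usage: node 6 belongs to cluster 2, whose majority is '>50K'
cluster_of_node = {1: 1, 3: 1, 6: 2, 7: 3}
maj = majority_labels(cluster_of_node,
                      [1, 3, 6, 6, 7],
                      ["<=50K", "<=50K", ">50K", ">50K", "<=50K"])
print(classify([6], cluster_of_node, maj))   # ['>50K']
```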

Table 4 shows classification accuracy on the test data. ESASOM obtains better accuracy (80.92%) than GSOM and any result from the hierarchical clustering of ESASOM. This is expected because the class consistence of a node in the final ESASOM is higher after node splitting. In contrast, a cluster formed by merging several sub-clusters usually increases its class diversity, resulting in reduced classification accuracy. We can see in the table that the accuracy dropped from 76.42% to 76.06% after merging the clusters from 15 down to 3.



Table 4. Classification accuracy on test data.

Method           | Correct (data count) | Incorrect (data count) | Accuracy (%)
Hierarchical-3G  | 2536                 | 798                    | 76.06
Hierarchical-4G  | 2536                 | 798                    | 76.06
Hierarchical-5G  | 2536                 | 798                    | 76.06
Hierarchical-6G  | 2540                 | 794                    | 76.18
Hierarchical-7G  | 2504                 | 830                    | 75.10
Hierarchical-8G  | 2548                 | 786                    | 76.42
Hierarchical-9G  | 2548                 | 786                    | 76.42
Hierarchical-10G | 2548                 | 786                    | 76.42
Hierarchical-11G | 2548                 | 786                    | 76.42
Hierarchical-12G | 2548                 | 786                    | 76.42
Hierarchical-13G | 2548                 | 786                    | 76.42
Hierarchical-14G | 2548                 | 786                    | 76.42
Hierarchical-15G | 2548                 | 786                    | 76.42
GSOM             | 2548                 | 786                    | 76.42
ESASOM           | 2698                 | 636                    | 80.92


5. Conclusions

The extended SASOM integrates ideas from SASOM and GSOM. The extended model takes the advantages of, and avoids the disadvantages of, the conventional models. In particular, we improve on the conventional models in four aspects. First, ESASOM can directly cluster and classify mixed categorical and numeric data. The different extents of similarity between categorical values are considered in our model to better reflect the natural structure of the data on the trained map. Second, the model improves classification accuracy due to the proper handling of categorical data. Third, we improve the visualization of a trained ESASOM, which helps users to get more insight about the projected data. Specifically, the amount of data projected to a neuron and the level to which a neuron has been split can be distinguished, represented by dark dots and their surrounding shadows. Moreover, the dark dots are rendered as pie charts which show the percentage of each class in the neuron, revealing the diversity of the data projected to the neuron. Fourth, a mixed bottom-up and top-down procedure is proposed to perform clustering on a trained map. The clustering process and results can be represented by a dendrogram, which is easily visualized.

The distance hierarchies used in this research were constructed manually, and the number of clusters on a trained map was determined interactively by the user. It will be beneficial to investigate approaches to automatically constructing the hierarchies and determining the optimal number of clusters in the future.

ESASOM can be applied to visually cluster and classify mixed-type business data, such as customer data for market segmentation and direct marketing.

References

[1] J. Vesanto, E. Alhoniemi, Clustering of the self-organizing map, IEEE Transactions on Neural Networks, 2000, Vol. 11, No. 3, 586-600.


[2] M.Y. Kiang, Extending the Kohonen self-organizing map networks for clustering analysis, Computational Statistics & Data Analysis, 2001, Vol. 38, 161-180.

[3] S. Wu, W.S. Chow, Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density, Pattern Recognition, 2004, Vol. 37, 175-188.

[4] T. Kohonen, The self-organizing map, Proceedings of the IEEE, 1990, Vol. 78, No. 9, 1464-1480.

[5] S.B. Cho, Structure-adaptive SOM to classify 3-dimensional point light actors' gender, Proceedings of the 9th International Conference on Neural Information Processing, 2002, Vol. 2, 949-953.

[6] C.C. Hsu, Generalizing self-organizing map for categorical data, IEEE Transactions on Neural Networks, 2006, Vol. 17, No. 2, 294-304.

[7] B. Fritzke, Growing grid - a self-organizing network with constant neighborhood range and adaptation strength, Neural Processing Letters, 1995, Vol. 2, 9-13.

[8] J. Blackmore, R. Miikkulainen, Incremental grid growing: encoding high-dimensional structure into a two-dimensional feature map, International Conference on Neural Networks, 1993, Vol. 1, 450-455.

[9] D. Alahakoon, S.K. Halgamuge, B. Srinivasan, Dynamic self-organizing maps with controlled growth for knowledge discovery, IEEE Transactions on Neural Networks, 2000, Vol. 11, 601-614.

[10] B. Fritzke, Unsupervised clustering with growing cell structures, IJCNN'91, IEEE Press, 1991, Vol. 2, 531-536.

[11] V.J. Hodge, J. Austin, Hierarchical growing cell structures: tree GCS, IEEE Transactions on Knowledge and Data Engineering, 2001, Vol. 13, No. 2, 207-218.

[12] A. Rauber, D. Merkl, M. Dittenbach, The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data, IEEE Transactions on Neural Networks, 2002, Vol. 13, No. 6, 1331-1341.

[13] S.B. Cho, Self-organizing map with dynamical node splitting: application to handwritten digit recognition, Neural Computation, 1997, Vol. 9, No. 6, 1345-1355.

[14] M.H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2003.

[15] D. Barbara, J. Couto, Y. Li, COOLCAT: an entropy-based algorithm for categorical clustering, Proceedings of the Eleventh International Conference on Information and Knowledge Management, 2002, 582-589.

[16] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Computing Surveys, 1999, Vol. 31, No. 3, 265-323.

[17] P.M. Murphy, D.W. Aha, UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, visited on 5/20/2006.

[18] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001.