Visualized Clustering and Classification of Mixed Data via Extended Structure Adaptive Self-Organizing Map

Chung-Chian Hsu*, Zih-Hui Lin, Kuo-Min Wang, Wei-Shen Tai
National Yunlin University of Science and Technology
*E-mail: hsucc@yuntech.edu.tw
Abstract
In data mining applications, the Self-Organizing Map (SOM) is regarded as an effective visualized clustering technique for preserving the topological relations in the input data. SOM's classification capability, however, has not received much attention; a direct application of SOM to classification yields poor accuracy. A variant of SOM, Structure Adaptive SOM (SASOM), was recently proposed for classifying multidimensional data by incorporating a dynamic node-splitting scheme. However, SASOM cannot be appropriately applied to categorical or mixed (i.e., categorical and numeric) data, because its model and its Euclidean distance function suit only numeric data. Moreover, its dynamic splitting scheme makes it difficult to cluster data on a trained SASOM. In this paper, we propose an extended SASOM (ESASOM), which integrates the features of the Generalized SOM (GSOM) for handling mixed-type data, so that both numeric and categorical data can be manipulated in classification applications. ESASOM possesses both the ability of dynamic splitting for improving classification performance and the ability to measure the distance between mixed data. Experimental results demonstrate that the proposed method provides better classification and visualized results on mixed-type data than other SOM variants. A clustering procedure on a trained ESASOM is also proposed.
Keywords: data mining, Self-Organizing Map (SOM), mixed data, clustering, classification
1. Introduction
The Self-Organizing Map (SOM), proposed by Kohonen, is regarded as an effective data visualization technique in data mining applications, especially in the field of data clustering [1, 2, 3]. It can map high-dimensional data into a low-dimensional space and preserve the topological relationships between input data through data projection [4]. Nevertheless, conventional SOM methods must predefine the map structure prior to training. When the map size is not large enough, the map may fail to appropriately reflect the topology of the input data; conversely, too large a map may cause similar data to disperse into excess clusters.
Structure Adaptive SOM (SASOM) [5] provides a feasible solution to the foregoing problem of conventional SOM methods: it dynamically adjusts the SOM map structure through a node-splitting scheme in order to raise classification accuracy. However, neither SOM nor SASOM can manipulate both numeric and categorical data, because the Euclidean distance used in their models is inappropriate for categorical data. Hsu [6] proposed a Generalized SOM (GSOM) which can properly measure the distance between categorical data by the use of distance hierarchies.
IICM, Vol. 9, No. 4, December 2006
However, when the amount of data is tremendous, GSOM may fail to preserve the topological order because of its predefined fixed map size. Moreover, when used for classification problems, GSOM cannot achieve good performance for the same reason.
In this paper, we propose an extended SASOM (ESASOM), integrating SASOM with GSOM, to manipulate mixed data and improve classification accuracy as well. This paper is structured as follows. In Section 2, several related SOM models are reviewed and compared. In Section 3, distance hierarchies and ESASOM are elaborated. In Section 4, we present several experimental results on mixed data. Finally, conclusions are stated in Section 5.
2. SOM methods
To establish the background knowledge related to the proposed ESASOM, several SOM models are reviewed and compared in this section.
2.1. SOM
Due to its projection capability and topology preservation property, SOM has become a popular tool for visualized clustering of multidimensional data. The SOM training algorithm consists of two essential steps: identification of the best matching unit (BMU) for an input datum, and adjustment of the BMU and its neighborhood [4] so as to resemble the input datum. Conventional SOM handles only numeric data, since those two training steps rely on a Euclidean distance function. When categorical values are encountered, conventional SOM methods usually resort to data transformation, which converts a categorical value into a set of binary codes so that the traditional training algorithm can be applied. However, this approach suffers from a serious problem: the ontological similarity between categorical values cannot be appropriately represented by measuring the distance between the binary codes. For example, Pepsi is intuitively more similar to Coke than to coffee. Nevertheless, all three possess the same similarity degree under the Euclidean distance computed after the transformation [6]. Another problem is the difficulty of appropriately predefining a fixed map size; an inappropriate size leads to poor results in clustering and classification applications.
2.2. Growing SOM models
Incremental growing SOM models were proposed to overcome the constraint of a fixed network structure. Generally speaking, growing SOMs can be roughly divided into single-layer and multiple-layer types. The former inserts new neurons in between old ones on the same map. The latter applies a hierarchical structure of multiple layers, where each layer consists of a number of neurons or an SOM.
Single-layer growing SOM models, such as Growing Grid [7], Incremental Growing Grid [8], Growing SOM [9] and Growing Cell Structures [10], can grow on a fixed map and insert neurons according to different schemes. They were therefore regarded as a feasible solution for providing a more flexible network structure via insertion. Nevertheless, they cannot directly reflect the hierarchical relationships which might be inherent in massive data.
Such hierarchical relationships can easily be preserved in the hierarchical structure
of multilayer growing SOMs. They possess the dimension-reduction ability that a traditional SOM owns, i.e., projecting high-dimensional data into a low-dimensional space. Additionally, they are able to better handle massive data due to their multilayer structure. For instance, TreeGCS [11], one of the popular Growing Hierarchical SOMs (GHSOMs), applies a dendroid structure to maintain the clustered data in each node. In practical applications, GHSOM has been applied to legal document and newspaper datasets [12].
2.3. Structure Adaptive SOM
Conventional SOM models are usually applied to clustering problems, in which the class attribute does not participate in the clustering process. Because of the fixed network structure of SOM, data with different class labels may be assigned to the same cluster, so SOM does not perform well when used for classification problems. To address the problem, Structure Adaptive SOM (SASOM) [5, 13] was proposed to improve classification capability by increasing the class consistence of each node via a dynamic node-splitting scheme. However, like other conventional SOM methods, SASOM processes merely numeric data. Transformation is needed for categorical data, and the same problem of failing to reflect the similarity of categorical values also exists.
2.4. Generalized SOM
As encountered in training a conventional SOM, measuring the distance between categorical data is a non-trivial problem. Various schemes have been proposed in the literature, including binary encoding, simple matching, Jaccard's coefficient [14] and an entropy-based measure [15]. Unfortunately, these schemes do not take into consideration the different extents of similarity embedded between categorical values, such as Coke being more similar to Pepsi than to coffee. In [6] we proposed a distance representation scheme, the distance hierarchy, which tackles this issue.
Generalized SOM (GSOM) [6] was proposed for projecting mixed (categorical and numeric) data. To better reflect the topology of mixed-type data on a trained map, GSOM processes categorical data by the use of distance hierarchies, which consider the similarity embedded in categorical values. However, GSOM is of fixed map size. In addition, GSOM does not treat the class attribute separately and is hence not suitable for classification problems.
3. Extended Structure Adaptive SOM
An extended Structure Adaptive SOM (ESASOM) is proposed not only to improve classification accuracy but also to directly process mixed data.
3.1. Distance hierarchy for categorical data
The distance hierarchy was proposed in [6] for representing the relationships between categorical values and measuring the distance between them. A distance hierarchy, composed of concept nodes, links and link weights, represents the ontological relationships between concepts. In this data structure, the upper-level nodes represent more general concepts while the lower-level nodes represent more specific concepts. For example, Coke and Pepsi, represented by leaf nodes, belong to carbonated drinks, represented by the parent node of the two, as shown in Fig. 1. Juice,
coffee and carbonated drinks all belong to the root node "Any".
To illustrate the difference between the distance hierarchy and other popular distance schemes, the distances between Coke, Pepsi and Mocca are measured through the distance hierarchy, simple matching and binary encoding, as shown in Table 1. In the distance hierarchy (cf. Fig. 1), the weight of each link is assumed to be a constant, say 1, representing the distance between a node and its parent node. Neither simple matching nor binary encoding can distinguish the three drinks; in other words, the three drinks have the same distance/similarity according to those two methods. In contrast, via the distance hierarchy, Coke is measured to be more similar to Pepsi than to Mocca. In fact, the distance hierarchy is a general scheme in which both the simple matching and binary encoding schemes can be modeled as special cases. Even the subtraction scheme for numeric values can be modeled by a degenerated distance hierarchy with two nodes and a link weighted by the difference between the maximum and the minimum value [6].
[Figure: two distance hierarchies. (a) A categorical attribute: root "Any" with children Juice (Orange, Apple), Coffee (Latte, Mocca) and Carbonated drinks (Coke, Pepsi), with points X and M marked. (b) A numeric attribute: a two-node hierarchy from MIN to MAX with a point X at an offset.]
Figure 1. Distance hierarchies for (a) a categorical and (b) a numeric attribute.
Table 1. Distance comparison between different methods.

  Value pair     Distance hierarchy   Simple matching   Binary encoding
  Coke, Mocca            4                   1                1.414
  Coke, Pepsi            2                   1                1.414
  Mocca, Pepsi           4                   1                1.414
A point can be at any position in a distance hierarchy. A point, say X, is denoted by an anchor (a leaf node) N_X and a positive offset d_X as X = (N_X, d_X), where d_X represents the distance from the root to X. The distance between points X and Y can be calculated as follows.

    d(X, Y) = d_X + d_Y - 2 * d_LCP(X, Y)                               (1)

where d_X and d_Y are the distances from the root to X and Y, respectively, and d_LCP(X, Y) is the distance from the root to the least common point (LCP), which is defined as one of three cases: 1) either X or Y if they are in the same position; 2) Y if Y is an ancestor of X; otherwise, 3) LCA(X, Y).
LCA(X, Y) is the least common ancestor of X and Y, i.e., the deepest node which is an ancestor of both X and Y. For the example of Fig. 1, assume X = (Coke, 2.0) and M = (Mocca, 1.7). Then LCP(X, M) = Any and the distance d(X, M) = 2.0 + 1.7 - 2*0 = 3.7.
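As a concrete illustration, Eq. (1) can be sketched in Python. The dictionary encoding, function names and the unit-link-weight assumption are ours, not the paper's; the hierarchy mirrors Fig. 1(a).

```python
# Hypothetical sketch of Eq. (1) with unit link weights; the hierarchy
# below mirrors Fig. 1(a). The encoding and names are illustrative only.
PARENT = {
    "Juice": "Any", "Coffee": "Any", "Carbonated": "Any",
    "Orange": "Juice", "Apple": "Juice",
    "Latte": "Coffee", "Mocca": "Coffee",
    "Coke": "Carbonated", "Pepsi": "Carbonated",
}

def path_to_root(node):
    """Nodes from `node` up to the root 'Any'."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def d_lcp(x, y):
    """Depth of the least common point of X=(N_X, d_X) and Y=(N_Y, d_Y).
    With unit weights this equals min(depth of LCA, d_X, d_Y), which
    covers all three cases of the LCP definition above."""
    (nx, dx), (ny, dy) = x, y
    ancestors = path_to_root(nx)
    node = ny
    while node not in ancestors:   # climb until the anchors' LCA
        node = PARENT[node]
    lca_depth = len(path_to_root(node)) - 1
    return min(lca_depth, dx, dy)

def dh_distance(x, y):
    """Eq. (1): d(X, Y) = d_X + d_Y - 2 * d_LCP(X, Y)."""
    return x[1] + y[1] - 2 * d_lcp(x, y)

print(round(dh_distance(("Coke", 2.0), ("Mocca", 1.7)), 4))  # 3.7, LCP = Any
print(round(dh_distance(("Coke", 2.0), ("Pepsi", 2.0)), 4))  # 2.0, LCP = Carbonated drinks
```

The min() form handles the ancestor case as well: for X = (Coke, 2.0) and Y = (Coke, 0.5), the LCP is Y itself and the distance is 1.5.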
3.2. Distance between a data pattern and a neuron
The distance between a training pattern and a map neuron is measured by mapping them to distance hierarchies and calculating their distance in the hierarchies. Specifically, the components of a pattern and of a neuron are mapped to their associated hierarchies, and the distances between the corresponding mapping points in the individual hierarchies are aggregated as the total distance. Suppose x, m, and dh represent a training pattern, a map neuron, and a set of distance hierarchies, respectively. Then the distance between x and m is defined as
    d(x, m) = ( sum_{i=1..n} |dh_i(x_i) - dh_i(m_i)|^2 )^{1/2} = ( sum_{i=1..n} |X_i - M_i|^2 )^{1/2}    (2)

where X_i and M_i are the mapping points of x_i and m_i, respectively, in dh_i, and n is the number of attributes.
We use an example to illustrate the process. Assume a two-dimensional pattern x = (Coke, 7) with Dom(x_2) = [0, 10], and distance hierarchies dh_1 and dh_2 as shown in Fig. 1(a) and Fig. 1(b). x_1 = Coke is mapped to X = (Coke, 2) in dh_1, and x_2 = 7 is mapped to X = (MAX, 7) in dh_2.
For an ESASOM associated with the training dataset, each component of a map neuron is associated with the same distance hierarchy as its corresponding data attribute. Similarly, each component of a neuron can be mapped to a point in the hierarchy. For example, suppose a map neuron m = [(Mocca, 1.7), (MAX, 3)], with hierarchies dh_1 and dh_2 as shown in Fig. 1(a) and Fig. 1(b). m_1 = (Mocca, 1.7) is mapped to the point M = (Mocca, 1.7) in dh_1, and m_2 = (MAX, 3) is mapped to the point M = (MAX, 3) in dh_2.
The distance between x and m is measured by aggregating the differences between the corresponding mapping points of x and m in the hierarchies. That is, |(Coke, 2) - (Mocca, 1.7)| = 3.7, |(MAX, 7) - (MAX, 3)| = 4, and then d(x, m) = (3.7^2 + 4^2)^{1/2} = 5.45. In data preprocessing, max-min normalization can be performed on each attribute so as to avoid bias due to the various domain ranges of the attributes.
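The aggregation in Eq. (2) can be sketched as follows. The per-attribute distance functions are stand-ins we supply for the example: the categorical distance is a precomputed lookup of hierarchy distances between mapping points (3.7 for Coke vs. Mocca, from the running example), and the numeric distance is plain subtraction, i.e. the degenerated hierarchy.

```python
import math

# Sketch of Eq. (2): Euclidean aggregation of per-hierarchy distances.
# The attribute distance functions below are illustrative stand-ins.
def d_mixed(x, m, attr_dist):
    """d(x, m) = (sum_i |X_i - M_i|^2)^(1/2)."""
    return math.sqrt(sum(d(xi, mi) ** 2
                         for d, xi, mi in zip(attr_dist, x, m)))

# Attribute 1 (categorical): hierarchy distances between mapping points,
# precomputed as in the running example (offsets already folded in).
CAT = {("Coke", "Mocca"): 3.7, ("Coke", "Pepsi"): 2.0}
d1 = lambda a, b: 0.0 if a == b else CAT[tuple(sorted((a, b)))]
# Attribute 2 (numeric): a degenerated hierarchy reduces to subtraction.
d2 = lambda a, b: abs(a - b)

x = ["Coke", 7]    # pattern x = (Coke, 7)
m = ["Mocca", 3]   # neuron m with anchors (Mocca, 1.7) and (MAX, 3)
print(round(d_mixed(x, m, [d1, d2]), 2))  # 5.45
```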
3.3. Process of ESASOM
We first briefly introduce the training of a traditional SOM and then propose a procedure for training an extended SASOM.
Figure 2 depicts the projection of the training data onto an SOM. First, the data are iteratively drawn to train the map by identifying each datum's best matching unit and adjusting its neighborhood. Finally, each datum is projected onto the trained map by being assigned to its best matching unit. The training algorithm is outlined in Figure 3 and described as follows. Step 1 initializes the SOM by assigning random small values to the neurons. For an input pattern x, Step 2.1 identifies the neuron, referred to as the Best Matching Unit (BMU), which has the minimum distance to x. Step 2.2 adjusts
the weights of the BMU and its neighbor neurons such that the adjusted neurons become more similar to x. The adjustment is controlled by the learning rate α and the neighborhood function h. The formulas are shown in Eqs. (3) and (4), where v, w_i, t and M represent the BMU, the weight vector of neuron i, the training step and the number of map neurons, respectively. The process is repeated until a stop criterion is met; a popular criterion is to predefine the number of training steps.
    v = argmin_{i in {1,...,M}} || x(t) - w_i(t) ||                     (3)

    w_i(t+1) = w_i(t) + α(t) h_vi(t) [ x(t) - w_i(t) ]                  (4)
[Figure: training data such as x1 = [Coke, 7] and x2 = [Pepsi, 2] are drawn to train the map; in the final projection, x1 is assigned to its BMU m_v = [(Coke, 1.1), (MAX, 8)].]
Figure 2. Training an SOM and projecting the data onto the trained map.
1. Initialize SOM
2. For each input data
2.1 Identify its Best Matching Unit (BMU)
2.2 Adjust BMU and its neighbourhood
3. Repeat Step 2 till stop criterion met
Figure 3. SOM training algorithm.
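The algorithm of Figure 3, together with Eqs. (3) and (4), can be sketched for the numeric-only case. The one-dimensional map, the decay schedules and the toy data are our illustrative choices, not the paper's setup.

```python
import math, random

# Minimal numeric-only SOM following Fig. 3 and Eqs. (3)-(4) on a 1-D map.
# Map size, schedules and data are illustrative assumptions.
def train_som(data, n_units=4, T=200, alpha0=0.9, r0=2.0, seed=0):
    rng = random.Random(seed)
    w = [[rng.random() for _ in data[0]] for _ in range(n_units)]  # Step 1
    for t in range(T):                                            # Step 3 loop
        x = rng.choice(data)
        alpha = alpha0 * (1.0 - t / T)            # decaying learning rate
        r = 1.0 + (r0 - 1.0) * (1.0 - t / T)      # shrinking radius
        # Step 2.1 / Eq. (3): BMU = neuron nearest to x
        v = min(range(n_units),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, w[i])))
        # Step 2.2 / Eq. (4): pull BMU and its neighbourhood towards x
        for i in range(n_units):
            h = math.exp(-((i - v) ** 2) / (2.0 * r * r))  # Gaussian h_vi
            w[i] = [b + alpha * h * (a - b) for a, b in zip(x, w[i])]
    return w

data = [[0.1, 0.1], [0.15, 0.1], [0.9, 0.85], [0.95, 0.9]]
weights = train_som(data)  # the two data groups end up at different units
```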
3.3.1. Data training
The process of training an ESASOM (shown in Fig. 4) can be divided into a data training stage and a dynamic node-splitting stage, elaborated as follows.
In the data training stage, the GSOM training algorithm is applied to project the multidimensional mixed data onto a two-dimensional map. The initial map size is set to 4*4, and random weights are initially assigned to the nodes (or neurons). During training, nodes which do not satisfy any of the stop conditions are identified for splitting. The stop conditions are: (i) the number of data in the node is less than a user-predefined threshold (say, 2% of the total number of data); (ii) the class consistence of the node has reached a defined threshold; and (iii) the variance of the data in the node is less than a defined threshold. The class consistence and variance of a node are defined as follows.
    Class Consistence = max_j (n_cj) / n_Total                          (5)
    Variance = (1 / n_Total) * sum_{i=1..n_Total} (x_i - cv)^2          (6)
where n_cj is the number of data belonging to class C_j, x_i is an input datum assigned to the node, cv is the weight vector of the node, and n_Total is the total number of data in the node.
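The split test implied by the three stop conditions and Eqs. (5)-(6) can be sketched as follows. The function names and sample thresholds (those of Section 4.1) are our choices, and only the numeric part of the variance is shown.

```python
from collections import Counter

def class_consistence(labels):
    """Eq. (5): max_j n_cj / n_Total."""
    return max(Counter(labels).values()) / len(labels)

def variance(points, cv):
    """Eq. (6): (1/n_Total) * sum_i |x_i - cv|^2 (numeric attributes only)."""
    return sum(sum((a - c) ** 2 for a, c in zip(x, cv))
               for x in points) / len(points)

def needs_split(points, labels, cv, n_all,
                min_frac=0.02, cons_th=0.95, var_th=0.9):
    """Split unless one of the three stop conditions holds."""
    if len(points) < min_frac * n_all:            # (i) too few data
        return False
    if class_consistence(labels) >= cons_th:      # (ii) consistent enough
        return False
    if variance(points, cv) < var_th:             # (iii) compact enough
        return False
    return True

# A node holding two far-apart points of different classes is split:
print(needs_split([[0.0], [10.0]], ["a", "b"], [5.0], n_all=10))  # True
```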
3.3.2. Dynamic node-splitting
A node which needs splitting is expanded to a 2*2 sub-map (or four child nodes), as shown in Fig. 5. The sub-map is then iteratively trained like a regular GSOM. Before the training, the initial weight of each child node is assigned the mean of its parent node and neighbor nodes. Specifically, for mixed-type training data, the initial weights for the numeric and categorical attribute parts are calculated separately, as follows.
The initial weight of a numeric child-node attribute is assigned by the following formula.

    w_c = (1/2) * w_p + (1 / (2 * n_c)) * sum_{k=1..n_c} w_k            (7)
where w_c represents the initial weight, w_p is the weight of the corresponding parent-node attribute, w_k represents the weight of the corresponding attribute of a neighbor node, and n_c is the number of neighbor nodes. For the instance of Fig. 5, the nodes involved in the weight calculation of child c_0 include the two neighbor nodes p_0 and p_1 and the parent node p_4.
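For the numeric part, Eq. (7) is a one-liner; the weight values below are illustrative numbers for parent p_4 and neighbors p_0 and p_1 of Fig. 5, not taken from the paper.

```python
# Sketch of Eq. (7): half the parent weight plus half the neighbour mean.
def child_weight(w_p, neighbour_weights):
    """w_c = w_p/2 + (1/(2*n_c)) * sum_k w_k."""
    n_c = len(neighbour_weights)
    return w_p / 2.0 + sum(neighbour_weights) / (2.0 * n_c)

# Child c0 with parent p4 = 6.0 and neighbours p0 = 2.0, p1 = 4.0:
print(child_weight(6.0, [2.0, 4.0]))  # 4.5
```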
Regarding a categorical attribute, the mean taken as the initial value is represented by the centroid of the points projected by the parent node and neighbor nodes onto its distance hierarchy (as shown in Fig. 6). In other words, the centroid p_c is the point that gives the shortest total distance from the centroid to the parent-node point and each neighbor-node point. That is,
    p_c = argmin_{p_k} sum_i d(p_k, p_i)^2                              (8)

where p_k is a point in the hierarchy, and p_i represents a mapping point of the parent node or a neighbor node (e.g., p_0, p_1 or p_4 in Fig. 6).

Yes
Input data
GSOM
training
algorithm
is applied
I
nitialized map size as 4*4
Identify
nodes which need to be
split
Split
the nodes to 2*2 sub

map
s
No
Stop
condition
satisfied?
R
emove nodes with no
projected
data
Visualize
the
t
raining
result
s
Train the
split
nodes
like
GSOM
Yes
Figure
4.
T
raining process of ESASOM.
[Figure: a map with nodes p_0 to p_4; node p_4 is expanded into a 2*2 sub-map of child nodes c_0 to c_3.]
Figure 5. Node p_4 is split to four child nodes.
[Figure: the distance hierarchy of Fig. 1(a) with mapping points p_0, p_1 and p_4 and their centroid p_c marked on it.]
Figure 6. The centroid p_c represents the mean of p_0, p_1 and p_4.
The identification of the centroid can be restricted to the area in between the involved points. For example, any point on the bold lines in Fig. 6 is a candidate for the centroid. Specifically, we determine the local centroid of each of the involved bold links and then identify the global centroid from the local centroids. The local centroid of a link, i.e., the point on the link giving the minimum total distance to all the involved points, can be determined as follows.
    ed_{p_{r,l}} = [ sum_{p_i on the same link} d_{p_i}
                   + sum_{p_i on a different link} ( 2 * d_{LCP(p_{r,l}, p_i)} - d_{p_i} ) ] / (n_s + n_d)    (9)
where ed_{p_{r,l}} is the estimated distance from the root to p_{r,l}, p_{r,l} is the local centroid of the l-th link in the r-th branch, and n_s and n_d are the numbers of points p_i on the same link as p_{r,l} and on different links, respectively.
Since ed_{p_{r,l}} is the distance from the root to the local centroid on the l-th link, it is expected to lie in the range l-1 to l. If the calculated result falls outside that range, the value is set to the corresponding end of the link. In other words, the offset d_{p_{r,l}} of the local centroid is defined as follows.
    d_{p_{r,l}} = l-1,           if ed_{p_{r,l}} < l-1
                  l,             if ed_{p_{r,l}} > l
                  ed_{p_{r,l}},  otherwise                              (10)
Then, the centroid of a child node is acquired from the local centroids of the involved links as follows.

    p_c = argmin_{p_{r,l} in local centroids} sum_i d(p_{r,l}, p_i)^2    (11)
3.4. Clustering data via the trained map
To cluster data via a trained map, we propose a mixed bottom-up and top-down hierarchical approach. The hierarchical clustering process and results can best be depicted by a dendrogram, as shown in Fig. 7.
The bottom-up hierarchical clustering is applied to the trained map on which node splitting has not yet been performed. Each node with projected data is initially treated as a single cluster, since its data are more similar to one another than to those projected in other nodes. The weight vectors of the map nodes are used as the values of these initial clusters. Then, standard bottom-up hierarchical clustering [16] is performed to iteratively merge clusters until one cluster is formed. We adopt the single-link scheme to measure the distance between two clusters in this research; other methods such as complete link and average link are possible alternatives [16]. Single link takes the minimum distance between two points belonging to the two clusters as the distance of the two clusters. The formula is defined by Eq. (12), where c_i and c_j are two clusters, and p_i and p_j represent two data points, which are two node vectors in this research context, in c_i and c_j respectively.
    d(c_i, c_j) = min d(p_i, p_j),  where p_i in c_i, p_j in c_j        (12)
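Eq. (12) can be sketched directly. The node vectors here are plain numeric lists for brevity, whereas in ESASOM the pairwise distance would be the mixed-data distance of Eq. (2).

```python
# Single-link distance of Eq. (12) between two clusters of node vectors.
def euclid(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def single_link(ci, cj, dist=euclid):
    """d(c_i, c_j) = min over p_i in c_i, p_j in c_j of d(p_i, p_j)."""
    return min(dist(p, q) for p in ci for q in cj)

c1 = [[0.0, 0.0], [1.0, 0.0]]
c2 = [[4.0, 0.0], [5.0, 0.0]]
print(single_link(c1, c2))  # 3.0, between [1, 0] and [4, 0]
```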
To acquire more clusters, a top-down, divisive approach is taken. We treat a node splitting that occurred during the training of an ESASOM as a cluster splitting. Therefore, a node splitting naturally corresponds to creating a new layer in the dendrogram.
As shown in Fig. 7, before node splitting there are seven clusters (nodes), each of which represents the data projected onto the node. The bottom-up clustering allows users to obtain fewer clusters. After the node-splitting phase of training, the node labeled 3 was divided into four sub-nodes, of which two have projected data. Equivalently, cluster 3 was divided into two sub-clusters, as shown in the dendrogram.
[Figure: two trained maps (before and after node splitting) and the corresponding dendrogram. Horizontal cuts of the dendrogram yield two or three clusters; node 3 splits into sub-clusters 3.1 and 3.2, and node 5 into 5.1, 5.2 and 5.3.]
Figure 7. Mixed bottom-up and top-down hierarchical clustering via a trained map.
4. Experiments
We developed a prototype using C++. Experimental results on a synthetic and a real mixed dataset are presented to compare ESASOM to other SOM models.
4.1. Parameter setting
The initial map size of ESASOM was 4*4. The learning rate was a linear function α(t) = α(0) * (1.0 - t/T) with the initial value α(0) = 0.9. A Gaussian function was used as the neighborhood function, with radius r(t) = 1.0 + (r(0) - 1) * (1.0 - t/T) and the initial value r(0) set to the side length of the map. The class consistence and variance thresholds were set to 0.95 and 0.9, respectively.
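Written out, the two schedules above are:

```python
# The learning-rate and radius schedules of Sec. 4.1; T is the total number
# of training iterations and r0 the side length of the map.
def alpha(t, T, alpha0=0.9):
    """alpha(t) = alpha(0) * (1.0 - t/T)."""
    return alpha0 * (1.0 - t / T)

def radius(t, T, r0):
    """r(t) = 1.0 + (r(0) - 1) * (1.0 - t/T)."""
    return 1.0 + (r0 - 1.0) * (1.0 - t / T)

print(alpha(0, 1000), alpha(1000, 1000))              # 0.9 0.0
print(radius(0, 1000, 4.0), radius(1000, 1000, 4.0))  # 4.0 1.0
```

Both decay linearly: the learning rate falls to zero while the neighborhood radius shrinks to 1, so late training only fine-tunes each BMU and its immediate neighbors.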
4.2. Synthetic mixed dataset
The synthetic mixed dataset consists of nine classes with two categorical attributes (Department and Drink) and one numeric attribute (Amount), as shown in Table 2. The Amount values were randomly generated according to a normal distribution with the specified mean and deviation. Each class possesses certain characteristics. Two distance hierarchies were built to represent the ontological relationships between the categorical values, as shown in Fig. 8. Each link weight is assumed to be 1.
The training results of SASOM and ESASOM after 1,000 iterations are shown in Fig. 9. Fig. 9(a) shows that the data belonging to the same class in Table 2 are projected into the same node, which effectively remedies the drawback of conventional SOMs that a node may contain data from different classes. The size of a node indicates the splitting level of the neuron; for instance, the node labeled 2 is one level lower than that labeled 1. A problem shown in Fig. 9(a) is
that similar classes are not projected nearby, such as classes 7, 8 and 9, since SASOM was trained with the categorical values transformed to 0-1 binary codes, which do not consider the similarity embedded in the categorical values.
Fig. 9(b) shows the results of the proposed ESASOM method. As in SASOM, the data in the same class were projected to the same node. Two main differences from Fig. 9(a) can be identified. First, similar classes were projected nearby, e.g., classes 7, 8 and 9 in the upper middle of the map, which better reflects the structure of the data in Table 2. Second, we improved the visualization of ESASOM to help users gain more insight into the training results. Specifically, the size of a dark dot represents the amount of data projected to the node. A single color in a dark dot implies a uniform class among the projected data, while a multi-colored dark dot indicates multiple classes in the node (see Fig. 12(d)). The size of the shadow surrounding a dark dot indicates the splitting level of the neuron, as in SASOM.
Table 2. A synthetic mixed dataset.

  Class   Dept.   Drink    Amount (μ, σ)   Data count   Characteristics
  1       MIS     Coke     (500, 25)       60           Management with carbonated drinks
  2       MBA     Pepsi    (400, 20)       30
  3       MBA     Pepsi    (300, 15)       30
  4       EE      Latte    (500, 25)       60           Engineering with coffee
  5       CE      Mocca    (400, 20)       30
  6       CE      Mocca    (300, 15)       30
  7       SD      Apple    (500, 25)       60           Design with juice
  8       VC      Orange   (400, 20)       30
  9       VC      Orange   (300, 15)       30
[Figure: two distance hierarchies. Drink: root "Any" with Juice (Orange, Apple), Coffee (Latte, Mocca) and Carbonated drinks (Coke, Pepsi). Department: root "Any" with Engineering (EE, CE), Management (MIS, MBA) and Design (VC, SD).]
Figure 8. Distance hierarchies for categorical attributes.
[Figure: (a) the SASOM map with the nine class labels scattered over the nodes; (b) the ESASOM map with similar classes projected nearby.]
Figure 9. Visualized training results of synthetic data.
4.3. Real mixed dataset
The real mixed dataset Adult from the UCI repository [17] was used; it has 14 feature attributes and one class attribute, Salary. An attribute selection technique based on information gain [18] was applied to determine the correlation between the feature attributes and the class attribute. Seven relevant feature attributes were chosen, including three categorical attributes (relationship, marital-status and education) and four numeric attributes (capital-gain, capital-loss, age and hours-per-week). We randomly sampled 10,000 tuples and divided them into two sets of 6,666 and 3,334 tuples for training and testing. Three distance hierarchies were built to represent the ontological relationships of the categorical values, as shown in Fig. 10; each link weight was set to 1. The initial map size was 4*4 for SASOM and ESASOM and 15*15 for SOM and GSOM, and the number of training iterations was 20,000.
As the results in Fig. 11 show, the classification accuracy of ESASOM is more stable than that of the other SOMs across the various numbers of training iterations. In addition, as shown in Fig. 12, SASOM does not reflect the cluster structure of the data, since the neuron size does not reflect the number of data in the neuron. In contrast, ESASOM offers more information on the cluster structure. Furthermore, the number and distribution ratio of each class in a neuron can be identified through the pie chart on the map or via a pop-up window, which helps users to acquire more detailed information about a node (as shown in Fig. 12(d)).
[Figure: three distance hierarchies for the Adult dataset. (a) Relationship: root "Any" with leaves Own-child, Wife, Not-in-family, Unmarried, Other-relative and Husband. (b) Marital-status: root "Any" with Single (Never-married, Separated, Widowed, Divorced, Married-spouse-absent) and Couple (Married-AF-spouse, Married-civ-spouse). (c) Education: root "Any" with Little (Preschool, 1st-4th, 5th-6th), Junior (7th-8th, 9th), HighSchool (10th, 11th, 12th, HS-grad), College (Some-college, Bachelors, Assoc-voc, Assoc-acdm) and Advanced (Masters, Doctorate, Prof-school).]
Figure 10. Distance hierarchies for the selected categorical attributes of the Adult dataset.
[Figure: line chart of classification accuracy (70% to 84%) versus the number of training iterations (10,000 to 50,000) for SOM, GSOM, SASOM and ESASOM.]
Figure 11. Classification accuracy with various training iterations.
[Figure: four trained maps of the Adult dataset: (a) SOM, (b) GSOM, (c) SASOM, (d) ESASOM.]
Figure 12. Visualized training results of the Adult dataset.
4.3.1. Analysis of various map sizes
Various initial map sizes, from 16 to 400 nodes, were used to analyze the impact on classification accuracy. As shown in Fig. 13, the accuracy lies mainly between 80% and 82%, and most of the time around 81%, indicating that the classification accuracy of our method is relatively stable regardless of the initial map size. In other words, a small initial map size does not degrade classification accuracy. This is important, since a small initial map size makes the hierarchical clustering easier in the later stage of cluster analysis.
[Figure: chart of classification accuracy (78% to 82%) for map sizes 16, 36, 64, 100, 121, 144, 196, 225, 256, 289, 324, 361 and 400.]
Figure 13. Classification accuracy of ESASOM with various map sizes.
4.4. Analysis of visualized clustering and classification based on a trained ESASOM
We applied the mixed bottom-up and top-down hierarchical clustering to cluster a trained ESASOM and then compared the classification accuracy of the various clustering results.
We first cluster the trained ESASOM by using the mixed bottom-up and top-down hierarchical clustering described in Section 3.4. The clustering results are represented by the dendrogram shown in Fig. 14(c)-(d), constructed from the two trained maps (Fig. 14(a) and (b)) obtained before and after node splitting. The lower-right number of a node in Fig. 14(a) is the node number, while the upper-left number indicates the number of projected data. An advantage of using a dendrogram is that clusters of any intended number are easily obtained; for example, the dotted line in Fig. 14(c) represents dividing the data into three clusters.
Table 3 shows the salary distribution of four clustering results. For instance, if we group the trained map into 3 clusters, the distribution in cluster 1 is 62.75% of <=50K and 37.25% of >50K. The experimental results indicate that a larger number of clusters produces better performance in terms of
separating the salary classes. This is supported by the total entropy calculated on the salary attribute, which drops from 0.6357 to 0.6157 when the cluster number increases from 3 to 15.
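The total entropy quoted here is, as we compute it, the size-weighted average of the per-cluster entropies on the salary attribute; the sketch below uses illustrative cluster contents, not the paper's actual partitions.

```python
import math

def entropy(fractions):
    """Shannon entropy of a class distribution given as fractions."""
    return -sum(f * math.log2(f) for f in fractions if f > 0)

def total_entropy(clusters):
    """Size-weighted average: sum_k (n_k/n) * H(cluster_k)."""
    n = sum(sum(c.values()) for c in clusters)
    return sum((sum(c.values()) / n) *
               entropy([v / sum(c.values()) for v in c.values()])
               for c in clusters)

# Two illustrative clusters of 100 data each:
clusters = [{"<=50K": 60, ">50K": 40}, {"<=50K": 95, ">50K": 5}]
print(round(total_entropy(clusters), 4))  # 0.6287
```

A lower total entropy means each cluster is purer with respect to the salary classes, which is why the value drops as the map is cut into more clusters.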
[Figure: (a) a trained 4*4 map whose 15 occupied nodes hold between 16 and 1,428 projected data; (b) the map after node splitting; (c) the upper part of the dendrogram with a dotted cut line; (d) the lower part of the dendrogram.]
Figure 14. Dendrogram constructed from a trained ESASOM. (a) The training result with an initial size of 16 nodes. (b) The training result after node splitting. (c) The upper part of the dendrogram, obtained by applying bottom-up hierarchical clustering to (a). (d) The lower part of the dendrogram, obtained from the top-down node splitting from (a) to (b).
Table 3. Distribution of <=50K and >50K in each cluster and total entropy (values in percent).

  3 clusters (total entropy 0.6357)
  Class   C1      C2      C3      ALL
  <=50K   62.75   47.29   93.39   75.73
  >50K    37.25   52.71    6.61   24.27

  7 clusters (total entropy 0.6345)
  Class   C1      C2      C3      C4      C5      C6      C7      ALL
  <=50K   62.75   47.06   87.50   95.48   61.54   92.55   93.80   75.73
  >50K    37.25   52.94   12.50    4.52   38.46    7.45    6.20   24.27

  11 clusters (total entropy 0.6305)
  Class   C1      C2      C3      C4      C5      C6      C7      C8      C9      C10     C11     ALL
  <=50K   62.75   44.50   87.50   95.48   61.54   55.68   93.96   89.10   97.80   91.15   94.87   75.73
  >50K    37.25   55.50   12.50    4.52   38.46   44.32    6.04   10.90    2.20    8.85    5.13   24.27

  15 clusters (total entropy 0.6157)
  Class   C1      C2      C3      C4      C5      C6      C7      C8      C9      C10     C11     C12     C13     C14     C15     ALL
  <=50K   62.75   44.50   87.50   95.48   61.54   55.68   80.60   99.08   87.50   80.00   90.13   97.80   88.42   92.38   94.87   75.73
  >50K    37.25   55.50   12.50    4.52   38.46   44.32   19.40    0.92   12.50   20.00    9.87    2.20   11.58    7.62    5.13   24.27
4.4.1. Analysis of classification accuracy

We further compared classification accuracy on the various clustering results obtained according to the resultant dendrogram of Fig. 14(c)-(d) and on GSOM. The classification is performed by majority vote of the data in the cluster to which the unknown input is projected on the trained map. For the example of dividing the data into three clusters as shown in Fig. 14(c), if an input is projected to node 6, which belongs to cluster 2, the input will be classified as >50K since the majority of cluster 2 is >50K, as shown in the >50K row of the 3-cluster result in Table 3.
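The majority-vote rule described above can be sketched as follows; the node-to-cluster assignment and the per-cluster label counts are hypothetical stand-ins, chosen only to mirror the node-6 example (cluster 2 has a slight >50K majority, as in Table 3).

```python
from collections import Counter

# Hypothetical mappings, assumed for illustration: the cluster each map
# node belongs to, and the training labels projected into each cluster.
node_to_cluster = {1: 1, 6: 2, 11: 3}
cluster_labels = {
    1: ['<=50K'] * 62 + ['>50K'] * 37,
    2: ['<=50K'] * 47 + ['>50K'] * 53,   # slight >50K majority, as in Table 3
    3: ['<=50K'] * 93 + ['>50K'] * 7,
}

def classify(bmu_node):
    """Classify an input by the majority class of the cluster that
    contains its best-matching node on the trained map."""
    cluster = node_to_cluster[bmu_node]
    return Counter(cluster_labels[cluster]).most_common(1)[0][0]

print(classify(6))  # node 6 -> cluster 2 -> '>50K'
```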
Table 4 shows the classification accuracy on the test data. ESASOM obtains better accuracy (80.92%) than GSOM and any of the results from the hierarchical clustering of ESASOM. This is expected because the class consistency of a node in the final ESASOM is higher after node splitting. In contrast, a cluster formed by merging several sub-clusters usually increases its diversity of classes, resulting in reduced classification accuracy. We can see in the table that the accuracy dropped from 76.42% to 76.06% after merging the clusters from 15 down to 3.
Table 4. Classification accuracy on test data

Method             Correct (data count)   Incorrect (data count)   Accuracy (%)
Hierarchical-3G    2536                   798                      76.06
Hierarchical-4G    2536                   798                      76.06
Hierarchical-5G    2536                   798                      76.06
Hierarchical-6G    2540                   794                      76.18
Hierarchical-7G    2504                   830                      75.10
Hierarchical-8G    2548                   786                      76.42
Hierarchical-9G    2548                   786                      76.42
Hierarchical-10G   2548                   786                      76.42
Hierarchical-11G   2548                   786                      76.42
Hierarchical-12G   2548                   786                      76.42
Hierarchical-13G   2548                   786                      76.42
Hierarchical-14G   2548                   786                      76.42
Hierarchical-15G   2548                   786                      76.42
GSOM               2548                   786                      76.42
ESASOM             2698                   636                      80.92
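As a sanity check, each accuracy value in Table 4 follows directly from the counts: accuracy = correct / (correct + incorrect). A short sketch over three of the rows (row names as in the table):

```python
# Recompute the accuracy column of Table 4 from the data counts.
rows = {'Hierarchical-3G': (2536, 798),
        'GSOM': (2548, 786),
        'ESASOM': (2698, 636)}

for method, (correct, incorrect) in rows.items():
    acc = 100 * correct / (correct + incorrect)
    print(f'{method}: {acc:.2f}%')
```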
5. Conclusions

An extended SASOM integrates the ideas of SASOM and GSOM. The extended model takes the advantages of, and avoids the disadvantages of, the conventional models. In particular, we improve the conventional models in four aspects. First, ESASOM can directly cluster and classify mixed categorical and numeric data. The different extents of similarity between categorical values are considered in our model to better reflect the natural structure of the data on the trained map. Second, the model improves classification accuracy due to its proper handling of categorical data. Third, we improve the visualization of a trained ESASOM, which helps users gain more insight into the projected data. Specifically, the amount of data projected onto a neuron and the level to which a neuron has been split can be distinguished; these are represented by dark dots and their shadowed surroundings. Moreover, the dark dots are rendered as pie charts showing the percentage of each class in the neuron, revealing the diversity of the data projected onto it. Fourth, a mixed bottom-up and top-down procedure is proposed to perform clustering on a trained map. The clustering process and results can be represented by a dendrogram, which is easily visualized.

The distance hierarchies used in this research were constructed manually, and the number of clusters on a trained map was determined interactively by the user. It will be beneficial to investigate approaches to automatically constructing the hierarchies and determining the optimal number of clusters in the future. ESASOM can be applied to visually cluster and classify mixed-type business data, such as customer data for market segmentation and direct marketing.
References
[1] J. Vesanto, E. Alhoniemi, Clustering of the self-organizing map, IEEE Transactions on Neural Networks, 2000, Vol. 11, No. 3, 586-600.
[2] M.Y. Kiang, Extending the Kohonen self-organizing map networks for clustering analysis, Computational Statistics & Data Analysis, 2001, Vol. 38, 161-180.
[3] S. Wu, W.S. Chow, Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density, Pattern Recognition, 2004, Vol. 37, 175-188.
[4] T. Kohonen, The self-organizing map, Proceedings of the IEEE, 1990, Vol. 78, No. 9, 1464-1480.
[5] S.B. Cho, Structure-adaptive SOM to classify 3-dimensional point light actors' gender, Proceedings of the 9th International Conference on Neural Information Processing, 2002, Vol. 2, 949-953.
[6] C.C. Hsu, Generalizing self-organizing map for categorical data, IEEE Transactions on Neural Networks, 2006, Vol. 17, No. 2, 294-304.
[7] B. Fritzke, Growing grid: a self-organizing network with constant neighborhood range and adaptation strength, Neural Processing Letters, 1995, Vol. 2, 9-13.
[8] J. Blackmore, R. Miikkulainen, Incremental grid growing: encoding high-dimensional structure into a two-dimensional feature map, International Conference on Neural Networks, 1993, Vol. 1, 450-455.
[9] D. Alahakoon, S.K. Halgamuge, B. Srinivasan, Dynamic self-organizing maps with controlled growth for knowledge discovery, IEEE Transactions on Neural Networks, 2000, Vol. 11, 601-614.
[10] B. Fritzke, Unsupervised clustering with growing cell structures, IJCNN'91, IEEE Press, 1991, Vol. 2, 531-536.
[11] V.J. Hodge, J. Austin, Hierarchical growing cell structures: tree GCS, IEEE Transactions on Knowledge and Data Engineering, 2001, Vol. 13, No. 2, 207-218.
[12] A. Rauber, D. Merkl, M. Dittenbach, The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data, IEEE Transactions on Neural Networks, 2002, Vol. 13, No. 6, 1331-1341.
[13] S.B. Cho, Self-organizing map with dynamical node splitting: application to handwritten digit recognition, Neural Computation, 1997, Vol. 9, No. 6, 1345-1355.
[14] M.H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2003.
[15] D. Barbara, J. Couto, Y. Li, COOLCAT: an entropy-based algorithm for categorical clustering, Proceedings of the Eleventh International Conference on Information and Knowledge Management, 2002, 582-589.
[16] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Computing Surveys, 1999, Vol. 31, No. 3, 265-323.
[17] P.M. Murphy, D.W. Aha, UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, visited on 5/20/2006.
[18] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Press, San Francisco, 2001.