Abstract
This paper presents fuzzy clustering algorithms for mixed features of
symbolic and fuzzy data. El-Sonbaty and Ismail proposed fuzzy c-means
(FCM) clustering for symbolic data, and Hathaway et al. proposed FCM
for fuzzy data. This paper modifies the dissimilarity measures for
symbolic and fuzzy data and then gives FCM clustering algorithms for
these mixed data types. Numerical examples and comparisons are also
given; they illustrate that the modified dissimilarity gives better
results. Finally, the proposed clustering algorithm is applied to real
data with mixed feature variables of symbolic and fuzzy data.
Background
1、K-means Clustering
（1）What is Clustering?
Clustering can be considered the most important unsupervised
learning problem; like every other problem of this kind, it deals
with finding a structure in a collection of unlabeled data.
A loose definition of clustering could be “the process of
organizing objects into groups whose members are similar in some
way”.
A cluster is therefore a collection of objects which are “similar”
to one another and “dissimilar” to the objects belonging to other
clusters.
In this case we can easily identify the 4 clusters into which the data
can be divided; the similarity criterion is distance: two or more
objects belong to the same cluster if they are “close” according to a
given distance (in this case geometrical distance). This is called
distance-based clustering.
Another kind of clustering is conceptual clustering: two or more
objects belong to the same cluster if the cluster defines a concept
common to all those objects. In other words, objects are grouped
according to their fit to descriptive concepts, not according to simple
similarity measures.
So, the goal of clustering is to determine the intrinsic grouping in
a set of unlabeled data. But how do we decide what constitutes a good
clustering? It can be shown that there is no absolute “best” criterion
which would be independent of the final aim of the clustering.
Consequently, it is the user who must supply this criterion, in such
a way that the result of the clustering will suit their needs.
For instance, we could be interested in finding representatives for
homogeneous groups (data reduction), in finding “natural clusters”
and describing their unknown properties (“natural” data types), in
finding useful and suitable groupings (“useful” data classes) or in
finding unusual data objects (outlier detection).
（2）Clustering algorithms can be applied in many fields, for
instance [9][13][14]:
Marketing: finding groups of customers with similar behavior given
a large database of customer data containing their properties and
past buying records;
Biology: classification of plants and animals given their features;
Libraries: book ordering;
Insurance: identifying groups of motor insurance policy holders with
a high average claim cost; identifying frauds;
City-planning: identifying groups of houses according to their house
type, value and geographical location;
Earthquake studies: clustering observed earthquake epicenters to
identify dangerous zones;
WWW: document classification; clustering data to discover groups
of similar access patterns.
（3）Clustering algorithms may be classified as listed below:
Exclusive Clustering
Overlapping Clustering
Hierarchical Clustering
Probabilistic Clustering
In the first case data are grouped in an exclusive way, so that if a
certain datum belongs to a definite cluster then it cannot be
included in another cluster. A simple example of that is shown in the
figure below, where the separation of points is achieved by a straight
line on a two-dimensional plane.
By contrast, the second type, overlapping clustering, uses
fuzzy sets to cluster data, so that each point may belong to two or
more clusters with different degrees of membership. In this case,
each datum is associated with an appropriate membership value.
A hierarchical clustering algorithm, instead, is based on the union
of the two nearest clusters. The initial condition is realized
by setting every datum as a cluster. After a few iterations it reaches
the final clusters wanted.
Finally, the last kind of clustering uses a completely probabilistic
approach.
（4）Four of the most used clustering algorithms:
K-means
Fuzzy C-means
Hierarchical clustering
Mixture of Gaussians
Each of these algorithms belongs to one of the clustering types
listed above: K-means is an exclusive clustering algorithm,
Fuzzy C-means is an overlapping clustering algorithm, Hierarchical
clustering is obviously hierarchical, and Mixture of Gaussians is a
probabilistic clustering algorithm.
（5）Distance Measure:
An important component of a clustering algorithm is the distance
measure between data points. If the components of the data instance
vectors are all in the same physical units, then the simple Euclidean
distance metric may be sufficient to successfully group similar data
instances. However, even in this case the Euclidean distance can
sometimes be misleading. The figure below illustrates this with an
example of the width and height measurements of an object. Despite
both measurements being taken in the same physical units, an informed
decision has to be made as to the relative scaling. As the figure
shows, different scalings can lead to different clusterings.
Notice, however, that this is not only a graphic issue: the problem
arises from the mathematical formula used to combine the distances
between the single components of the data feature vectors into a
unique distance measure that can be used for clustering purposes:
different formulas lead to different clusterings.
Again, domain knowledge must be used to guide the formulation
of a suitable distance measure for each particular application.
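The effect of relative scaling can be checked numerically. The following is a minimal sketch, not taken from the original text: the (width, height) values and the `scale` parameter are hypothetical, chosen only to show that rescaling one feature changes which point is nearest under the Euclidean distance.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(query, points, scale=(1.0, 1.0)):
    """Index of the point nearest to `query` after each feature is
    multiplied by the matching entry of `scale` (hypothetical weights)."""
    def rescale(p):
        return tuple(v * w for v, w in zip(p, scale))
    q = rescale(query)
    return min(range(len(points)), key=lambda i: euclidean(q, rescale(points[i])))

# Hypothetical (width, height) measurements in the same physical unit.
query = (1.0, 1.0)
points = [(1.0, 3.0), (4.0, 1.0)]

unscaled = nearest(query, points)                     # height difference dominates less
compressed = nearest(query, points, scale=(0.5, 1.0)) # width axis compressed by half
print(unscaled, compressed)
```

With the raw values the first point is nearest; after halving the width axis the second point becomes nearest, so any distance-based grouping of these points would change as well.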
（6）What is K-Means Clustering?
K-means [1] is one of the simplest unsupervised learning
algorithms that solve the well-known clustering problem. The
procedure follows a simple and easy way to classify a given data set
through a certain number of clusters (assume k clusters) fixed a
priori. The main idea is to define k centroids, one for each cluster.
These centroids should be placed carefully, because different
locations cause different results. So, the better choice is to
place them as far away from each other as possible. The next
step is to take each point belonging to the data set and associate
it with the nearest centroid. When no point is pending, the first step
is completed and an early grouping is done. At this point we need to
re-calculate k new centroids as barycenters of the clusters resulting
from the previous step. After we have these k new centroids, a new
binding has to be done between the same data set points and the
nearest new centroid. A loop has been generated. As a result of this
loop, the k centroids change their location step by step until no
more changes occur. In other words, the centroids do not move any more.
Finally, this algorithm aims at minimizing an objective function,
in this case a squared error function. The objective function is
$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 ,$$
where $\| x_i^{(j)} - c_j \|^2$ is a chosen distance measure between a
data point $x_i^{(j)}$ and the cluster centre $c_j$, and $J$ is an
indicator of the distance of the n data points from their respective
cluster centres.
The algorithm is composed of the following steps:
I. Place K points into the space represented by the objects that are
being clustered. These points represent initial group centroids.
II. Assign each object to the group that has the closest centroid.
III. When all objects have been assigned, recalculate the positions
of the K centroids.
IV. Repeat Steps II and III until the centroids no longer move. This
produces a separation of the objects into groups from which
the metric to be minimized can be calculated.
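The four steps can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the choice of drawing the initial centroids at random from the data (the text only says to place K points in the space), the `seed` and `max_iter` parameters, and the sample points are all assumptions.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step I: pick K initial centroids (here: K distinct data points).
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step II: assign each object to the group with the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Step III: recompute each centroid as the barycenter of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        # Step IV: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
print(clusters)
```

Because the result depends on the initial centroids, running the sketch with several seeds and keeping the partition with the smallest squared error is the usual remedy.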
Although it can be proved that the procedure will always
terminate, the k-means algorithm does not necessarily find the
optimal configuration, corresponding to the global minimum of the
objective function. The algorithm is also significantly sensitive to
the initial randomly selected cluster centres. The k-means algorithm
can be run multiple times to reduce this effect.
K-means is a simple algorithm that has been adapted to many
problem domains. As we are going to see, it is a good candidate for
extension to work with fuzzy feature vectors.
2、Fuzzy C-means Clustering
Fuzzy set theory (FST) is an extension of classical set theory
developed by Zadeh (1965) [10] as a way to deal with vague concepts.
Fuzzy C-means (FCM) is an extension of classic K-means using the
concepts of fuzzy logic (Ruspini, 1969 [11]; Bezdek, 1974 [12],
1981 [9]; Dunn, 1974 [8]).
Fuzzy c-means is a method of clustering which allows one piece
of data to belong to two or more clusters. This method (developed by
Dunn in 1973 [8] and improved by Bezdek in 1981 [9]) is frequently
used in pattern recognition. It is based on minimization of the
following objective function:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\| x_i - c_j \right\|^2 ,$$
where m is any real number greater than 1, $u_{ij}$ is the degree of
membership of $x_i$ in the cluster j, $x_i$ is the ith of the
d-dimensional measured data, $c_j$ is the d-dimensional center of the
cluster, and $\|\cdot\|$ is any norm expressing the similarity between
any measured data and the center. The sums run over the N data points
and the C clusters.
Fuzzy partitioning is carried out through an iterative optimization
of the objective function shown above, with the update of the
memberships $u_{ij}$ and the cluster centers $c_j$ by:
$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \frac{\| x_i - c_j \|}{\| x_i - c_l \|} \right)^{\frac{2}{m-1}}} , \qquad c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} .$$
This iteration will stop when
$$\max_{ij} \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| < \varepsilon ,$$
where $\varepsilon$ is a termination criterion between 0 and 1 and k
denotes the iteration step. This procedure converges to a local
minimum or a saddle point of $J_m$.
The algorithm is composed of the following steps:
I. Initialize the matrix U = [u_ij], U(0).
II. At step k: calculate the center vectors C(k) = [c_j] with U(k).
III. Update U(k) to U(k+1).
IV. If ‖U(k+1) − U(k)‖ < ε then STOP; otherwise return to step II.
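Steps I to IV can be sketched as follows, using the standard FCM update rules for the memberships and the centers with the Euclidean norm. This is a minimal illustration under stated assumptions: the function name `fcm`, the random initialization of U, the `seed` and `max_iter` parameters, the handling of a point that coincides with a center, and the one-dimensional sample data are not from the original text.

```python
import random

def fcm(points, c, m=2.0, eps=1e-4, max_iter=200, seed=0):
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Step I: random membership matrix U(0); each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Step II: centers as means of the data weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][t] for i in range(n)) / total
                            for t in range(d)])
        # Step III: update memberships from the distances to the centers.
        new_U = []
        for i in range(n):
            dist = [sum((points[i][t] - centers[j][t]) ** 2 for t in range(d)) ** 0.5
                    for j in range(c)]
            if any(v == 0.0 for v in dist):
                # Point sits exactly on a center: split membership among those centers.
                hits = [v == 0.0 for v in dist]
                row = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                row = [1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                                 for l in range(c))
                       for j in range(c)]
            new_U.append(row)
        # Step IV: stop when the largest membership change is below eps.
        change = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if change < eps:
            break
    return U, centers

# Hypothetical one-dimensional data with two obvious groups.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
U, centers = fcm(data, c=2)
for point, row in zip(data, U):
    print(point, [round(u, 3) for u in row])
```

Each row of the returned matrix holds the membership degrees of one data point; taking the largest entry per row gives a crisp partition when one is needed.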
3、Types of clustering data
The clustering of a data set into subsets can be divided into
hierarchical and nonhierarchical (partitioning) methods. The
general rationale behind partitioning methods is to choose some
initial partitioning of the data set and then alter cluster memberships
so as to obtain better partitions according to a predefined objective
function.
Hierarchical clustering procedures can be divided into
agglomerative methods, which progressively merge the objects
according to some distance measure in such a way that whenever
two objects belong to the same cluster at some level they remain
together at all higher levels, and divisive methods, which
progressively subdivide the data set [15].
Objects to be clustered usually come from an experimental study
of some phenomenon and are described by a specific set of features
selected by the data analyst. The feature values may be measured on
different scales and can be continuous numeric, symbolic,
structured, or fuzzy data.
（1）Continuous numeric
Continuous numeric data are well known as a classical data type,
and many algorithms for clustering this type of data using
partitioning or hierarchical techniques can be found in the literature
[16].
（2）Symbolic
Meanwhile, there is some research dealing with symbolic objects
[17]–[23], owing to the nature of such objects, which are
simple in construction but hard to process. Besides, the values
taken by the features of symbolic objects may include one or more
elementary objects, and the data set may have a variable number of
features [18].
A. Feature Types
The symbolic object A can be written as the Cartesian product of
specific values of its features A_k as
A = A_1 × A_2 × … × A_d
The feature values may be measured on different scales, resulting
in the following types: 1) quantitative features, which can be
classified into continuous, discrete, and interval values, and 2)
qualitative features, which can be classified into nominal
(unordered), ordinal (ordered), and combinational.
B. Dissimilarity
The dissimilarity between two symbolic objects A and B is
defined as
$$D(A, B) = \sum_{k=1}^{d} D(A_k, B_k) .$$
For the kth feature, D(A_k, B_k) is defined using the following three
components:
1) Dp(A_k, B_k) due to position p;
2) Ds(A_k, B_k) due to span s;
3) Dc(A_k, B_k) due to content c.
（3）Structured
Structured objects have higher complexity than continuous and
symbolic objects because of their structure, which is much more
complex, and their representation, which needs higher data structures
to permit the description of relations between elementary object
components and to facilitate hierarchical object models that describe
how an object is built up from the primitives. A survey of different
representations and proximity measures of structured objects can be
found in [24].
（4）Fuzzy Data
Fuzzy data types often appear in real applications. Fuzzy numbers
are used to model the fuzziness of data and are usually used to
represent fuzzy data. Trapezoidal fuzzy numbers (TFNs) are used
most often. Hathaway et al. [25] proposed FCM clustering for
symmetric TFNs using a parametric approach.
Introduction
We may have a data set with mixed symbolic and fuzzy feature types.
However, there is no clustering algorithm to deal with
this mixed type of data. In this paper, we consider feature
vectors including numeric, symbolic and fuzzy data. We first modify
Gowda and Diday’s dissimilarity measure [18] for symbolic data and
also change Hathaway’s parametric approach [25] for fuzzy data. We
then create an FCM clustering algorithm for data with these mixed
feature types.
Content
We consider the mixed feature type of symbolic and fuzzy data and
define its dissimilarity measure. For the symbolic data
components, we compose the dissimilarity on the basis of the
modified Gowda and Diday dissimilarity measure [18]. For the fuzzy
data components, we use Hathaway’s parametric approach [25] and
Yang’s dissimilarity method [26].
Suppose that any feature vector F can be written as a d-tuple of
feature components F_1, …, F_d with
F = F_1 × F_2 × … × F_d
For any two feature vectors A and B with A = A_1 × ⋯ × A_d and
B = B_1 × ⋯ × B_d, the dissimilarity between A and B is defined as
the sum of the dissimilarities of their feature components, where
the symbolic components and the fuzzy components are treated as
described in the following steps.
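First Step: for the symbolic data components, we compose the
dissimilarity on the basis of the modified Gowda and Diday
dissimilarity measure.
1. Quantitative type of A_k and B_k:
The dissimilarity between two feature components of quantitative
type is defined as the dissimilarity of these values due to position,
span and content. Let
a_l = lower limit of A_k,
a_u = upper limit of A_k,
b_l = lower limit of B_k,
b_u = upper limit of B_k,
inters = length of intersection of A_k and B_k,
l_s = span length of A_k and B_k = max(a_u, b_u) − min(a_l, b_l),
U_k = the difference between the highest and lowest values of the
kth feature over all objects,
l_a = a_u − a_l,
l_b = b_u − b_l.
The three dissimilarity components of Gowda and Diday’s measure are
then defined as follows:
$$D_p(A_k, B_k) = \frac{|a_l - b_l|}{U_k}, \qquad D_s(A_k, B_k) = \frac{|l_a - l_b|}{l_s}, \qquad D_c(A_k, B_k) = \frac{l_a + l_b - 2 \cdot inters}{l_s} .$$
Thus, D(A_k, B_k) = Dp(A_k, B_k) + Ds(A_k, B_k) + Dc(A_k, B_k).
This paper then redefines the components above. In the form used in
the worked example of the experimental section, they are
$$D_p(A_k, B_k) = \frac{\left| \frac{a_l + a_u}{2} - \frac{b_l + b_u}{2} \right|}{U_k}, \qquad D_s(A_k, B_k) = \frac{|l_a - l_b|}{U_k + l_a + l_b - inters}, \qquad D_c(A_k, B_k) = \frac{l_a + l_b - 2 \cdot inters}{U_k + l_a + l_b - inters} ,$$
and the worked example adds the squares of the three components.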
2. Qualitative type of A_k and B_k:
For qualitative feature types, the dissimilarity component due
to position is absent, and so is the term U_k. The two components
that contribute to dissimilarity are “due to span” and “due to
content”. Let
l_a = length of A_k = the number of elements in A_k,
l_b = length of B_k = the number of elements in B_k,
l_s = span length of A_k and B_k = the number of elements in the
union of A_k and B_k,
inters = the number of elements in the intersection of A_k and B_k.
The two dissimilarity components are then defined as follows:
$$D_s(A_k, B_k) = \frac{|l_a - l_b|}{l_s}, \qquad D_c(A_k, B_k) = \frac{l_a + l_b - 2 \cdot inters}{l_s} .$$
Thus, D(A_k, B_k) = Ds(A_k, B_k) + Dc(A_k, B_k).
In this paper, the qualitative components do not need to be modified.
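The two qualitative components can be computed directly with set operations. The sketch below assumes the usual Gowda and Diday forms, span |l_a − l_b| / l_s and content (l_a + l_b − 2·inters) / l_s, which are also the forms that appear in the worked example later in this paper; the function name `qualitative_components` is hypothetical, and both value sets are assumed to be non-empty.

```python
def qualitative_components(a, b):
    """Return (Ds, Dc) for two non-empty sets of qualitative values.

    Assumed forms: Ds = |la - lb| / ls and Dc = (la + lb - 2*inters) / ls,
    with ls the size of the union and inters the size of the intersection.
    """
    a, b = set(a), set(b)
    la, lb = len(a), len(b)
    inters = len(a & b)
    ls = len(a | b)
    ds = abs(la - lb) / ls
    dc = (la + lb - 2 * inters) / ls
    return ds, dc

# Color sets of the two cars compared in the experimental section.
colors_a = {"W", "S", "D", "R", "B"}
colors_b = {"W", "S", "D", "G", "Go"}
ds, dc = qualitative_components(colors_a, colors_b)
print(ds, dc)
```

For these two color sets the span component vanishes because both sets have five elements, and the content component is 4/7 because three of the seven distinct colors are shared.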
Second Step: for the fuzzy feature components, we redefine fuzzy data
based on Hathaway’s parametric model. We extend symmetric TFNs to all
TFNs by defining the parameterization shown in Fig. 1. The notation
for the parameterization of a TFN A is A = m(a1, a2, a3, a4), where we
refer to a1 as the center, a2 as the inner diameter, a3 as the left
outer radius and a4 as the right outer radius. Using this parametric
representation we can parameterize the four kinds of TFNs (real
numbers, intervals, triangular and trapezoidal fuzzy numbers) as
shown in Fig. 2.
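The parameterization m(a1, a2, a3, a4) can be illustrated with a small helper type. This is a sketch under assumptions: the `TFN` tuple and the four constructor names are hypothetical, and the mapping from endpoints to parameters follows the verbal definitions above (center of the core, width of the core, and the left and right spreads outside the core), since Fig. 1 and Fig. 2 are not reproduced here.

```python
from typing import NamedTuple

class TFN(NamedTuple):
    center: float          # a1: midpoint of the core
    inner_diameter: float  # a2: width of the core
    left_radius: float     # a3: spread to the left of the core
    right_radius: float    # a4: spread to the right of the core

def from_real(x):
    """A real number has a single-point core and no spread."""
    return TFN(x, 0.0, 0.0, 0.0)

def from_interval(lo, hi):
    """An interval [lo, hi] is all core, with no outer spread."""
    return TFN((lo + hi) / 2, hi - lo, 0.0, 0.0)

def from_triangular(lo, peak, hi):
    """A triangular number has a single-point core at `peak`."""
    return TFN(peak, 0.0, peak - lo, hi - peak)

def from_trapezoidal(a, b, c, d):
    """A trapezoid with support [a, d] and core [b, c]."""
    return TFN((b + c) / 2, c - b, b - a, d - c)

print(from_real(63.9))
print(from_interval(2.0, 4.0))
print(from_triangular(8, 10, 12))
print(from_trapezoidal(1, 2, 4, 7))
```

Under this reading, the comfort value [10, 0, 2, 2] used in the experimental section is a triangular number centered at 10 with a spread of 2 on each side, and a real-valued price such as 63.9 becomes m(63.9, 0, 0, 0).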
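Let A = m(a1, a2, a3, a4) and B = m(b1, b2, b3, b4) be any two fuzzy
data. Hathaway’s distance d_h(A, B) of symmetric TFNs A and B is
defined as
In this paper, we use Yang’s [26] distance definition for LR-type
fuzzy numbers to redefine the distance as follows.
Then, in the form used in the worked example of the experimental
section,
$$d_f^2(A, B) = \frac{1}{4} \left[ g_-^2 + g_+^2 + \left( g_- - (a_3 - b_3) \right)^2 + \left( g_+ + (a_4 - b_4) \right)^2 \right] ,$$
where
$$g_- = 2 (a_1 - b_1) - (a_2 - b_2), \qquad g_+ = 2 (a_1 - b_1) + (a_2 - b_2) .$$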
Third Step: if the feature vectors are numeric data in R, the FCM
clustering algorithm is well established. However, problems are
encountered in applying the FCM to symbolic objects, so El-Sonbaty
and Ismail [27] proposed a new way of representing cluster centers,
as below:
The membership function is an important index function
proposed by El-Sonbaty and Ismail for using the FCM with symbolic
data. In this paper, besides applying the FCM to symbolic feature
components, we also apply the FCM to fuzzy feature components.
So, let {X1, …, Xn} be a data set of mixed feature types. The FCM
objective function is defined as
Taking the derivative of L with respect to u_ij and the Lagrange
multiplier, we find that
We then consider the other two groups of parameters, e and a.
（1）For those features k’ which are symbolic:
（2）For those features k which are fuzzy:
when
then
Mixed-type variables FCM (MVFCM) Algorithm
Step 1: Fix m and c. Given an ε > 0, initialize a fuzzy c-partition.
Step 2:
（1）For each symbolic feature k’, compute the ith cluster center
using the equation below:
（2）For each fuzzy feature k, compute the ith cluster center
using the equations below:
Step 3: Update the fuzzy c-partition using the equations below.
where
Step 4: Compare the updated fuzzy c-partition to the previous one
in a convenient matrix norm.
IF the difference is less than ε, THEN STOP.
ELSE increment the iteration counter and GOTO Step 2.
Experimental Results and Comparisons
This paper uses a real data set. There are 10 brands of automobiles
from four companies in Taiwan: Ford, Toyota, China-Motor and
Yulon-Motor. The data set is shown in the table below.
Using this table, we note the following:
◎ For each brand, there are six feature components: company,
exhaust, price, color, comfort and safety.
◎ In the color feature, the notations are W = white, S = silver,
D = dark, R = red, B = blue, G = green, P = purple, Gr = grey and
Go = golden.
◎ Among the feature components, company, exhaust and color are
symbolic data, price is real data, and comfort and safety are
fuzzy data.
We use this data set to illustrate the dissimilarity calculation
between object one, Virage, and object five, M2000, as follows:
D(Virage, M2000) = D(China-Motor, Ford) + D(1.8, 2.0)
  + D(63.9, 64.6) + D({W, S, D, R, B}, {W, S, D, G, Go})
  + D([10, 0, 2, 2], [8, 0, 2, 2]) + D([9, 0, 3, 3], [9, 0, 3, 3]),
D(1.8, 2.0) = [((1.8 + 1.8)/2 − (2.0 + 2.0)/2)/(2.0 − 1.3)]^2
  + [(0 − 0)/(0.7 + 0 + 0 − 0)]^2
  + [(0 + 0 − 2 × 0)/(0.7 + 0 + 0 − 0)]^2
  = 0.0816,
D(63.9, 64.6) = {[2 × (63.9 − 64.6)]^2 + [2 × (63.9 − 64.6)]^2
  + [2 × (63.9 − 64.6)]^2 + [2 × (63.9 − 64.6)]^2}/4
  = 1.96,
D({W, S, D, R, B}, {W, S, D, G, Go})
  = [(5 − 5)/(5 + 5 − 3)]^2 + [(5 + 5 − 2 × 3)/(5 + 5 − 3)]^2
  = 0.3265,
D([10, 0, 2, 2], [8, 0, 2, 2])
  = {[2 × (10 − 8) − (0 − 0)]^2 + [2 × (10 − 8) + (0 − 0)]^2
  + [(2 × (10 − 8) − (0 − 0)) − (2 − 2)]^2
  + [(2 × (10 − 8) + (0 − 0)) + (2 − 2)]^2}/4
  = 16.
Similarly, D([9, 0, 3, 3], [9, 0, 3, 3]) = 0.
Thus,
D(Virage, M2000) = 1 + 0.0816 + 1.96 + 0.3265 + 16 + 0 = 19.3681.
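The arithmetic of this example can be reproduced with a short script. The component formulas in it are inferred from the numbers written out in the calculation above, not copied from the paper's equations, so the three function names and their exact forms should be read as assumptions: `quantitative` and `qualitative` add the squared components, and `fuzzy` takes two parameter tuples (a1, a2, a3, a4). The value U_k = 0.7 for the exhaust feature is the range 2.0 − 1.3 that appears in the example.

```python
def quantitative(a, b, u_k):
    """Interval features (lower, upper); squared position, span, content terms."""
    (al, au), (bl, bu) = a, b
    la, lb = au - al, bu - bl
    inters = max(0.0, min(au, bu) - max(al, bl))  # length of the overlap
    denom = u_k + la + lb - inters
    dp = abs((al + au) / 2 - (bl + bu) / 2) / u_k
    ds = abs(la - lb) / denom
    dc = (la + lb - 2 * inters) / denom
    return dp ** 2 + ds ** 2 + dc ** 2

def qualitative(a, b):
    """Set-valued features; squared span and content terms."""
    a, b = set(a), set(b)
    la, lb, inters = len(a), len(b), len(a & b)
    ls = la + lb - inters  # size of the union
    return (abs(la - lb) / ls) ** 2 + ((la + lb - 2 * inters) / ls) ** 2

def fuzzy(a, b):
    """Fuzzy features given as (center, inner diameter, left radius, right radius)."""
    g_minus = 2 * (a[0] - b[0]) - (a[1] - b[1])
    g_plus = 2 * (a[0] - b[0]) + (a[1] - b[1])
    return (g_minus ** 2 + g_plus ** 2
            + (g_minus - (a[2] - b[2])) ** 2
            + (g_plus + (a[3] - b[3])) ** 2) / 4

terms = [
    qualitative({"China-Motor"}, {"Ford"}),                          # company
    quantitative((1.8, 1.8), (2.0, 2.0), u_k=0.7),                   # exhaust
    fuzzy((63.9, 0, 0, 0), (64.6, 0, 0, 0)),                         # price
    qualitative({"W", "S", "D", "R", "B"}, {"W", "S", "D", "G", "Go"}),  # color
    fuzzy((10, 0, 2, 2), (8, 0, 2, 2)),                              # comfort
    fuzzy((9, 0, 3, 3), (9, 0, 3, 3)),                               # safety
]
total = sum(terms)
print([round(t, 4) for t in terms], round(total, 4))
```

The individual terms agree with the values in the text to four decimals; the total differs from 19.3681 only in the last digit because the text sums the rounded terms.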
In order to illustrate the structure of a cluster center for symbolic
data, we give the memberships of these 10 data points as shown in the
table below.
For the cluster center of the symbolic feature components, the sum
of the memberships is
0.4 + 0.3 + 0.35 + 0.5 + 0.25 + 0.8 + 0 + 0 + 0.5 + 0.2 = 3.3.
For the symbolic feature of company, we have the memberships of
the cluster center with
China-Motor: (0.4 + 0.3 + 0.35)/3.3 = 0.3182,
Ford: (0.5 + 0.25)/3.3 = 0.2273,
Toyota: (0.8 + 0 + 0)/3.3 = 0.2424,
Yulon-Motor: (0.5 + 0.2)/3.3 = 0.2121.
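This cluster-center computation amounts to a membership-weighted frequency of each symbolic value. The sketch below reproduces the company figures; the function name `symbolic_center` is hypothetical, the assignment of the 10 cars to companies is inferred from how the memberships are grouped in the sums above, and the memberships are used unweighted by any exponent, exactly as in this example.

```python
from collections import defaultdict

def symbolic_center(values, memberships):
    """Share of the total membership carried by each symbolic value."""
    total = sum(memberships)
    weight = defaultdict(float)
    for value, u in zip(values, memberships):
        weight[value] += u
    return {value: w / total for value, w in weight.items()}

# Company of each of the 10 cars, in the order implied by the sums in the text.
companies = (["China-Motor"] * 3 + ["Ford"] * 2
             + ["Toyota"] * 3 + ["Yulon-Motor"] * 2)
memberships = [0.4, 0.3, 0.35, 0.5, 0.25, 0.8, 0.0, 0.0, 0.5, 0.2]

center = symbolic_center(companies, memberships)
for company, share in center.items():
    print(company, round(share, 4))
```

The shares sum to 1, so the cluster center for a symbolic feature can be read as a distribution over the values that the feature takes.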
Similarly, we can find the memberships of the other symbolic feature
components for the cluster center, as shown in the table below.
Now we implement the proposed MVFCM algorithm for this auto data set
of mixed variables, where we choose m = 2, c = 2 and ε = 0.0001.
The resulting memberships of the 10 data points are shown in Table 7.
According to these results, we have two clusters with
C1 = {Virage, New Lancer, Galant, M2000, Corolla, Premio G2.0,
Cerfiro} and C2 = {Tierra Activa, Tercel, March}.
Intuitively, the results are very reasonable.
Conclusions
This paper defined a dissimilarity measure for mixed features of
symbolic and fuzzy data and then created a clustering algorithm based
on it. Most clustering algorithms can only treat a single type of
data feature. The proposed MVFCM clustering algorithm allows
different types of data features, such as numeric, symbolic and
fuzzy data.
Related Research
1. 「Fuzzy clustering for symbolic data」
Y. El-Sonbaty, M.A. Ismail
IEEE Trans. Fuzzy Systems 6 (2) (1998) 195–204
Abstract
Most of the techniques used in the literature in clustering
symbolic data are based on the hierarchical methodology, which
utilizes the concept of agglomerative or divisive methods as the core
of the algorithm. The main contribution of this paper is to show how
to apply the concept of fuzziness on a data set of symbolic objects
and how to use this concept in formulating the clustering problem of
symbolic objects as a partitioning problem. Finally, a fuzzy symbolic
c-means algorithm is introduced as an application of applying and
testing the proposed algorithm on real and synthetic data sets. The
results of the application of the new algorithm show that the new
technique is quite efficient and, in many respects, superior to
traditional methods of hierarchical nature.
2. 「Symbolic clustering using a new dissimilarity measure」
K.C. Gowda, E. Diday
Pattern Recognition 24 (6) (1991) 567–578
Abstract
A new dissimilarity measure, based on "position", "span" and
"content" of symbolic objects is proposed for symbolic clustering.
The dissimilarity measure is new in the sense that it is not just
another aspect of a similarity measure. In the proposed hierarchical
agglomerative clustering methodology, composite symbolic objects
are formed using a Cartesian join operator whenever a mutual pair of
symbolic objects is selected for agglomeration based on minimum
dissimilarity. The minimum dissimilarity values of different merging
levels are used to compute the cluster indicator values and hence to
determine the number of clusters in the data. The results of the
application of the algorithm on numeric data of known number of
classes are described first so as to show the efficacy of the method.
Subsequently, the results of the experiments on two data sets of
Assertion type of symbolic objects drawn from the domains of fat-oil
and microcomputers are presented.
3. 「A parametric model for fusing heterogeneous fuzzy data」
R.J. Hathaway, J.C. Bezdek, W. Pedrycz
IEEE Trans. Fuzzy Systems 4 (3) (1996) 270–281
Abstract
Presented is a model that integrates three data types (numbers,
intervals, and linguistic assessments). Data of these three types come
from a variety of sensors. One objective of sensor-fusion models is
to provide a common framework for data integration, processing,
and interpretation. That is what our model does. We use a small set
of artificial data to illustrate how problems as diverse as feature
analysis, clustering, cluster validity, and prototype classifier design
can all be formulated and attacked with standard methods once the
data are converted to the generalized coordinates of our model. The
effects of reparameterization on computational outputs are discussed.
Numerical examples illustrate that the proposed model affords a
natural way to approach problems which involve mixed data types.
4. 「On a class of fuzzy c-numbers clustering procedures for fuzzy
data」
M.S. Yang, C.H. Ko
Fuzzy Sets and Systems 84 (1996) 49–60
Abstract
This paper describes a class of fuzzy clustering procedures for
fuzzy data. Most fuzzy clustering techniques are designed for
handling crisp data with their class memberships using the idea of
fuzzy set theory. Here we derive new types of fuzzy clustering
procedures in dealing with fuzzy data. These procedures are called
fuzzy c-numbers (FCN) clusterings. Specially, we construct these
FCNs for U-type, triangular, trapezoidal and normal fuzzy numbers.
Reference
[1] J. B. MacQueen (1967): "Some Methods for classification and Analysis of Multivariate Observations", Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297
[2] Andrew Moore: “K-means and Hierarchical Clustering - Tutorial Slides” http://www-2.cs.cmu.edu/~awm/tutorials/kmeans.html
[3] Brian T. Luke: “K-Means Clustering” http://fconyx.ncifcrf.gov/~lukeb/kmeans.html
[4] Tariq Rashid: “Clustering” http://www.cs.bris.ac.uk/home/tr1690/documentation/fuzzy_clustering_initial_report/node11.html
[5] Hans-Joachim Mucha and Hizir Sofyan: “Nonhierarchical Clustering” http://www.quantlet.com/mdstat/scripts/xag/html/xaghtmlframe149.html
[6] Osmar R. Zaïane: “Principles of Knowledge Discovery in Databases - Chapter 8: Data Clustering” http://www.cs.ualberta.ca/~zaiane/courses/cmput690/slides/Chapter8/index.html
[7] Pier Luca Lanzi: “Ingegneria della Conoscenza e Sistemi Esperti – Lezione 2: Apprendimento non supervisionato” http://www.elet.polimi.it/upload/lanzi/corsi/icse/2002/Lezione%202%20-%20Apprendimento%20non%20supervisionato.pdf
[8] J. C. Dunn (1974): "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics 3: 32-57
[9] J. C. Bezdek (1981): "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York
[10] Zadeh, L.A. (1965): "Fuzzy sets". Information and Control 8, 338-353.
[11] Ruspini, E. (1969): "A New Approach to Clustering". Inf. Control 15, 22-32.
[12] Bezdek, J.C. (1974): "Numerical taxonomy with Fuzzy sets". J. Math. Biol. 1 (1), 57-71.
[13] R. Bellman, R. Kalaba, L.A. Zadeh, Abstraction and pattern classification, J. Math. Anal. Appl. 2 (1966) 581–586.
[14] M.S. Yang, A survey of fuzzy clustering, Math. Comput. Modelling 18 (11) (1993) 1–16.
[15] K. C. Gowda and G. G. Krishna, “Disaggregative clustering using the concept of mutual nearest neighborhood,” IEEE Trans. Syst., Man, Cybern., vol. 8, pp. 883–895, Dec. 1978.
[16] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice Hall, 1988.
[17] E. Diday, The Symbolic Approach in Clustering, Classification and Related Methods of Data Analysis, H. H. Bock, Ed. Amsterdam, The Netherlands: Elsevier, 1988.
[18] K. C. Gowda and E. Diday, “Symbolic clustering using a new dissimilarity measure,” Pattern Recogn., vol. 24, no. 6, pp. 567–578, 1991.
[19] “Symbolic clustering using a new similarity measure,” IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 368–378, Feb. 1992.
[20] D. H. Fisher, “Knowledge acquisition via incremental conceptual clustering,” Mach. Learning, no. 2, pp. 103–138, 1987.
[21] Y. Cheng and K. S. Fu, “Conceptual clustering in knowledge organization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, pp. 592–598, 1985.
[22] R. Michalski and R. E. Stepp, “Automated construction of classifications: Conceptual clustering versus numerical taxonomy,” PAMI, no. 5, pp. 396–410, 1983.
[23] H. Ralambondrainy, “A conceptual version of the K-means algorithm,” Pattern Recogn. Lett., no. 16, pp. 1147–1157, 1995.
[24] D. H. Fisher and P. Langley, “Approaches to conceptual clustering,” in Proc. 9th Int. Joint Conf. Artificial Intell., Los Angeles, CA, 1985, pp. 691–697.
[25] R.J. Hathaway, J.C. Bezdek, W. Pedrycz, A parametric model for fusing heterogeneous fuzzy data, IEEE Trans. Fuzzy Systems 4 (3) (1996) 270–281.
[26] M.S. Yang, C.H. Ko, On a class of fuzzy c-numbers clustering procedures for fuzzy data, Fuzzy Sets and Systems 84 (1996) 49–60.
[27] Y. El-Sonbaty, M.A. Ismail, Fuzzy clustering for symbolic data, IEEE Trans. Fuzzy Systems 6 (2) (1998) 195–204.
[28] K.C. Gowda, T.V. Ravi, Divisive clustering of symbolic objects using the concepts of both similarity and dissimilarity, Pattern Recognition 28 (1995) 1277–1282.
[29] K.L. Wu, M.S. Yang, Alternative c-means clustering algorithms, Pattern Recognition 35 (2002) 2267–2278.
[30] H.J. Zimmermann, Fuzzy Set Theory and Its Applications, Kluwer, Dordrecht, 1991.