Fuzzy Clustering Techniques: Fuzzy C-Means and Fuzzy Min-Max Clustering Neural Networks


Benjamin James Bush
SSIE 617 Term Paper, Fall 2012



|1| INTRODUCTION

Data clustering is a data processing strategy which aims to organize a collection of data points (hereafter simply called points) into groups. Traditionally, the data set is partitioned so that each point belongs to one and only one cluster. However, unless the data is very highly clustered, it is often the case that some points do not completely belong to any one cluster. With the arrival of fuzzy clustering, such points can instead be assigned a set of membership degrees, one for each cluster, rather than being artificially pigeonholed as belonging to only one. The volume of literature available on fuzzy clustering is immense; a general review of the literature is outside the scope of this term paper. This paper discusses only two approaches to fuzzy clustering: the ubiquitous Fuzzy C-Means clustering algorithm and the less well known but interesting Fuzzy Min-Max Clustering Neural Network. These approaches are discussed in sections 2 and 3, respectively. In section 4, I briefly discuss several applications which use the fuzzy clustering techniques covered here.


|2| THE FUZZY C-MEANS (FCM) CLUSTERING ALGORITHM

Fuzzy C-Means, also known as Fuzzy K-Means and Fuzzy ISODATA, is one of the oldest and most ubiquitous fuzzy clustering algorithms. FCM is a generalization of the K-Means clustering algorithm, which is a simple and widely used method for finding crisp clusters. Understanding FCM's crisp ancestor is instructive and is discussed below.


|2.1| K-MEANS CLUSTERING

The “K” in K-Means refers to the fact that in K-Means clustering, the number of clusters is decided before the process begins. The “Means” in K-Means refers to the fact that each cluster is characterized by the mean of all the points that belong to the cluster. Thus, in K-Means clustering our goal is literally to find K means, thereby giving us the K clusters we seek. In particular, the means we seek are those which minimize the cost function depicted in the following figure:



Figure 1: Cost function minimized during K-Means clustering. Equation taken from [1]. Annotations by me.
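
In standard notation (following [1]), the cost being minimized is the total squared distance from each point to the centroid of the cluster it is assigned to:

$$ J = \sum_{j=1}^{K} \; \sum_{x_i \in G_j} \left\lVert x_i - c_j \right\rVert^{2} $$

where $G_j$ is the set of points assigned to cluster $j$ and $c_j$ is that cluster's centroid.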

The process is initialized by picking K different “centroids” at random from the space in which the points are embedded. From here, the K-Means process can be divided into two phases:


Phase 1: Form Clusters. Each centroid is associated with a different cluster. To form these clusters, each point in the data set is evaluated in turn. When evaluated, a point is assigned to the cluster corresponding to the closest centroid.

Phase 2: Move Centroids. Each centroid is now moved to the position obtained by taking the mean of the points in the cluster associated with that centroid.


These two phases are repeated in turn until convergence is reached (i.e., until the value of the cost function stops decreasing significantly). It should be noted that there is no guarantee that the cost function will be minimized; the outcome depends on initial conditions.
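
As a rough illustration of this two-phase loop, a minimal Python sketch is given below (not taken from [1]; the random initialization, tolerance, and iteration cap are illustrative choices):

import numpy as np

def k_means(points, k, tol=1e-6, max_iter=100, rng=None):
    """Plain K-Means: alternate cluster assignment and centroid update."""
    rng = np.random.default_rng(rng)
    # Initialization: pick K random positions within the bounding box of the data.
    lo, hi = points.min(axis=0), points.max(axis=0)
    centroids = rng.uniform(lo, hi, size=(k, points.shape[1]))
    prev_cost = np.inf
    for _ in range(max_iter):
        # Phase 1: assign each point to the cluster of the closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Phase 2: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
        # Stop when the cost no longer decreases significantly.
        cost = (dists[np.arange(len(points)), labels] ** 2).sum()
        if prev_cost - cost < tol:
            break
        prev_cost = cost
    return centroids, labels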

A flow chart is provided below to aid the reader's understanding of the algorithm.



Figure 2: A flow chart summarizing the K-Means Clustering Algorithm.

Understanding of the K-means clustering algorithm can be further enhanced by viewing a series of animated GIF images produced by Andrey A. Shabalin, available at http://shabal.in/visuals.html. Key frames from the animation are provided below for the reader's convenience. Visually inspecting these key frames in conjunction with the above flow chart can be very instructive.





Figure 3: Key frames from an animation on k-means clustering by Andrey A. Shabalin, PhD.

|2.2| FUZZY C-MEANS CLUSTERING (FCM)

FCM is a generalization of K-Means. While K-Means assigns each point to one and only one cluster, FCM allows clusters to be fuzzy sets, so that each point belongs to all clusters to varying degrees, with the following restriction: the sum of all membership degrees for any given data point is equal to 1.


The cost function used in FCM (shown in Figure 4) is very similar to the one used by K-Means, but there are some key differences: the inner sum contains a term for each data point in the set, and each of these terms is weighted by a membership degree raised to the power of a fuzziness exponent.



Figure 4: Cost function for FCM. Figure adapted from [1]. Annotations by me. Compare with Figure 1.
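
In standard notation, with membership degrees $u_{ij}$, centroids $c_j$, and fuzziness exponent $m > 1$, this cost function reads

$$ J_m = \sum_{j=1}^{C} \sum_{i=1}^{N} u_{ij}^{\,m} \left\lVert x_i - c_j \right\rVert^{2}, \qquad \text{subject to} \quad \sum_{j=1}^{C} u_{ij} = 1 \ \text{ for each point } x_i. $$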

Applying the method of Lagrange multipliers to minimize the above cost function yields the following necessary (but not sufficient) constraints [1]:
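
Written in the same notation as above, these constraints are the familiar FCM update equations:

$$ c_j = \frac{\displaystyle\sum_{i=1}^{N} u_{ij}^{\,m}\, x_i}{\displaystyle\sum_{i=1}^{N} u_{ij}^{\,m}} \qquad\text{and}\qquad u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}. $$

The first constraint fixes each centroid as the membership-weighted mean of all the points; the second fixes each membership degree in terms of the relative distances from the point to all of the centroids.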




Like K-Means, FCM is initialized by choosing a fixed number of centroids at random. Also like K-Means, FCM after initialization is divided into two phases:

Phase 1: Form Clusters. Each centroid is associated with a different fuzzy cluster. To form these clusters, each point in the data set is evaluated in turn. When evaluated, a point is assigned a membership degree with respect to each cluster. The numerical value of these degrees is given by the second of the above constraints.

Phase 2: Move Centroids. Each centroid is now moved to the position obtained via the first of the above constraints.
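
As with K-Means, the whole procedure fits in a short loop. A minimal Python sketch follows (illustrative only; the random initialization, tolerance, and iteration cap are my own choices, not prescribed by [1]):

import numpy as np

def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=100, rng=None):
    """Fuzzy C-Means: alternate the membership and centroid updates."""
    rng = np.random.default_rng(rng)
    lo, hi = points.min(axis=0), points.max(axis=0)
    centroids = rng.uniform(lo, hi, size=(c, points.shape[1]))
    prev_cost = np.inf
    for _ in range(max_iter):
        # Distances from every point to every centroid (N x C).
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # Phase 1: membership degrees from the second constraint
        # (each row sums to 1 by construction).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=2)
        # Phase 2: centroids from the first constraint
        # (membership-weighted means of all the points).
        w = u ** m
        centroids = (w.T @ points) / w.sum(axis=0)[:, None]
        # Stop when the cost no longer decreases significantly.
        cost = (w * d ** 2).sum()
        if prev_cost - cost < tol:
            break
        prev_cost = cost
    return centroids, u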


The reader should verify that the flow chart for FCM provided below closely resembles the flow chart for K-Means above. Note also the incorporation of the aforementioned constraints.



Figure 5: A flow chart summarizing the FCM Clustering Algorithm. Compare to Figure 2.

It is instructive to visualize the fuzzy clusters produced by FCM. For this purpose, it is convenient to use a one-dimensional data set, as in Figure 6 below.


Figure 6: Three fuzzy clusters produced by FCM on a 1-dimensional data set. Figure taken from [2].


MATLAB's fcmdemo command provides a great way to interact with FCM using 2-dimensional data.¹ One can run FCM on several preloaded data sets or provide a custom data file. The number of clusters can be varied, as can the fuzziness exponent and the stopping criteria. Once FCM has finished running, one can directly view and manipulate each of the fuzzy clusters. Screenshots follow on the next page.






¹ MATLAB's fcmdemo depends on the Fuzzy Logic Toolbox, which is available for purchase from MathWorks at the following URL: http://www.mathworks.com/products/fuzzy-logic/index.html The laptops in the Enginet classrooms at Binghamton University already have the Fuzzy Logic Toolbox installed. To start the demo, simply enter the command fcmdemo into the MATLAB command window.


Figure 7: The main window of fcmdemo after running it on data set 2 with C = 3 and m = 2.




Figure 8: Membership function plots after running fcmdemo with fuzziness exponent m = 1.5.




Figure 9: Membership function plots after running fcmdemo with fuzziness exponent m = 4. Compare with Figure 8.

|3| FUZZY MIN-MAX CLUSTERING NEURAL NETWORKS (FMMCNN)

FCM requires that the number of clusters be specified in advance. However, the number of clusters that should be used is not always clear, as the figure below illustrates.



Figure 10: A data set (top) can be clustered into 4 (bottom left) or 2 (bottom right) clusters.

There are many fuzzy clustering techniques which will automatically determine the number of clusters that should be used. Among them is the Fuzzy Min-Max Clustering Neural Network (FMMCNN), which we discuss in this section.

|3.1| HYPERBOX FUZZY SETS

The fuzzy clusters used in an FMMCNN are called hyperbox fuzzy sets. A hyperbox fuzzy set has a hyperbox core, so that every point that lies within the hyperbox is given a membership degree of 1. The membership function of the hyperbox fuzzy set then decays linearly as one moves further away from the hyperbox core. A system-wide parameter γ controls the rate of this decay.

A hyperbox is completely defined by its min point and its max point. The min point is a vector whose components provide a series of lower bounds, one for each dimension, which must be surpassed to remain within the hyperbox. For example, suppose we have a 2-dimensional hyperbox with min point <5, 20>. Then for a data point <x, y> to lie within the hyperbox, it is necessary that x ≥ 5 and y ≥ 20. Analogously, the max point provides a series of upper bounds, one for each dimension, which must be respected to remain within the hyperbox.
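
Before the formal definition, the geometric idea can be sketched in a few lines of Python. The decay rule used here (membership 1 inside the box, falling off linearly with the average per-dimension overshoot at rate gamma) is a simplified stand-in consistent with the description above, not a transcription of the exact formula from [3] shown in Figure 11; the max point in the example is likewise illustrative.

import numpy as np

def hyperbox_membership(x, v, w, gamma=1.0):
    """Membership of point x in the hyperbox with min point v and max point w.

    Returns 1.0 when v <= x <= w componentwise, and decays linearly
    (at rate gamma) as x moves away from the box; clipped at 0.
    """
    x, v, w = map(np.asarray, (x, v, w))
    below = np.maximum(v - x, 0.0)   # how far x falls short of each lower bound
    above = np.maximum(x - w, 0.0)   # how far x exceeds each upper bound
    # Average per-dimension violation; zero exactly when x is inside the box.
    violation = (below + above).mean()
    return float(max(0.0, 1.0 - gamma * violation))

# Example: the 2-D hyperbox with min point <5, 20> and an illustrative max point <7, 25>.
print(hyperbox_membership([6, 22], v=[5, 20], w=[7, 25]))   # 1.0 (inside the box)
print(hyperbox_membership([4, 22], v=[5, 20], w=[7, 25]))   # 0.5 (outside the box)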

A formal definition of the membership function associated with a hyperbox fuzzy set is shown on the next page. To gain a more practical and intuitive understanding of hyperbox fuzzy sets, contour plots can be generated and manipulated using a Mathematica notebook created by me. With this notebook, one can control the position of the min and max points, as well as the gamma membership decay parameter. The notebook can also plot one-dimensional hyperbox fuzzy sets, thereby revealing that hyperbox fuzzy sets can be thought of as generalized symmetric trapezoidal fuzzy numbers. The notebook is available from my website at the following URL:

http://www.benjaminjamesbush.com/fuzzyclustering

Screenshots are given on the following page for the reader's convenience.


Figure 11: Membership function of a Hyperbox Fuzzy Set. Adapted from [3].²







Figure 12: Manipulating hyperbox fuzzy sets in Mathematica. One-dimensional (top) and two-dimensional (bottom).




² [3] contains some typographical errors. They have been corrected in Figure 11.

|3.2| FUZZY MIN-MAX NEURAL NETWORKS

A major advantage of using hyperbox fuzzy sets for fuzzy clustering is the fact that they can easily be implemented as 2-layer artificial neural networks. The following figure illustrates how this is done.


Figure 13: A hyperbox fuzzy set implemented as a 2-layer artificial neural network.


The input layer contains one node per dimension of the space in which the data points are embedded. Each input node is connected to the output node via a pair of weighted links whose weights are the corresponding components of the max point and min point, respectively. Implementing a clustering system in this way allows for the development of massively parallel systems that can quickly calculate the membership values for incoming data.
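
As a vectorized sketch of this idea (again using the simplified membership rule from the sketch in section 3.1 rather than the exact formula of [3]), the min points and max points of a set of hyperboxes can be stored as two weight matrices, so that memberships for a whole batch of inputs are computed in one parallel pass:

import numpy as np

def hyperbox_layer(X, V, W, gamma=1.0):
    """Second-layer output of a fuzzy min-max network (simplified membership rule).

    X: (N, d) batch of input points, one input node per dimension.
    V: (B, d) min points and W: (B, d) max points, one row per hyperbox node;
       these play the role of the two sets of connection weights.
    Returns an (N, B) array of membership degrees.
    """
    below = np.maximum(V[None, :, :] - X[:, None, :], 0.0)   # lower-bound violations
    above = np.maximum(X[:, None, :] - W[None, :, :], 0.0)   # upper-bound violations
    violation = (below + above).mean(axis=2)                  # (N, B)
    return np.clip(1.0 - gamma * violation, 0.0, 1.0)

# Two hyperboxes in 2-D, evaluated on three points at once.
V = np.array([[0.0, 0.0], [5.0, 20.0]])
W = np.array([[1.0, 1.0], [7.0, 25.0]])
X = np.array([[0.5, 0.5], [6.0, 22.0], [4.0, 22.0]])
print(hyperbox_layer(X, V, W))

Each row of V and W corresponds to the min-point and max-point weights on the links feeding one hyperbox node, which is what makes the parallel, batched evaluation possible.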


|3.3| EVOLVING FUZZY CLUSTERS

Another advantage of hyperbox fuzzy sets is the relative simplicity with which they can be expressed. As previously mentioned, a hyperbox fuzzy set can be completely represented by a min point and a max point. This makes it very easy to design evolutionary algorithms which can be used to evolve sets of hyperbox fuzzy sets for use within fuzzy min-max clustering neural networks. One such algorithm was published by Fogel and Simpson in [3] and is outlined in the flow chart on the next page. For their fitness function, Fogel and Simpson use the minimum description length (MDL), which represents a compromise between fitting the data and using the smallest possible number of clusters. For more information on the MDL, see [4].



Figure 14: Flow chart summarizing the evolutionary algorithm used in [3].
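
To make the overall shape of such an approach concrete, the following schematic Python sketch evolves candidate sets of hyperboxes by mutation and selection. It is not Fogel and Simpson's algorithm: the mutation operator and the fitness function (a crude "coverage error plus a penalty per hyperbox" stand-in for the MDL) are placeholder choices of my own, and it reuses the hyperbox_layer function from the sketch in section 3.2.

import numpy as np

def fitness(boxes, points, gamma=1.0, complexity_weight=1.0):
    # Placeholder fitness: coverage error plus a penalty per hyperbox.
    # A crude stand-in for the MDL criterion used in [3].
    V, W = boxes
    memberships = hyperbox_layer(points, V, W, gamma)   # from the section 3.2 sketch
    error = (1.0 - memberships.max(axis=1)).sum()       # poorly covered points cost more
    return error + complexity_weight * len(V)

def mutate(boxes, rng, scale=0.1):
    # Placeholder mutation: jitter the min and max points with Gaussian noise.
    V, W = boxes
    V = V + rng.normal(0.0, scale, V.shape)
    W = W + rng.normal(0.0, scale, W.shape)
    return np.minimum(V, W), np.maximum(V, W)            # keep min <= max componentwise

def evolve(points, n_boxes=4, pop_size=20, generations=100, rng=None):
    rng = np.random.default_rng(rng)
    lo, hi = points.min(axis=0), points.max(axis=0)
    # Initial population: random hyperboxes inside the data's bounding box.
    population = []
    for _ in range(pop_size):
        a = rng.uniform(lo, hi, (n_boxes, points.shape[1]))
        b = rng.uniform(lo, hi, (n_boxes, points.shape[1]))
        population.append((np.minimum(a, b), np.maximum(a, b)))
    for _ in range(generations):
        # Offspring by mutation; keep the fittest half of parents plus offspring.
        offspring = [mutate(boxes, rng) for boxes in population]
        combined = population + offspring
        combined.sort(key=lambda boxes: fitness(boxes, points))
        population = combined[:pop_size]
    return population[0]   # best (min points, max points) pair of arrays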


|4| APPLICATIONS

Fuzzy clustering is becoming an important data processing technique in many scientific fields. While the use of FCM is widespread, applications of fuzzy min-max clustering neural networks are harder to come by. Below I list a few interesting applications which I encountered in the literature.


GENETICS

Gasch and Eisen used FCM to find clusters of yeast genes [5].


POLITICS

Teran and Meier designed a fuzzy system that used FCM to simplify the complex political landscape and recommend candidates to voters based on fuzzy data obtained from surveys [6].


RADIOLOGY

John, Innocent, and Barnes used a fuzzy min-max clustering neural network to group x-ray images of the tibia into clusters [7].


INDUSTRIAL ENGINEERING

Dobado et al. used a fuzzy min-max clustering neural network to group parts into part families, an important step in the formation of cells for cellular manufacturing [8].



APPENDIX: MATHEMATICA CODE


The following Mathematica code can be used to create interactive plots of hyperbox fuzzy sets in one
and two dimensions. The code has been tested on Mathematica 8.



WORKS CITED


[1] J. S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, 1997.

[2] Matteo Matteucci. (2012, May) A Tutorial on Clustering Algorithms: Fuzzy C-Means Clustering. [Online]. http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html

[3] D. B. Fogel and P. K. Simpson, "Evolving Fuzzy Clusters," in IEEE International Conference on Neural Networks, 1993.

[4] Peter Grünwald. (2008, August) Videolectures.net: MDL Tutorial. [Online]. http://videolectures.net/icml08_grunwald_mld/

[5] A. P. Gasch and M. B. Eisen, "Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering," Genome Biology, vol. 3, no. 11, October 2002.

[6] L. Teran and A. Meier, "A Fuzzy Recommender System for eElections," in Electronic Government and the Information Systems Perspective, 2010, pp. 62-76.

[7] R. I. John, P. R. Innocent, and M. R. Barnes, "Neuro-fuzzy clustering of radiographic tibia image data using type 2 fuzzy sets," Information Sciences, vol. 125, no. 1-4, pp. 65-82, June 2000.

[8] D. Dobado, S. Lozano, J. M. Bueno, and J. Larrañeta, "Cell formation using a Fuzzy Min-Max neural network," International Journal of Production Research, vol. 40, no. 1, pp. 93-107, November 2010.