# Clustering Algorithms: Basics and Visualization

HELSINKI UNIVERSITY OF TECHNOLOGY 1.8.2002
Laboratory of Computer and Information Science
T-61.195 Special Assignment 1


Jukka Kainulainen
47942F


Jukka Kainulainen
HUT
jkainula@cc.hut.fi
Abstract

This paper discusses clustering algorithms, which are important in many fields of
science. It provides the basic concepts and presents an implementation of a few
popular algorithms. The problem of visualizing the result is also discussed and a
simple solution for a dimension-limited case is provided.
1 INTRODUCTION
Clustering is one solution to the case of unsupervised learning, where class labeling
information of the data is not available. Clustering is a method where data is divided
into groups (clusters) which seem to make sense. Clustering algorithms are usually fast
and quite simple. They need no beforehand knowledge of the data and form a
solution by comparing the given samples to each other and to the clustering criterion.
The simplicity of the algorithms is also a drawback: the results may vary greatly when
different clustering criteria are used, so unfortunately nonsense solutions are also
possible. With some algorithms the order in which the samples are introduced can also
make a great difference to the result.
Despite the drawbacks, clustering is used in many fields of science including machine
vision, life and medical sciences and information science. One reason for this is the fact
that intelligent beings, humans included, are known to use the idea of clustering in many
brain functions.
This paper introduces the basics of clustering and the concepts needed to understand and
implement the algorithms. A couple of algorithms are implemented and compared. The
problem of visualizing the result is also discussed, and one solution is provided in the
popular OpenGL framework.
The second chapter introduces the basic concepts the reader should be aware of to be
able to read this paper efficiently. The third chapter introduces the most popular
algorithms and discusses their characteristics. The fourth chapter complements the
survey of clustering algorithms by introducing some special algorithms. Chapter five
then provides an implementation of four algorithms discussed in chapter three, and
chapter six provides a way to visualize the result. Finally, chapter seven runs some tests
on the implemented algorithms.

2 BASIC CONCEPTS
When classifying different kinds of samples, a way to represent the sample
mathematically is needed. From now on we assume that the features are represented
in a feature vector, a vector containing the different features of the sample.
That is, with l features x_i the feature vector is of the form

x = [x_1, x_2, ..., x_l]^T

where T denotes transposition and the x_i are typically real numbers. The selection of these
features is often very hard, because there usually are a lot of features from which
the most representative ones should be selected, and the computational
complexity of the classification (clustering) algorithm grows with every feature selected.
Feature selection and the reduction of dimensionality of the data are beyond this
document. For additional information about these tasks a good place to start is
Theodoridis (1999).
2.1 Linear Independence, Vector Space
A set of n vectors is said to be linearly independent if the equation

k_1 x_1 + k_2 x_2 + ... + k_n x_n = 0    (2.1)

implies k_i = 0 for all i. If a nonzero solution can be found then the vectors are said to be
linearly dependent. If the vectors are linearly dependent then at least one of them can be
expressed as a linear combination of the others.
Now, given m vectors x_n with l components (as before) we can form the set V of all
linear combinations of these vectors. V is called the span of these vectors. The maximum
number of linearly independent vectors in V is called the dimension of V. Clearly, if the
given m vectors were linearly independent then the dimension is m and the vectors x_n
are called a basis for V.
An n-dimensional vector space R^n is the space of all vectors with n (real) numbers as
components. The symbol R itself refers to a single dimension of real numbers. Thus for
n = 3 we get R^3, for which a basis is, for example, the vectors

x_1 = [1, 0, 0]; x_2 = [0, 1, 0]; x_3 = [0, 0, 1]

so every vector in R^3 can be expressed with these three basis vectors. A more
comprehensive examination of vector spaces can be found in almost any engineering
mathematics book, for example, Kreyszig (1993).
2.2 Data Normalization
The used data is often scaled to be within a certain range; neural networks, for example,
often require this. A classical way to normalize the N available data vectors is with the
mean value

x̄_k = (1/N) ∑_{i=1}^{N} x_ik    (2.2)

where k denotes the feature, and the variance

σ_k² = (1/(N−1)) ∑_{i=1}^{N} (x_ik − x̄_k)²    (2.3)

Now, to gain zero mean and unit variance,

x̂_ik = (x_ik − x̄_k) / σ_k    (2.4)
2.3 Definition of a Cluster
Now, let us define some basic concepts of clusters in a mathematical way. Let X be a set
of data, that is

X = {x_1, x_2, ..., x_n}.

A so-called m-clustering of X is its partition into m parts (clusters) C_1, ..., C_m, so that

1. None of the clusters is empty: C_i ≠ Ø
2. Every sample belongs to a cluster
3. Every sample belongs to a single cluster (crisp clustering): C_i ∩ C_j = Ø, i ≠ j

Naturally, it is assumed that the vectors in cluster C_i are in some way more similar to each
other than to the vectors in other clusters. Figure 1 illustrates a couple of different kinds
of clusters: compact, linear and circular.

Figure 1: A couple of different kinds of clusters
2.4 Number of Possible Clusterings
Naturally, the best way to apply clustering would be to identify all possible clusterings
and select the most suitable one. Unfortunately, due to limited time and the large number of
feature vectors this isn't usually possible. If we let S(N, m) denote the number of
possible clusterings of N vectors into m groups it is true that

1. S(N, 1) = S(N, N) = 1
2. S(N, m) = 0, if m > N

where the second statement comes from the definition that no cluster may be empty.
Actually, it can be shown that the solution for S is given by the Stirling numbers of the second
kind:

S(N, m) = (1/m!) ∑_{i=0}^{m} (−1)^{m−i} C(m, i) i^N    (2.5)

where C(m, i) denotes the binomial coefficient. It is quite clear that the solution of
equation 2.5 grows rapidly with N, and if the number of desired clusters m is not initially
available many different values must be tested, so a brute-force solution becomes
impossible. A more efficient solution must be found.
2.5 Proximity Measure
When clustering is applied, a way to measure the similarities and dissimilarities between
the samples is needed. A formal way to define a dissimilarity measure (DM) d
(informally, a distance) on X is the following:

d : X × X → R

there exists d_0 in R : −∞ < d_0 ≤ d(x, y) < +∞, for all x, y in X    (2.6)
d(x, x) = d_0, for all x in X    (2.7)
d(x, y) = d(y, x), for all x, y in X.    (2.8)

If in addition the following is valid:

d(x, y) = d_0, if and only if x = y    (2.9)
d(x, z) ≤ d(x, y) + d(y, z), for all x, y, z in X (triangle inequality)    (2.10)

then d is called a metric. A similarity measure (SM) is defined correspondingly; see
Theodoridis (1999) for details. For example, the Euclidean distance d_2 is a metric
dissimilarity measure with d_0 = 0: the minimum possible distance between any two vectors
is 0, and it equals 0 only when the two vectors are the same. The triangle
inequality is also known to hold for the Euclidean distance. Another well-known metric DM
is the Manhattan norm.
The inner product x^T y between two vectors, on the other hand, is a similarity measure.
In particular, if the length of the vectors x, y is one then the lower and upper limits for the
inner product are −1 and +1.

When we extend the above formulas (2.6–2.10) to hold for subsets (U) of X we get the
proximity measure ς on U as a function

ς : U × U → R.

A typical case where proximity between subsets is needed is when a single vector x is
measured against a cluster C. Typically a distance to y, the representative of C, is
chosen. The representative can be chosen so that the value is, for example, maximized
or minimized. If a single vector representative is chosen from C, the method used is
called a global clustering criterion, and if all the vectors in C have an effect on the
representative, a local clustering criterion is being used.
The representative of C can also be a curve or a hyperplane in the dimension of x. For
example, in figure 2 different kinds of representatives are chosen for the clusters of figure 1.
The first cluster is represented by a point, the second by a line and the third by
two hyperspheres (the inner and the outer).

Figure 2: Different representatives for different clusters
3 POPULAR CLUSTERING ALGORITHMS
As stated in chapter 2.4, calculating all possible combinations of the feature vectors is
not generally possible. Clustering algorithms provide means to make a sensible
division into small clusters using only a fraction of the work needed to calculate all
possible combinations. These algorithms usually fall into one of the categories of the
subchapters below; see Theodoridis (1999).
3.1 Sequential Algorithms
Sequential algorithms are straightforward and fast methods to produce a single
clustering. Usually the feature vectors are presented to the algorithm once or a few
times. The final result is typically dependent on the order of presentation, and the
resulting clusters are often compact and hyperellipsoidally shaped.
3.1.1 Basic Sequential Algorithmic Scheme
A very basic clustering algorithm that is easy to understand is the basic sequential
algorithmic scheme (BSAS). In the basic form vectors are presented only once and the
number of clusters is not known a priori. What is needed is the dissimilarity measure
d(x, C), a threshold of dissimilarity Θ and the maximum number of clusters allowed, q.
The idea is to assign every newly presented vector to an existing cluster or to create a new
cluster for this sample, depending on the distance to the already defined clusters. In
pseudocode the algorithm works as follows:
1. m = 1; C_m = {x_1} // Init first cluster = first sample
2. for every sample x from 2 to N
   a. find the cluster C_k such that d(x, C_k) is minimal
   b. if d(x, C_k) > Θ AND (m < q)
      i. m = m + 1; C_m = {x} // Create a new cluster
   c. else
      i. C_k = C_k + {x} // Add sample to the nearest cluster
      ii. Update the representative if needed
3. end algorithm
As can be seen the algorithm is simple but still quite efficient. Different choices for the
distance function lead to different results, and unfortunately the order in which the
samples are presented can also have a great effect on the final result. What is also very
important is a correct value for Θ. This value has a direct effect on the number of
formed clusters: if Θ is too small, unnecessary clusters are created, and if too large a
value is chosen, fewer clusters than required are formed.
One detail is that if q is not defined the algorithm decides the number of clusters on its
own. This might be wanted under some circumstances, but when dealing with limited
resources a limited q is usually chosen. Also, it should be noted that BSAS can be used
with a similarity function simply by replacing the min function with max.
There exists a modification to BSAS called modified BSAS (MBSAS), which runs
through the samples twice. It overcomes the drawback that the final cluster of a single
sample is decided before all the clusters have been created. The first phase of the
algorithm creates the clusters (just like step 2b in BSAS) and assigns only a single sample to
each cluster. The second phase then runs through the remaining samples and classifies
them into the created clusters (step 2c in BSAS).
3.1.2 A Two-Threshold Sequential Scheme
The major drawback of BSAS and MBSAS is the dependence on the order of the
samples as well as on the correct value of Θ. These drawbacks can be diminished by
using two threshold values Θ_1 and Θ_2. Distances less than the first value Θ_1 denote that
the two samples most likely belong together, and distances greater than Θ_2 denote that
the samples do not belong to the same cluster. Values between these two are in a so-called
gray area and are to be reevaluated at a later stage of the algorithm.

Letting clas(x) be a boolean stating whether a sample has been classified or not, and
assuming no bound on the number of clusters, the two-threshold sequential scheme
(TTSAS) can be described in pseudocode:

1. m = 0
2. for all x: clas(x) = false
3. prev_change = 0; cur_change = 0; exists_change = 0
4. while there exists some sample not classified
   a. for every x from 1 to N
      i. if clas(x) = false AND x is the first such sample in this pass AND exists_change = 0
         1. m = m + 1 // Create a new cluster
         2. C_m = {x}; clas(x) = true
         3. cur_change = cur_change + 1
      ii. else if clas(x) = false
         1. find the cluster C_k minimizing d(x, C_k)
         2. if d(x, C_k) < Θ_1
            a. C_k = C_k + {x}; clas(x) = true // Add to a cluster
            b. cur_change = cur_change + 1
         3. else if d(x, C_k) > Θ_2
            a. m = m + 1 // Create a new cluster
            b. C_m = {x}; clas(x) = true
            c. cur_change = cur_change + 1
      iii. else // clas(x) = true
         1. cur_change = cur_change + 1
   b. exists_change = |cur_change − prev_change|
   c. prev_change = cur_change; cur_change = 0

The variable exists_change checks whether at least one vector was classified during the
current pass of the while loop. If no sample has been classified, the first unclassified
sample is used to form a new cluster (4.a.i), and this guarantees that at most N passes of
the while loop are performed. In practice the number of passes should naturally be much
less than N, but in theory this is an O(N²) algorithm; see Weiss (1997) for additional
information about performance classifications.
3.2 Hierarchical Algorithms
Hierarchical algorithms are further divided into agglomerative and divisive schemes.
These algorithms rely on ideas from matrix and graph theory to produce either a decreasing
or an increasing number of clusters at each time step, thus producing a hierarchy of
clusterings.
A problem of its own is the choice of a proper clustering from this hierarchy.
One solution is to track the lifetime of all clusters and search for clusters that have a
long lifetime. This involves subjectivity and might not be suitable in many cases.
Another approach is to measure the "self-similarity" of clusters by calculating distances
between vectors in the same cluster and comparing them to some threshold value. As
can be deduced, the problem overall is difficult and no generally correct solution exists.
3.2.1 Agglomerative Algorithms 1: Matrix Theory
Agglomerative algorithms start from an initial clustering where every sample vector has
its own cluster; that is, initially there exist N clusters with a single element in each one
of them. At every step of the algorithm two of these clusters are joined together, thus
resulting in one cluster less. This is continued until only a single cluster exists.
One simple agglomerative algorithm from which many other algorithms are derived is
the general agglomerative scheme (GAS), defined as follows:

1. t = 0; C_i = {x_i}, i = 1, ..., N // Initial clustering
2. while more than one cluster is left
   a. t = t + 1
   b. among all the clusters find the pair minimizing d(C_i, C_j) (or maximizing d(C_i, C_j) if d denotes similarity)
   c. create C_q = C_i + C_j, and replace clusters C_i and C_j with it
It should be clear that this creates a hierarchy of N clusterings. The disadvantage here is
that once two vectors are assigned to a single cluster there is no way for them to get
separated at a later stage. Further, it is quite easy to see that this is an O(N³) algorithm,
not suitable for a large N.

There are algorithms such as the matrix updating algorithmic scheme (MUAS) and its
single and complete link variations that can all be seen as special cases of GAS. These
algorithms are based on matrix theory, and the general idea is to use the pattern matrix
D(X) and the dissimilarity matrix P(X) to hold the information needed in a GAS-like
updating scheme. The pattern matrix is an N × l matrix whose ith row is the transposed
ith vector of X. The dissimilarity matrix is an N × N matrix whose element (j, k)
equals the dissimilarity d(x_j, x_k).
3.2.2 Agglomerative Algorithms 2: Graph Theory
The other form of agglomerative algorithms is those based on graph theory. A graph G
is an ordered pair G = (V, E), where V = {v_i, i = 1, ..., N} is a set of points (nodes) and E
is a set of edges denoted e_ij or (v_i, v_j) connecting some of these points. If the order of
the points (v_i, v_j) is not meaningful the graph is undirected; otherwise it is directed. If no
cost is associated with the edges the graph is said to be unweighted, and if any of the
edges has a cost then the graph is weighted. A more thorough introduction to graphs
can be found, for example, in Ilkka (1997).
Small graphs can be illustrated easily by drawing them as in figure 3. The first
graph, with 5 nodes, is complete (all points are connected to each other) and unweighted.
The second graph is a subgraph of the first one, and the third is a path 1, ..., 5 with
weights assigned to each edge.

Figure 3: Different kinds of graphs.
In clustering algorithms we consider the graph nodes as sample vectors from X. Clusters
are formed by connecting these nodes together with edges. The basic algorithm in this
case is known as the graph theory-based algorithmic scheme (GTAS). It is, again, very
similar to GAS. The difference is in step 2b, which now becomes

min g_h(k)(C_i, C_j) (or max g_h(k)(C_i, C_j) if g denotes similarity)

where g_h(k)(C_i, C_j) is defined in terms of proximity and the property h(k) of the
subgraphs. This property can differ depending on the desired result. In other words,
clusters (subgraphs) are formed based on the distance and on the fact that the resulting
subgraph has the property h(k) or is complete.

3.2.3 Divisive Algorithms
Divisive algorithms are the opposite of the agglomerative ones: they start with a single
cluster containing the entire set X and divide it in stages. The final clustering
contains N clusters, one sample in each. The idea is to find the division that
maximizes the dissimilarity. As an example, the generalized divisive scheme (GDS) may
be defined as

1. t = 0; C_0 = {X}
2. while each vector is not yet in its own distinct cluster
   a. t = t + 1
   b. for i = 1 to t
      i. amongst all possible pairs of clusters find the one maximizing g(C_i, C_j)
   c. form a new clustering by separating the pair (C_i, C_j)
It is easy to see that this is computationally very demanding, and in practice
simplifications are required. One way to speed up the process goes as follows. Let C_i be
a cluster to be split into C_1 and C_2. Initially set C_1 = Ø and C_2 = C_i. Now, find the vector
in C_2 whose average dissimilarity with the other vectors is maximal and move it to C_1.
Next, for each remaining vector x (in C_2), compute its average dissimilarity with C_1 and
C_2. If for every x the dissimilarity with C_2 is smaller than with C_1 then stop (we have found
the division). Otherwise move the x maximizing the similarity with C_1 and minimizing the
similarity with C_2 to C_1. This process is continued until a division has been
found, and it can be viewed as step 2.b.i of GDS.
3.3 Cost Function Optimization Algorithms
A third genre of clustering algorithms is those based on a cost function. Cost function
algorithms use a cost function J to define the sensibility of their solution. Usually the
number of desired clusters is known beforehand, and differential calculus is used to
optimize the cost function iteratively. A Bayesian philosophy, Theodoridis (1999), is also
often applied. This category also includes fuzzy clustering algorithms, where a vector
belongs to a cluster up to a certain degree. As a glimpse into the cost function optimization
algorithms, some of the theory of fuzzy clustering is discussed in the subchapter below.
3.3.1 Fuzzy Clustering
Fuzzy schemes have attracted a lot of interest and research in recent years.
What is characteristic and unique to fuzzy schemes is that a sample belongs
simultaneously to many categories. A fuzzy clustering is defined by a set of functions
u_j : X → A, j = 1, ..., m, with A = [0, 1]. A hard clustering scheme can be defined by setting
A = {0, 1}.

Let Θ_j be a parameterized representative of cluster j, so that Θ = [Θ_1^T, ..., Θ_m^T]^T, and
let U be an N × m matrix with element (i, j) denoting u_j(x_i) = u_ij. Then we can define a cost
function of the form

J_q(Θ, U) = ∑_{i=1}^{N} ∑_{j=1}^{m} u_ij^q d(x_i, Θ_j)    (3.1)

which is to be minimized with respect to Θ and U. The parameter q is called the
fuzzyfier. The constraint is that every sample belongs to the clusters at a total rate of 1:

∑_{j=1}^{m} u_ij = 1, i = 1, ..., N.    (3.2)

Minimizing J with respect to U, see Theodoridis (1999) for details, leads to

u_rs = 1 / ∑_{j=1}^{m} [ d(x_r, Θ_s) / d(x_r, Θ_j) ]^(1/(q−1)), r = 1, ..., N, s = 1, ..., m.    (3.3)

With respect to Θ we take the gradient of (3.1) and obtain

∂J(Θ, U)/∂Θ_j = ∑_{i=1}^{N} u_ij^q ∂d(x_i, Θ_j)/∂Θ_j = 0, j = 1, ..., m.    (3.4)

Combined, these two do not give a general closed form solution. One solution is to
use, for example, an algorithm known as the generalized fuzzy algorithmic scheme (GFAS)
to iteratively estimate U and Θ, see Theodoridis (1999).

Finally, if for a point representative we use a common function d of the form

d(x_i, Θ_j) = (x_i − Θ_j)^T A (x_i − Θ_j)    (3.5)

then substituting this into (3.4) yields

∂J(Θ, U)/∂Θ_j = −∑_{i=1}^{N} u_ij^q 2A(x_i − Θ_j) = 0

which is to be used in GFAS to obtain new representatives per time step.
4 OTHER ALGORITHMS
There are also other algorithms not belonging to the groups mentioned in chapter 3.
These include, for example, genetic algorithms, stochastic relaxation methods,
competitive learning algorithms and morphological transformation techniques. Also,
some graph theory algorithms are used, for example algorithms based on the minimum
spanning tree (MST), see Ilkka (1997). The following subchapter gives a quick tour of
the ideas of competitive learning.
4.1 Competitive Learning
Competitive learning algorithms are a wide branch of algorithms used in many fields of
science. What these algorithms actually do is clustering. They typically use a set of
representatives w_j (like Θ in the previous chapter) which are moved around in the space R^n
to match (represent) regions that include a relatively large amount of samples. Every time
a new sample is introduced the representatives compete with each other and the winner
is updated (moved). Other representatives can be updated at a slower rate, left alone or
be punished (moved away from the sample).
One of the most basic competitive algorithms is the generalized competitive learning
scheme (GCLS), defined as

1. t = 0 // Time = 0
2. m = minit // Number of clusters
3. while NOT convergence AND (t < tmax)
   a. t = t + 1
   b. present a new sample x and calculate the winner w_j
   c. if (x NOT similar to w_j) AND (m < mmax)
      i. m = m + 1 // New cluster
      ii. w_m = x
   d. else // Update parameters
      i. if winner: w_j(t) = w_j(t−1) + η h(x, w_j(t−1))
      ii. if not winner: w_j(t) = w_j(t−1) + η′ h(x, w_j(t−1))
4. The clusters are ready and represented by the w_j. Assign each sample to the cluster
   whose representative is the closest.
Parameters n and n called learning rate parameters. They control the rate of change of
the winners and losers. The function h is some function usually dependent on distance.
Convergence can be considered, for example, by calculating the total change in the
vectors w
j
and comparing it to a selected threshold value.

4.1.1 The Self-Organizing Map
By generalizing the definition of a representative and defining a neighborhood of
representatives Q_j for each w_j we arrive at a model called the (Kohonen) self-organizing
map (SOM). As time t increases the neighborhood shrinks and concentrates around the
representative. All representatives in the neighborhood of the winner are updated at each
time step. What is important is the fact that the neighborhood is independent of the
actual distances in space and is defined in terms of the indices j. That is, the geometrical
distance in space is not a metric of the neighborhood.
The self-organizing map and its properties are formally defined in terms of a neural
network in Haykin (1999). The original model, introduced by Kohonen (1982), is the
most popular and general model in use today. The Kohonen model is illustrated in
figure 4. The input layer is connected to a two-dimensional array of neurons from which
the winner is chosen. The weight of the winner (and its neighborhood) is then updated
at each time step as in GCLS.

Figure 4: The Kohonen SOM model
4.2 Closing on Algorithm Presentation
This and the preceding chapter introduced the most general algorithms used in
clustering. Every algorithm presented is of a basic nature and should not be very hard to
understand or implement. The presentation is not complete, and many algorithms of
value are omitted due to limitations set on the length of this document. As a close,
figure 5 gives an overall view of the families of clustering algorithms mentioned above.
The hierarchy is neither absolute nor complete, and other kinds of divisions could also
be made. It is provided to clarify the different branches in the way the algorithms
operate.


Figure 5: The family of clustering algorithms
5 THE IMPLEMENTATION
This chapter presents an implementation of four of the above algorithms. The
implementation works in the Microsoft Windows environment and is coded in the C++
programming language using the Microsoft Visual Studio .NET compiler. C++ was
chosen for this task because it is an industry standard, it produces efficient programs,
and because of its power of expression. All the relevant code is provided in the appendix
and is available, along with the application, from the author.
What is needed before the actual implementation is a way to store and handle the
vectors used as containers for the features. For this a vector container class was
created. What is assumed from here on is that there are no more than three
features to deal with. This limitation exists because only three dimensions
(3D) can be easily projected to a 2D display. The vector class (Vector3) has these three
components, for which all the normal vector arithmetic is implemented as class
operators. The next chapter provides an efficient way to display these vectors and their
clusterings with the OpenGL API (application program interface), often used in
professional 3D graphics.
A base class for all clustering algorithms was named CClustAlgorithm. It provides some
virtual functions all algorithms must implement. This way there is a unified base from
which all the algorithms must be derived. The most important function to implement is

virtual ClList* Clusterize(const VList* vectors, CCluster* empty) const;

It returns an std::list of clusters formed from the given vectors. The empty cluster
parameter exists so that different kinds of subclasses can be used: Clusterize always creates
the same kind of objects as is given to it. That way one can, for example, use clusters
with different kinds of representatives.
The CCluster class represents a single cluster. It holds a list of all the vectors belonging
to it and a representative (mean value). It can be used as a base class if other kinds of
representatives are needed. Vectors can be added and removed, and distances to other
clusters and individual vectors can be calculated. Also, other clusters can be included
into a cluster to form a union. This functionality is enough for an efficient
implementation of the algorithms in the subchapters below.
5.1 MBSAS
The implementation of the MBSAS algorithm is straightforward. Initialization includes
creating an initial cluster and adding the first element in the list to it. After that, the
"create clusters" pass is performed. One iterator goes through all the samples and
another iterator goes through the list of clusters (for every sample). The cluster with
minimal distance is retrieved and the distance is compared to the given threshold value.
If the distance is greater than the threshold and we may still create a new cluster (iq is
the maximum amount), we do so. As code, the "create clusters" pass is
for (; iter != tmplist.end(); iter++) {
    tmp = *iter;
    float mindist = FLT_MAX;

    // Find the minimum distance
    for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++) {
        float dist = (*iter2)->Distance(tmp);
        if (dist < mindist)
            mindist = dist;
    }
    // Create a new cluster?
    if ((mindist > fTheta) && ((int)ClusterList->size() < iq)) {
        CCluster* newclust = empty->GetNewCluster();
        newclust->AddVector(tmp);
        ClusterList->push_back(newclust);
    }
}

Now, all the samples already assigned to a cluster are removed from the list. After that a
second pass is made to classify all the rest. Each sample is added to the cluster with the
minimum distance just like above, but new clusters are not created anymore.
The implementation works well and is (along with TTSAS) the fastest of the
algorithms provided here. The quality of the result depends on a correct choice of the
threshold value, as can be seen in chapter 7.
5.2 TTSAS
In the TTSAS algorithm the list of vectors is first transformed into a normal array of
vectors. This might not be necessary, but it makes it easy to maintain another array
holding the information about whether each sample is classified or not (clas). The
implementation here does not limit the number of clusters like the above MBSAS. What
is done is exactly the same as in chapter 3.1.2. Part 4.a.i goes like
if (!clas[i] && existsChange == 0 && !gotOne) {
    // Let's make sure the while ends at some point :)
    CCluster* clust = empty->GetNewCluster();
    clust->AddVector(tmplist[i]);
    ClusterList->push_back(clust);
    clas[i] = true;
    curChange++; numDone++; gotOne = true;
}

Next, if the sample is not classified (4.a.ii), we search for the minimum distance cluster
and, based on the two threshold values, create a new cluster, add the sample to an existing
cluster or just leave it for a later pass:
if (mindist < fTheta1) { // found the same kind
    minclust->AddVector(tmplist[i]);
    clas[i] = true;
    curChange++; numDone++;
}
else if (mindist > fTheta2) { // need to create a new one
    CCluster* clust = empty->GetNewCluster();
    clust->AddVector(tmplist[i]);
    ClusterList->push_back(clust);
    clas[i] = true;
    curChange++; numDone++;
}

All this is done for the entire list, pass by pass, until every sample belongs to a cluster.
The first if guarantees that no more than N passes are made, so the total work is at most
O(N²). Naturally, in practice, the number of passes is smaller. The result is, again,
dependent on the threshold values. The speed of this implementation is on par with the
MBSAS implementation or a little better.
5.3 GAS
The GAS algorithm is also quite short and easy to implement. Notice that the entire
hierarchy of clusterings is not saved in this implementation. First, an initial clustering of
N clusters with one sample in each cluster is created. Then, while there are more than the
suggested amount of clusters, the two clusters that are closest to each other are sought
and combined:
// Seek the two clusters that have min distance (slow)...
for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++) {
    iter2++; // Dummy inc
    for (iter3 = iter2--; iter3 != ClusterList->end(); iter3++) {
        float dist = (*iter2)->Distance(*iter3);
        if (dist < mindist) {
            mindist = dist; minclust1 = *iter2; minclust2 = *iter3;
        }
    }
}
// ...and combine them
if (minclust2 != NULL) {
    minclust1->Include(*minclust2);
    ClusterList->remove(minclust2);
    delete minclust2;
}

The only real problem here is the slowness of the algorithm: going through all the levels of clustering takes a long time. This agrees with the theoretical result that the algorithm is slow compared to the two algorithms above. On the other hand, there are no threshold values that would need to be carefully selected (as can be seen in chapter 7), so this algorithm is easier and safer to use. If the number of clusters is relatively large compared to the number of samples, the algorithm should be more competitive, since fewer iteration steps are then performed.
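As an illustration of the merging loop, the following self-contained sketch applies the same idea to one-dimensional data: start from singleton clusters and repeatedly merge the pair with the closest means until q clusters remain. The names are invented for the example; repeating the O(N²) pair search up to N−q times is what produces the O(N³) behavior noted in chapter 7:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 1-D agglomerative sketch. Cluster distance is the distance
// between cluster means, as in the implementation above.
struct MCluster {
    float sum;
    int n;
    float mean() const { return sum / n; }
};

std::vector<MCluster> gas1D(const std::vector<float>& samples, std::size_t q) {
    std::vector<MCluster> cl;
    for (float s : samples) cl.push_back({s, 1}); // one singleton per sample
    while (cl.size() > q && cl.size() > 1) {
        std::size_t a = 0, b = 1;
        float best = 1e30f;
        // O(N^2) search for the closest pair of clusters
        for (std::size_t i = 0; i + 1 < cl.size(); ++i)
            for (std::size_t j = i + 1; j < cl.size(); ++j) {
                float d = std::fabs(cl[i].mean() - cl[j].mean());
                if (d < best) { best = d; a = i; b = j; }
            }
        cl[a].sum += cl[b].sum; cl[a].n += cl[b].n; // combine the pair
        cl.erase(cl.begin() + b);
    }
    return cl;
}
```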
5.4 GDS
The general divisive scheme is even more demanding than GAS. For this reason the general version of the algorithm was not implemented; instead an optimized version is presented (somewhat like the version in 3.2.3). The optimization is based on the assumption that an outlier sample (the one farthest away from the medium) is very likely to belong to a cluster other than its current one. As with GAS, the algorithm is not driven to the end but is interrupted as soon as the suggested number of clusters is found. This helps considerably when the number of clusters is relatively small.
Initially a single cluster including all samples is created. Then, while there are fewer clusters than desired, the cluster with the farthest outlier is selected as the one to be divided in two. The outlier is moved to a cluster of its own. Then the distance to the new cluster is calculated for every other vector in the old cluster and compared against the distance to the old cluster's own representative. The vector closest to the new cluster is selected, and if it is nearer to the new cluster than to the old one, it is moved to the new cluster. This is continued until no vector is moved during one pass (all vectors remaining in the old cluster are nearer to their own representative than to the new one):
while (foundOne) {
    foundOne = false; // Let's see if we find any
    fVector3 vect(0, 0, 0);
    maxdist = FLT_MAX;
    // Go through all the samples in the old cluster
    for (iter = maxclust->GetVectors().begin();
         iter != maxclust->GetVectors().end(); iter++) {
        maxdist2 = newclust->Distance(*iter); // Dist. to the new clust.
        if (maxclust->Distance(*iter) > maxdist2 && maxdist2 < maxdist) {
            foundOne = true;
            maxdist = maxdist2;
            vect = *iter;
        }
    }
    if (foundOne) { // We did find one sample
        newclust->AddVector(vect);
        maxclust->RemoveVector(vect);
    }
}
The improvements presented significantly diminish the time needed for a single pass: instead of always examining all possible combinations, we select a single vector and attach similar vectors to it from the old cluster. Due to the optimizations, the runtime of this algorithm is closer to MBSAS and TTSAS than to GAS. The quality of the result is generally close to that of GAS.
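The splitting step can again be illustrated on one-dimensional data. The sketch below (with invented names; input is assumed non-empty) moves the outlier into a new cluster and then repeatedly pulls over the remaining sample that is closest to the new cluster, as long as it is nearer to the new cluster's mean than to the old one's:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

static float meanOf(const std::vector<float>& v) {
    float s = 0.f;
    for (float x : v) s += x;
    return s / v.size();
}

// Split one cluster in two around its outlier; returns {old, new}.
std::pair<std::vector<float>, std::vector<float>>
splitOnOutlier(std::vector<float> oldc) {
    // Find the outlier: the sample farthest from the cluster mean
    std::size_t oi = 0;
    float m = meanOf(oldc), best = -1.f;
    for (std::size_t i = 0; i < oldc.size(); ++i) {
        float d = std::fabs(oldc[i] - m);
        if (d > best) { best = d; oi = i; }
    }
    std::vector<float> newc{oldc[oi]};
    oldc.erase(oldc.begin() + oi);

    bool foundOne = true;
    while (foundOne && !oldc.empty()) {
        foundOne = false;
        float mOld = meanOf(oldc), mNew = meanOf(newc);
        std::size_t vi = 0;
        float bestD = 1e30f;
        for (std::size_t i = 0; i < oldc.size(); ++i) {
            float dNew = std::fabs(oldc[i] - mNew);
            // Nearer to the new cluster than the old, and closest overall
            if (std::fabs(oldc[i] - mOld) > dNew && dNew < bestD) {
                foundOne = true; bestD = dNew; vi = i;
            }
        }
        if (foundOne) {
            newc.push_back(oldc[vi]);
            oldc.erase(oldc.begin() + vi);
        }
    }
    return {oldc, newc};
}
```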
6 THE VISUALIZATION ENGINE
The visualization of clusterings is problematic when the number of features is large. This chapter provides a way to display three-element vectors and their clusterings with the OpenGL 1.2 API. The engine can also be seen as a tutorial on using OpenGL in the Microsoft Windows environment to create 3D visualizations. It consists of a single class, CGLRenderer, which acts as a wrapper between OpenGL and the rest of the program. The user, or programmer, needs no knowledge of OpenGL when using the other parts of the code.
This class provides the basic functions needed to initialize OpenGL, draw elements on the screen, move and rotate the camera and display text on the screen. The most important functions of CGLRenderer are a function to draw a list of clusters
void DrawClusters(const ClList* list);

and the two functions that move and rotate the viewpoint (camera) in the coordinate system, allowing the viewer to move freely in space
void MoveCamera(float advance, float sideways, float up = 0.f);
void RotateCamera(float xrot, float yrot);
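A typical implementation of such a camera move displaces the position along the camera's own direction, left and up vectors. The sketch below is an assumption about how MoveCamera behaves, not the class's actual code; the vector names merely mirror the ones CGLRenderer uses internally:

```cpp
#include <cassert>

// Minimal vector type for the illustration
struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
};

// New position = old position displaced along the camera's local axes
Vec3 moveCamera(Vec3 pos, Vec3 dir, Vec3 left, Vec3 up,
                float advance, float sideways, float upAmt) {
    return pos + dir * advance + left * sideways + up * upAmt;
}
```

Because the displacement uses the camera's local axes rather than the world axes, "forward" always means the direction the viewer is currently facing.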
If selected, the engine draws gray bounding spheres around the clusters so that the limits of a single cluster are easier to see (key 'V'). Figure 6 shows a typical view of five separate clusters in space. The gray spheres represent the borders of the different clusters, while the colorful dots inside the spheres are individual samples. The three white lines represent the coordinate axes. The user can change the viewpoint and view direction with the mouse and mouse buttons; any viewpoint and any view direction may be chosen, as this is not limited by the application.
The user can change the algorithm to show by pressing the number keys. If the selected algorithm is disabled, no clusters are shown (just the coordinate axes). The text display in the bottom left corner of the screen shows the current algorithm, the number of found clusters and the viewer position (the display is not visible in the resized images of this document). If the user minimizes the application, it goes into a so-called idle mode where it consumes much less processor time, allowing other applications to run more efficiently.


Figure 6: Typical view of five clusters
6.1 Data Generation and Algorithm Initialization
Data can be generated with a simple generator class, CDataGenerator. It has static functions that can be called from anywhere to get data generated with a simple random function. The data generation setup screen is illustrated in figure 7. The user inputs the number of clusters and vectors, and the data generator generates corresponding data. The compactness is a value indicating how compact the clusters should be. The file section is for future use, when data could be read from a file generated, for example, with Matlab. Note that after this dialog the algorithms are also reinitialized with the given number of clusters. If other parameters are wanted, the algorithm setup dialog in figure 8 must be used.
After either of these dialogs the algorithms are run over the data. If the number of sample vectors is large, the response to the user might be slow. This is because the application is single-threaded: while an algorithm is being run, the display is not updated.
Algorithms and their parameters are initialized with another dialog, illustrated in figure 8. From this dialog the GAS algorithm can also be disabled if a large amount of data needs to be clustered. When all the parameters have been filled in, the user presses OK and the data is reclustered using the new parameters for the algorithms.


Figure 7: The data generation setup.

Figure 8: The algorithm setup.
7 THE TESTS
This chapter presents some results of running the algorithms with different parameters and different numbers of samples and clusters. No solid statistical analysis was made of the results; a more thorough analysis is beyond the scope of this basic document, and the results are provided as a guideline for general interest in the behavior of the algorithms.
First the speed of the algorithms was analyzed against the number of samples. Table 1 shows the result (the number of clusters was kept at 20). The tests were run on an AMD Athlon Classic 700MHz with 512kB of L2-cache and 512MB of memory on Windows XP. The GAS algorithm was not run with the three largest data sets, since it would have taken a very long time and the O(N³) behavior of the algorithm can already be seen from the smaller sets. The performance of the other algorithms seems to be proportional to N², where one pass of GDS is relatively long compared to MBSAS or TTSAS. Note that due to the way the runtime was measured, times below 100ms cannot be considered accurate.
Table 1. Runtime for different amounts of data

Algorithm   N=1000   N=2000   N=4000   N=8000   N=16000   N=32000
MBSAS       10ms     10ms     20ms     70ms     310ms     960ms
TTSAS       0ms      0ms      20ms     60ms     320ms     900ms
GDS         80ms     360ms    1.5s     7.1s     37.4s     231.6s
GAS         12.4s    185.1s   0.4h     -        -         -
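The sub-100ms granularity mentioned above can be avoided with a finer wall-clock timer. The following is a minimal sketch using std::chrono; this is an alternative illustration, not the timer that produced the numbers in Table 1:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Measure the wall-clock runtime of an arbitrary callable in microseconds.
// steady_clock is monotonic, so the result is never negative.
template <typename F>
std::int64_t elapsedMicroseconds(F&& work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```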

7.1 The Effect of Parameters
The effect of the parameters was also examined with randomly selected sets of data. First, figure 9 shows the result of incorrectly chosen parameters for TTSAS.

Figure 9: The meaning of correct parameters.

The figure on the left shows how TTSAS with incorrectly large theta values (1000, 2000) misclassifies a part of the blue cluster into the red cluster (the cluster on the left). When the theta values are lowered to a more reasonable level (500, 1500), TTSAS creates correct clusters, as in the picture on the right. The value 500 corresponds roughly to a distance of 20 in space. This problem does not arise with GAS or GDS, since they have no dissimilarity parameters.
Figure 10 illustrates the case where an incorrect number of clusters was guessed and given to the algorithms as a parameter (four instead of five).

Figure 10: The meaning of correct value of clusters
The top left image shows the behavior of MBSAS: the blue cluster (on the left) is incorrectly large compared to the top right image of TTSAS. The result of TTSAS is most likely the correct one, since TTSAS does not use the number-of-clusters parameter at all. The behavior of GAS, bottom left, is similar to MBSAS, while GDS creates a big red cluster (on the right) that includes the samples of the green cluster. The fact that GDS differs from GAS is probably because the red cluster of GDS is more compact than the blue cluster of GAS, making its outlier closer to the average value for the GDS algorithm.
8 CONCLUSION
This paper discussed the problem of clustering and the most popular algorithms of that particular field of science. The basic concepts needed to understand the functionality of the algorithms were discussed in chapter two. Chapter three provided an insight into the most popular algorithms and their behavior. Chapter four completed the tour with a couple of special-purpose algorithms. Chapter five included an implementation of four of the algorithms discussed, and chapter six presented a way to visualize the product of the algorithms with OpenGL. Finally, in chapter seven the algorithms were run in the framework and different kinds of parameters and samples were considered. This paper was meant as an introduction to the algorithms; another purpose was to create an efficient way to display the clusterings of individual data elements.
If something remains to be done, it is a more thorough analysis of the behavior of the algorithms. Especially cases where the samples are badly balanced or in some particular order should be generated and analyzed. The application could easily be complemented with a screenshot feature to automatically generate printable figures of the current display. It would also be easy to add a feature for reading the sample data from a file.
Finally, a complete set of the source code is available from the author on request. The appendix provides all the commented key elements of the code. The code is provided for educational purposes. A complete listing would be unacceptably long to include in this document and would serve no purpose.
REFERENCES
Haykin, S. 1999. Neural Networks: A Comprehensive Foundation. Second Edition. Prentice Hall. 842 p.
Ilkka, S. 1997. Diskreettiä Matematiikkaa. Fifth Edition. Otatieto. 165 p.
Kohonen, T. 1982. Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, vol. 43, pp. 59-69.
Kreyszig, E. 1993. Advanced Engineering Mathematics. Seventh Edition. Wiley. 1271 p.
Theodoridis, S.; Koutroumbas, K. 1999. Pattern Recognition. Academic Press. 625 p.
Weiss, M. 1997. Data Structures and Algorithm Analysis in C. Second Edition. Addison Wesley. 511 p.
APPENDIX
/**
* Cluster.h
*
* Provides a basic cluster class including sample vectors.
*
* Commercial use without written permission from the author is forbidden.
* Other use is allowed provided that this original notice is included
* in any form of distribution.
* @author Jukka Kainulainen 2002 jkainula@cc.hut.fi

*/

/**
* The definition of a single cluster. A cluster has a list of vectors
* belonging to it. Vectors can be added or removed to/from a cluster.
* Other classes can be derived from this class. For example those with
* other kind of representatives. This class uses a medium representative.
*
*/
class CCluster
{
protected:
/** The vectors belonging to this cluster */
VList Vectors;
/** The medium representative */
fVector3 Medium;
/** The current outlier sample */

fVector3 Outlier;
/** The distance of the outlier from representative */
float fOutlierDist;
/** Is the outlier valid */
bool bOutlierValid;

/** Updates the representative value */
virtual void UpdateMedium();
/** Updates the outlier sample (the one farthest from the representative) */
virtual void UpdateOutlier();
public:
CCluster(void);

/** Is this cluster empty? */
bool IsEmpty() const;

/** Add a vector to this cluster */
virtual void AddVector(const fVector3& vec);

/** Remove a vector from this cluster */
virtual void RemoveVector(const fVector3& vec);

/** Includes all vectors from the given cluster to this one also. */
virtual void Include(CCluster& cluster);

/** Returns a reference to the list of vectors */
virtual const VList& GetVectors() const;

/** Returns the representative vector */
virtual fVector3 GetRepresentative() const
{ return Medium; }

/** Returns the outlier vector */
virtual fVector3 GetOutlier();

/** Returns the outlier distance */
virtual float GetOutlierDist();

/** Returns a new cluster object. Override for different clusters. */
virtual CCluster* GetNewCluster()
{ return new CCluster(); }

/** Returns the squared distance to given vector. */
virtual float Distance(const fVector3& vec);

/** Returns the squared distance to another cluster */
virtual float Distance(const CCluster* clust);

/** Clears and deletes this cluster */
virtual ~CCluster(void);
};

/** A list of cluster pointers */
typedef list<CCluster*> ClList;

// From Cluster.cpp
void CCluster::Include(CCluster& cluster)
{
VList::iterator i;

for (i = cluster.Vectors.begin(); i != cluster.Vectors.end(); ++i) {
Vectors.push_back(*i);
}
UpdateMedium();
bOutlierValid = false;
// The caller should remove the elements from the other one...
}

const VList& CCluster::GetVectors() const
{
return Vectors;
}

float CCluster::Distance(const fVector3& vec)
{
return Medium.SquaredDistance(vec);
}

float CCluster::Distance(const CCluster* clust)
{
return Medium.SquaredDistance(clust->GetRepresentative());
}
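UpdateMedium itself is not reproduced in this appendix; the medium representative is the component-wise mean of the member vectors, which could be computed along the following lines (a sketch with invented names, assuming a non-empty vector list):

```cpp
#include <cassert>
#include <vector>

// Stand-in for fVector3 in this illustration
struct V3 { float x, y, z; };

// Component-wise mean of the member vectors: the "medium" representative
V3 meanRepresentative(const std::vector<V3>& vs) {
    V3 m{0.f, 0.f, 0.f};
    for (const V3& v : vs) { m.x += v.x; m.y += v.y; m.z += v.z; }
    float n = static_cast<float>(vs.size());
    m.x /= n; m.y /= n; m.z /= n;
    return m;
}
```

Note that Distance above returns the squared distance to this representative; comparing squared distances avoids a square root and preserves the ordering, which is all the clustering algorithms need.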

/**
* ClustAlgorithm.h
*
* Commercial use without written permission from the author is forbidden.
* Other use is allowed provided that this original notice is included
* in any form of distribution.
* @author Jukka Kainulainen 2002 jkainula@cc.hut.fi
*/
#pragma once

#include "Cluster.h"

/**
* Provides a base class for all clustering algorithms.
*/
class CClustAlgorithm
{
public:
CClustAlgorithm() {};

virtual void SetParameters(float theta, int q = 0) = 0;

/**
* Creates a clustering from vectors using the given empty cluster class.
* You can give different kinds of cluster subclasses as parameter.
*
*/
virtual ClList* Clusterize(const VList* vectors, CCluster* empty) const = 0;

/** Returns the number of clusters searched for, 0 if not in use */
virtual int GetClusters()
{ return 0; }

/** Returns the theta parameter used, 0 if not in use */
virtual float GetTheta()
{ return 0.f; }
virtual float GetTheta2()
{ return 0.f; }

/** Returns the name of the algorithm */
virtual const char* GetName() const = 0;

virtual ~CClustAlgorithm(void) {};
};

// From MBSAS.cpp
ClList* CMBSAS::Clusterize(const VList* vectors, CCluster* empty) const
{
int numVectors = 0;
int clusters = 0;
fVector3 tmp;

if ((vectors == NULL) || (empty == NULL))
return NULL;

if ((int)vectors->size() < 1)
return NULL;

// This is not the optimum way....
VList tmplist = *vectors;

ClList* ClusterList = new ClList();

VList::iterator iter;
ClList::iterator iter2;

// Fill with the initial vector
iter = tmplist.begin();
empty->AddVector(*iter);
iter++;
ClusterList->push_back(empty);

// 'Create the clusters' pass
for (; iter != tmplist.end(); iter++) {
tmp = *iter;
float mindist = FLT_MAX;

// Find the minimum distance
for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++) {
float dist = (*iter2)->Distance(tmp);
if (dist < mindist)
mindist = dist;

}
// Create a new cluster?
if ((mindist > fTheta) && ((int)ClusterList->size() < iq)) {
CCluster* newclust = empty->GetNewCluster();
newclust->AddVector(tmp);
ClusterList->push_back(newclust);
}
}
// Now we have to remove the already taken samples...
for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++) {
tmp = (*iter2)->GetRepresentative(); // Representative is the only one
tmplist.remove(tmp);
}

// And then we classify the rest...
for (iter = tmplist.begin(); iter != tmplist.end(); iter++) {
tmp = *iter;
float mindist = FLT_MAX;
CCluster* minclust = NULL;

// Find the minimum distance cluster...
for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++) {
float dist = (*iter2)->Distance(tmp);
if (dist < mindist) {
mindist = dist;
minclust = *iter2;
}
}
minclust->AddVector(tmp); // ...and add to it
}

return ClusterList;
}

// From TTSAS.cpp...
ClList* CTTSAS::Clusterize(const VList* vectors, CCluster* empty) const
{

if ((vectors == NULL) || (empty == NULL))
return NULL;
if ((int)vectors->size() < 1)
return NULL;

// We'll do this the old way...
bool* clas = new bool[vectors->size()];
fVector3* tmplist = new fVector3[vectors->size()];
VList::const_iterator iter;
int i = 0;

for (iter = vectors->begin(); iter != vectors->end(); iter++, i++) {
tmplist[i] = *iter;
clas[i] = false;
}

ClList* ClusterList = new ClList();

int numVectors = (int)vectors->size();
int numDone = 0; // Number of classified samples
int existsChange = 0; // Classified something new during last pass
int curChange = 0; // Current number of classified samples
int prevChange = 0; // Number of classified samples during last pass
float mindist = FLT_MAX;
CCluster* minclust = NULL;
ClList::iterator iter2;

while (numDone < numVectors) {
bool gotOne = false;

for (i = 0; i < numVectors; i++) {
if (!clas[i] && existsChange == 0 && !gotOne) {
// Let's make sure the while ends at some point :)
CCluster* clust = empty->GetNewCluster();
clust->AddVector(tmplist[i]);
ClusterList->push_back(clust);
clas[i] = true;
curChange++; numDone++; gotOne = true;
}
else if (!clas[i]) {
mindist = FLT_MAX;
minclust = NULL;
// Find the minimum distance cluster...
for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++) {

float dist = (*iter2)->Distance(tmplist[i]);
if (dist < mindist) {
mindist = dist;
minclust = *iter2;
}
}
if (mindist < fTheta1) { // found the same kind
minclust->AddVector(tmplist[i]);
clas[i] = true;
curChange++; numDone++;
}
else if (mindist > fTheta2) { // need to create a new one
CCluster* clust = empty->GetNewCluster();
clust->AddVector(tmplist[i]);
ClusterList->push_back(clust);
clas[i] = true;
curChange++; numDone++;
}
}
else // clas == 1
curChange++;
}
existsChange = abs(curChange - prevChange);
prevChange = curChange; curChange = 0;
}

delete empty;
delete[] clas;
delete[] tmplist;

return ClusterList;
}

// From GDS.cpp...
ClList* CGDS::Clusterize(const VList* vectors, CCluster* empty) const
{
if ((vectors == NULL) || (empty == NULL))
return NULL;
if ((int)vectors->size() < 1)
return NULL;

// Create the initial clustering...
ClList* ClusterList = new ClList();
VList::const_iterator iter;
CCluster* tmp = empty->GetNewCluster();
for (iter = vectors->begin(); iter != vectors->end(); iter++)
tmp->AddVector(*iter);
ClusterList->push_back(tmp);

float maxdist, maxdist2;
CCluster* maxclust;
ClList::iterator iter2;
while ((int)ClusterList->size() < iq) {
maxdist = 0;
maxclust = NULL;
// Find the cluster that has maximal outlier element...
for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++)
if ((*iter2)->GetOutlierDist() > maxdist) {
maxdist = (*iter2)->GetOutlierDist();
maxclust = *iter2;
}
// Move the outlier to a new cluster
CCluster* newclust = empty->GetNewCluster();
newclust->AddVector(maxclust->GetOutlier());
maxclust->RemoveVector(maxclust->GetOutlier());
ClusterList->push_back(newclust);

bool foundOne = true;
// While we found a vector more similar to the new cluster...
while (foundOne) {
foundOne = false; // Let's see if we find any
fVector3 vect(0, 0, 0);
maxdist = FLT_MAX;
// Go through all the samples in the old cluster
for (iter = maxclust->GetVectors().begin(); iter != maxclust->GetVectors().end(); iter++) {
maxdist2 = newclust->Distance(*iter); // Dist. to the new clust.
if (maxclust->Distance(*iter) > maxdist2 && maxdist2 < maxdist) {
foundOne = true;
maxdist = maxdist2; // The closest one to the new cluster
vect = *iter;
}

}
if (foundOne) { // We did find one sample?
newclust->AddVector(vect);
maxclust->RemoveVector(vect);
}
}
}

delete empty;

return ClusterList;
}

// From GAS.cpp...
ClList* CGAS::Clusterize(const VList* vectors, CCluster* empty) const
{
if ((vectors == NULL) || (empty == NULL))
return NULL;
if ((int)vectors->size() < 1)
return NULL;

// Create the initial clustering...
ClList* ClusterList = new ClList();
VList::const_iterator iter;
for (iter = vectors->begin(); iter != vectors->end(); iter++) {
CCluster* tmp = empty->GetNewCluster();
tmp->AddVector(*iter);
ClusterList->push_back(tmp);
}

ClList::iterator iter2;
ClList::iterator iter3;
float mindist;
CCluster* minclust1;
CCluster* minclust2;
while ((int)ClusterList->size() > iq) {
mindist = FLT_MAX;
minclust1 = NULL;
minclust2 = NULL;
// Seek the two clusters that have min distance (slow)...
for (iter2 = ClusterList->begin(); iter2 != ClusterList->end(); iter2++) {
iter3 = iter2;
for (++iter3; iter3 != ClusterList->end(); iter3++) {
float dist = (*iter2)->Distance(*iter3);
if (dist < mindist) {
mindist = dist; minclust1 = *iter2; minclust2 = *iter3;
}
}
}
// ...and combine them
if (minclust2 != NULL) {
minclust1->Include(*minclust2);
ClusterList->remove(minclust2);
delete minclust2;
}
}

delete empty;

return ClusterList;
}

// From GLRenderer.cpp
void CGLRenderer::DrawVolumes(const ClList* list)
{
int c = 0;

glColor4f(0.5f, 0.5f, 0.5f, 0.4f);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
for (ClList::const_iterator iter = list->begin(); iter != list->end(); iter++, c+=3){
glPushMatrix();
float dist = (*iter)->GetOutlierDist();
fVector3 out = (*iter)->GetRepresentative();
glTranslatef(out.x, out.y, out.z);
glutWireSphere(sqrt(dist), 8, 8);
glutSolidSphere(sqrt(dist), 8, 8);
glPopMatrix();
}
glDisable(GL_BLEND);
}

void CGLRenderer::RotateCamera(float xrot, float yrot)
{
// Around the up vector...
vdir = vdir.Rotate(vup, 3.1415f * yrot / 180.f);
vleft = vleft.Rotate(vup, 3.1415f * yrot / 180.f);
vup = vdir.Cross(vleft); // Just to make sure we don't get messed up

// Around the "left" vector...
vup = vup.Rotate(vleft, 3.1415f * xrot / 180.f);
vdir = vdir.Rotate(vleft, 3.1415f * xrot / 180.f);
vleft = vup.Cross(vdir);

vdir.Normalize();
vleft.Normalize();
vup.Normalize();

}
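The fVector3::Rotate helper used above is not listed in this appendix. Rotation of a vector about an arbitrary unit axis is commonly implemented with Rodrigues' rotation formula; the following sketch (with invented names) shows one way such a helper could work:

```cpp
#include <cassert>
#include <cmath>

struct R3 { float x, y, z; };

static R3 cross(const R3& a, const R3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(const R3& a, const R3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Rodrigues' formula: rotate v about unit axis k by angle a (radians):
// v' = v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a))
R3 rotate(const R3& v, const R3& k, float a) {
    float c = std::cos(a), s = std::sin(a);
    R3 kxv = cross(k, v);
    float kd = dot(k, v) * (1.f - c);
    return {v.x * c + kxv.x * s + k.x * kd,
            v.y * c + kxv.y * s + k.y * kd,
            v.z * c + kxv.z * s + k.z * kd};
}
```

Re-deriving vup and vleft with cross products after each rotation, as RotateCamera does, keeps the three camera axes orthogonal despite accumulated floating-point error.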