Formalization of Data Stream Clustering Properties and Analysis of Algorithms

Nov 8, 2013


Marcelo Keese Albertini and Rodrigo Fernandes de Mello
Department of Computer Sciences
Institute of Mathematics and Computer Sciences
University of Sao Paulo
Av. Trabalhador São-carlense 400, São Carlos - SP, Brazil
Abstract: The understanding of several phenomena requires unbounded data collections, called data streams. These phenomena often present unstable behavior and are studied by means of unsupervised induction processes based on data clustering. Currently, clustering processes have shown serious limitations in their applications to data streams due to the demands imposed by behavioral changes and unlimited data collection. However, despite the key distinctions between traditional data sets, which are finite and unordered, and data streams, which are essentially infinite sequences, studies have overlooked the dynamic and transient nature of streams, limiting the appropriate understanding of phenomena. The lack of a theoretical analysis for the problem of data stream clustering led us to propose, in this paper, a formalization based on Set Theory. This formalization made it possible to identify and propose basic properties for the design and comparison of data stream clustering algorithms. It is expected to be a starting point to understand the foundations of unsupervised induction based on clustering and, mainly, the modeling of phenomena.
Keywords: Artificial Intelligence; Machine Learning; Data Mining; Unsupervised Learning; Data Clustering; Data Streams; Set Theory
I. Introduction
The literature from several research fields has described two main types of phenomena that produce endless sequences of data, also referred to as data streams [11], [12]. The first type is characterized by the need for data storage space and fast computation. In this situation, data is stored in secondary memory, which presents low transfer rates, and is usually accessed in a contiguous manner. The second type is even more computationally demanding: data is collected at high rates and is discarded shortly afterwards. In this type, clustering models must be continuously obtained throughout the endless data-gathering process, whose dynamical properties, i.e., behavior, are expected to evolve over time [7], [12]. Data streams are frequently found in computationally intensive environments, such as climate and weather analysis [2], text mining [10], [9], genomic analysis, and advanced scientific experiments [18], [19].
The typical clustering process has been designed to approach finite and unordered data sets and, consequently, does not meet data stream requirements [20]. There are important distinctions between the clustering requirements for data sets and for data streams. Firstly, a data stream must be accessed and processed sequentially, as it cannot be completely stored in memory. Secondly, the open-ended nature of data streams demands continuous and automatic analysis. Thirdly, while stable phenomena can be accurately represented by bounded data sets and are suitable for traditional data clustering, data streams usually represent unstable phenomena [7], whose characteristics tend to change over the collection. This tendency indicates the transient nature of data streams and demands a continuous re-evaluation of clustering models. Finally, the role of domain experts when clustering data streams is different from their role when clustering data sets. In the latter, specialists are often required to empirically extract, select and analyze data features in order to define the clustering algorithm and the validation criterion. Conversely, in data streams, the fast and continuous production of large amounts of data restricts human intervention, due to the limited capability of specialists to make well-founded decisions under such constraints.
In order to better understand data set clustering, Kleinberg [8] formalized three properties, which allow the analysis of algorithm capabilities independently of the target application. The formalization of the data-set clustering problem is an important step towards its deep understanding. Recently, several studies concerning the usage and proposition of properties for related problems have been developed [5], [4], [22]. These studies have broadened the possibilities to analyze clustering algorithms, although very few of them have been conducted in the context of data streams.
The lack of formal studies on the problem motivated our formalization of data stream clustering as an extension of Kleinberg’s approach to data sets [8]. In such a definition, data stream clustering is described in terms of an infinite sequence of partitions, which are modified over time. The endless and changing nature of data streams requires properties different from those proposed by Kleinberg for finite and unordered data sets. Data streams, as infinite and ordered sequences, need clustering properties that represent their evolution over time and their changes in behavior.
We have extended Kleinberg’s properties to represent clustering partitions evolving according to the data stream behavior. In addition, we introduce the Coherence property, which states that partitions must continuously evolve over time in order to preserve the meaning of the clustering process. However, we observed that this property is incompatible with Kleinberg’s Richness, which states that a data-set clustering function must be capable of generating any partition. This conflict indicates that trade-off analyses are needed when deciding which properties to demand from algorithms. Additionally, we noticed that it is difficult to find an algorithm that complies with Richness in a data stream context.
The remainder of this paper is organized as follows: Section II introduces studies on data set clustering; Section III describes our formalization approach to data streams; Section IV draws conclusions and presents ideas for future work; and, finally, references are listed.
II. Related Work
Kleinberg [8] considered concepts of Set Theory [14] to describe clustering algorithms in general terms, i.e., without relying on a specific algorithm, objective function for optimization, or statistical model. This analysis is important to understand the functioning of algorithms beforehand, i.e., without the need for experimental trials, and also to provide design guidelines for new approaches.
The concepts involved in Kleinberg’s formalization are illustrated in Figure 1. This formalization assumes data elements are organized in a set $E$, whose pairwise dissimilarities are provided in a square matrix. This matrix, $d$, is described in the form of a function $d: E \times E \to \mathbb{R}$, and is applied to two elements $i, j \in E$ to obtain a dissimilarity value $d(i, j)$ in $[0, \infty)$. The construction of $d$ determines how a clustering algorithm can organize the elements in $E$. A clustering is essentially an organization of the elements of $E$ into subsets, so that each element is contained in one, and only one, subset. This structure is known as a partition of a set. Partitions are denoted by uppercase Greek letters, such as $\Gamma$ (Gamma) and $\Upsilon$ (Upsilon). A clustering algorithm is, therefore, a function $f(d)$, which considers a distance matrix $d$ for mapping elements in $E$ into a partition belonging to the universe of partitions $U$.
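[Figure 1. Illustration of concepts of data set clustering: a set $E$ of elements; a dissimilarity function $d: E \times E \to \mathbb{R}$, for example d(♠, ♣) = 9; and a clustering function $f: d \to U$ with $f(d) = \Gamma$.]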
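The formalization above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the element names, every distance except d(♠, ♣) = 9, and the choice of single-linkage with a fixed number of clusters `k` as the function f are all hypothetical and do not come from the paper.

```python
from itertools import combinations

def single_linkage(elements, d, k):
    """Toy clustering function f(d): merge the two closest clusters until k remain."""
    clusters = [frozenset([e]) for e in elements]

    def link(a, b):
        # Single-linkage distance: smallest dissimilarity between the two clusters.
        return min(d[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    # A partition is a set of non-intersecting subsets covering all elements.
    return frozenset(clusters)

# E and d are illustrative; only d(spade, club) = 9 mirrors Figure 1.
E = ["spade", "club", "heart", "diamond"]
d = {
    frozenset(("spade", "club")): 9.0,
    frozenset(("heart", "diamond")): 1.0,
    frozenset(("spade", "heart")): 5.0,
    frozenset(("spade", "diamond")): 6.0,
    frozenset(("club", "heart")): 7.0,
    frozenset(("club", "diamond")): 8.0,
}
gamma = single_linkage(E, d, k=2)
```

Storing `d` as a dictionary keyed by unordered pairs keeps the dissimilarity symmetric by construction, and representing a partition as a frozenset of frozensets makes two partitions comparable with `==` regardless of cluster order.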
By using this formalization, Kleinberg proposed properties for data set clustering, introduced in the form of statements of principles or self-evident conditions for a successful clustering algorithm. The author initially proposed three properties: Scale-Invariance, Richness and Consistency.
Scale-Invariance refers to the ability of algorithms to abstract the measurement scale of elements, i.e., for any distance function $d$ and constant $\alpha > 0$, we have $f(d) = f(\alpha \cdot d)$. This property reflects the expectation that the magnitude of the scale of elements should not change the partition; for example, converting measurements from centimeters to inches should not modify the partitions obtained.
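As a hedged illustration of Scale-Invariance, the sketch below compares two toy one-dimensional clustering rules that are not taken from the paper: one cuts the sorted points at the k-1 largest gaps, the other starts a new cluster whenever a gap exceeds a fixed threshold `r`. The function names, the points and the constant `alpha` are assumptions made for the example.

```python
def split_into_k(points, k):
    """Cut the sorted points at the k-1 largest gaps (depends only on relative distances)."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    gaps = sorted(
        range(1, len(order)),
        key=lambda j: points[order[j]] - points[order[j - 1]],
        reverse=True,
    )
    bounds = [0] + sorted(gaps[: k - 1]) + [len(order)]
    return [tuple(order[a:b]) for a, b in zip(bounds, bounds[1:])]

def split_by_threshold(points, r):
    """Start a new cluster whenever the gap to the previous point exceeds r."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if points[nxt] - points[prev] > r:
            clusters.append(tuple(current))
            current = []
        current.append(nxt)
    clusters.append(tuple(current))
    return clusters

points = [0.0, 1.0, 2.0, 10.0, 11.0]
alpha = 0.1                      # hypothetical change of measurement unit
scaled = [alpha * p for p in points]

same_with_k = split_into_k(points, 2) == split_into_k(scaled, 2)
same_with_threshold = split_by_threshold(points, 3.0) == split_by_threshold(scaled, 3.0)
```

Clusters are returned as tuples of point indexes so that partitions obtained before and after rescaling can be compared directly. The fixed-k rule gives the same partition for both scales, while the threshold rule merges everything once all gaps fall below `r`.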
Richness refers to the ability of an algorithm to induce all possible partitions for a set of elements. When Richness holds, for any partition $\Gamma$ of $E$, there exists a distance matrix $d$ for which $f(d)$ results in $\Gamma$. It applies the concept of surjection from Set Theory [14], in which the image of a surjective function is equal to its codomain. This property assumes it is possible to arrange elements by modifying their relative distances in order to obtain all partitions. A clustering algorithm complies with Richness if and only if it permits obtaining all partitions for the elements in $E$; otherwise, some partitions are impossible to find.
Ensuring the possibility of finding all partitions for a data set, as required by Richness, appears, at first, desirable for an algorithm applied to data whose nature is unknown. However, this property may be difficult to comply with.
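To give a sense of what Richness demands, the following sketch enumerates every partition of a four-element set and counts how many of them have exactly two clusters. The element names and the choice of two clusters are assumptions made for illustration; the point is only that a function restricted to a fixed number of clusters cannot be surjective onto the universe of partitions.

```python
def all_partitions(items):
    """Yield every partition of a list, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in all_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller

E = ["a", "b", "c", "d"]                      # hypothetical element names
universe = list(all_partitions(E))            # the universe of partitions U
with_two_clusters = [p for p in universe if len(p) == 2]
```

For four elements the universe contains 15 partitions, of which only 7 have two clusters, so any algorithm that always returns two clusters misses the other 8 regardless of how the distances are arranged.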
Consistency refers to the ability of algorithms to maintain the same partition when distances among elements within the same group are reduced while distances among elements of different groups are increased. For example, consider a partition $\Gamma$ generated by a clustering function $f$, according to a distance function defined by $d$. Also consider changing $d$ into $d'$ in a way that it increases the distances among elements of distinct subsets of $\Gamma$ and reduces the distances among elements of the same subset. A consistent clustering function $f$ always generates, from $d'$, a partition $\Gamma'$ equivalent to $\Gamma$, in the sense that $f(d') = f(d)$.
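The kind of transformation from $d$ to $d'$ considered by Consistency can be checked mechanically. The sketch below is a minimal illustration, assuming distances stored in a dictionary keyed by unordered pairs; the helper name `is_consistent_transformation` and the sample values are hypothetical.

```python
def is_consistent_transformation(partition, d, d_new):
    """Return True if d_new does not increase within-cluster distances
    and does not decrease between-cluster distances, relative to d."""
    cluster_of = {x: i for i, block in enumerate(partition) for x in block}
    for pair, old in d.items():
        x, y = tuple(pair)
        new = d_new[pair]
        if cluster_of[x] == cluster_of[y]:
            if new > old:        # same cluster: distance must not grow
                return False
        elif new < old:          # different clusters: distance must not shrink
            return False
    return True

gamma = [{"a", "b"}, {"c"}]
d = {frozenset("ab"): 2.0, frozenset("ac"): 5.0, frozenset("bc"): 6.0}
d_shrunk = {frozenset("ab"): 1.0, frozenset("ac"): 7.0, frozenset("bc"): 6.0}
d_broken = {frozenset("ab"): 3.0, frozenset("ac"): 7.0, frozenset("bc"): 6.0}

ok = is_consistent_transformation(gamma, d, d_shrunk)
not_ok = is_consistent_transformation(gamma, d, d_broken)
```

A consistent clustering function is then one for which `f(d_new) == f(d)` whenever this check passes for the partition `f(d)`.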
The Kleinberg properties described above have proven too demanding for a clustering function to comply with, resulting in an incompatibility among them. Kleinberg [8] proved, in an impossibility theorem, that no algorithm can satisfy all three properties simultaneously; at most two of them can hold. Although the practical consequences of the impossibility theorem are limited, mainly because small adaptations to the properties avoid the impossibility, Kleinberg’s approach has motivated several studies to further understand data clustering [5], [4], [22]. Among those studies, very few are relevant to data streams. To the best of our knowledge, the only exception is the paper by Ryabko [15]. It deals with the problem of clustering stationary stochastic processes, in which the author claims that two elements must be associated with the same cluster if and only if they are generated by the same probability distribution. Nonetheless, the usefulness of this result is limited in the context of data streams, in which the behavior of sequences changes over time.
III. Clustering data streams
Data streams differ from other types of data traditionally considered in Machine Learning. They are infinite sequences with unknown and unstable behavior [6]. These characteristics contrast with traditional data sets, which are finite, have no particular order, and are characterized by a stable behavior. These distinctions have motivated the adaptation of Kleinberg’s data set properties to data streams.
In this context, a data stream is defined as an infinite and ordered set of elements, that is, a sequence $S = (s_{t_1}, s_{t_2}, \ldots)$, in which elements are indexed by $t \in \mathbb{R}$. The data collection is performed at time instants $t_k$ whose order is given by integers $k \in (0, \infty)$. The complete sequence is represented by $S$, while the sub-sequence of data collected up to a time instant $t_k$ is $S_{t_k}$. Every element $s_{t_k}$ in this sequence consists of a vector of values $v$, i.e., the data features obtained. The elements in $S_{t_k}$ are comparable by a dissimilarity function $d: S \times S \to \mathbb{R}$.
The main goal of a data-stream clustering algorithm $f(d, S)$ is to generate a sequence of partitions $\vec{\Gamma} = \{\Gamma_{t_1}, \ldots, \Gamma_{t_k}\}$ for elements in $S_{t_k}$, in which $\Gamma_{t_k} = f(d, S_{t_k})$. The sequence of partitions is created from the first data-stream element $s_{t_1}$ until the most recent element $s_{t_k}$. Although $S$ is infinite, partitions in $\vec{\Gamma}$ have a finite number of non-intersecting subsets. As a result of the practical requirements of data stream analysis, every partition in $\vec{\Gamma}$ may use only a finite subset of $S$. Therefore, the design of $f$ can consider the option of removing a convenient subset of elements $R$ if they are represented by other elements or have expired.
In summary, the main difference between data stream clustering and data set clustering originates from the infinite and ordered nature of streams, which demands a sequence of partitions over time, instead of a single one.
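The idea of producing one partition per time instant while keeping only a finite subset of the stream can be sketched as follows. This is a toy illustration and not an algorithm from the paper: the sliding window of `window_size` elements stands in for the removal of expired elements, and the gap-based rule in `cluster_window` is an arbitrary placeholder for the clustering function.

```python
from collections import deque

def cluster_window(window, gap):
    """Partition (t, value) pairs: sort by value and cut where neighbours differ by more than gap."""
    ordered = sorted(window, key=lambda s: s[1])
    blocks, current = [], [ordered[0]]
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt[1] - prev[1] > gap:
            blocks.append(frozenset(current))
            current = []
        current.append(nxt)
    blocks.append(frozenset(current))
    return frozenset(blocks)

def stream_partitions(stream, window_size, gap):
    """Yield one partition per arriving element, using only the last window_size elements."""
    window = deque(maxlen=window_size)
    for element in stream:
        window.append(element)      # the oldest element expires automatically
        yield cluster_window(window, gap)

# Hypothetical stream of (time instant, feature value) pairs.
stream = [(1, 0.0), (2, 0.2), (3, 5.0), (4, 5.1)]
partitions = list(stream_partitions(stream, window_size=3, gap=1.0))
```

Because `stream_partitions` is a generator, it also accepts an unbounded iterator, which matches the view of the output as an open-ended sequence of partitions rather than a single result.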
A. Data-stream clustering properties
The differences between the data-set clustering problem and the data-stream one motivated the adaptation of Kleinberg’s properties. We observed that these properties do not ensure that clustering partitions evolve smoothly according to the data stream behavior. Therefore, we introduce the property of Coherence, which states that a coherent algorithm for data-stream clustering creates partition sequences in which elements do not drastically change from one cluster to another. This property ensures that clustering algorithms maintain continuity between consecutive partitions.
Firstly, the Time-space Scale Invariance property, equivalent to Kleinberg’s property of Scale Invariance, states that algorithms should produce the same partition if the measurement scale of space or time is transformed by a multiplicative constant.
Property 1: Time-space Scale Invariance – Consider the multiplication of distances $d$ by a positive constant $\alpha$, $\alpha \cdot d$, and of the time indexes of each element $s_t \in S$ by another positive constant $\beta$, $s_{\beta \cdot t}$. The sequence of partitions $f(d, s_t)$ is equal to $f(\alpha \cdot d, s_{\beta \cdot t})$.
Similarly, the properties of Richness in Data Streams and Time-space Consistency are adaptations of Richness and Consistency for data sets. Both properties consider a notion of temporal proximity and, therefore, differ from the data set clustering properties in that they require a reference point in time to conduct comparisons among partitions.
Property 2: Richness in Data Streams – Consider a reference point in time $t_k$, a sequence $S_{t_k} = (s_{t_1}, \ldots, s_{t_k})$ observed until $t_k$, and a matrix of dissimilarity $d$ among elements in $S_{t_k}$. The clustering function $f$ complies with Richness in Data Streams at instant $t_k$ if, through an arbitrary transformation of $d$ and of the order of elements in $S_{t_k}$, it is capable of obtaining all possible partitions for $(s_{t_1}, \ldots, s_{t_k})$.
The Time-space Consistency property extends Consistency by including the notion of temporal proximity. It states that if elements in the same cluster become temporally and spatially closer to each other and, at the same time, elements of different clusters become farther apart, then the partition sequence is maintained.
Property 3: Time-space Consistency – Consider that $d'$ and $S'_{t_k}$ are transformations of the dissimilarity matrix $d$ and the observation sequence $S_{t_k}$ such that the intervals between occurrences of elements within a cluster become shorter and those between elements of different clusters become longer. A clustering function $f$ complies with Time-space Consistency if and only if $f(d, S_{t_k}) = f(d', S'_{t_k})$.
The proposed properties are directly related to those defined for data-set clustering. Nonetheless, the data stream properties based on Kleinberg’s study do not oblige a clustering algorithm to obtain a logical sequence of partitions for data streams. For practical purposes, it is desirable to guarantee that an algorithm will not randomly assign samples to distinct clusters in consecutive partitions. For example, in the context of concept drift [13], the evaluation of clusters over time is required. However, if consecutive partitions do not share a sense of continuity, the evaluation becomes meaningless.
The continuity of partitions (Definition 2) is formalized by a relation named reach, represented by $\triangleleft$, according to Definition 1. It is inspired by the concept of refinement used by Kleinberg [8], which states that a partition $\Gamma$ is a refinement of $\Upsilon$ if and only if each subset in $\Gamma$ either belongs to $\Upsilon$ or is contained in one of its subsets.
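Definition 1: A partition $\Gamma$ reaches another partition $\Upsilon$ if, for every subset $A \in \Gamma$, there is another subset $B \in \Upsilon$ such that $(A \setminus R_\Gamma) \subseteq (B \setminus R_\Upsilon)$ or $(B \setminus R_\Upsilon) \subseteq (A \setminus R_\Gamma)$, given that the operation $\setminus$ is defined by $B \setminus A = \{s \in B \mid s \notin A\}$, $R_\Gamma$ is the subset of elements in $\Gamma$ that do not belong to $\Upsilon$, and $R_\Upsilon$ is the subset of elements in $\Upsilon$ that are not in $\Gamma$. The relation reach is denoted by $\Gamma \triangleleft \Upsilon$.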
Continuity (see Definition 2) is relevant to the analysis of the capability of clustering algorithms to capture the evolution of data stream behavior.
Definition 2: A partition sequence $\vec{\Gamma}$ is continuous if and only if, for all consecutive partitions, i.e., $\Gamma_i$ and $\Gamma_{i+1}$, in which $i \geq 0$, the relation $\Gamma_i \triangleleft \Gamma_{i+1}$ is true.
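The reach relation and the continuity test can be expressed directly on sets. The sketch below is one possible reading of Definitions 1 and 2, assuming that partitions are given as collections of sets and that the elements present in only one of the two partitions are the ones discarded before comparison; the function names and the sample sequence are hypothetical.

```python
def elements(partition):
    """All elements covered by a partition."""
    return set().union(*partition)

def reaches(gamma, upsilon):
    """Reach relation: every subset of gamma is contained in, or contains,
    some subset of upsilon once non-shared elements are discarded."""
    r_gamma = elements(gamma) - elements(upsilon)      # elements only in gamma
    r_upsilon = elements(upsilon) - elements(gamma)    # elements only in upsilon
    for a in gamma:
        a_common = set(a) - r_gamma
        if not any(
            a_common <= (set(b) - r_upsilon) or (set(b) - r_upsilon) <= a_common
            for b in upsilon
        ):
            return False
    return True

def is_continuous(sequence):
    """A partition sequence is continuous if each partition reaches the next one."""
    return all(reaches(g, u) for g, u in zip(sequence, sequence[1:]))

# Hypothetical sequence: a cluster splits, then a new element "d" arrives.
smooth = [
    [{"a", "b", "c"}],
    [{"a", "b"}, {"c"}],
    [{"a", "b"}, {"c", "d"}],
]
# Regrouping elements across clusters breaks the relation.
abrupt = smooth + [[{"a", "c"}, {"b", "d"}]]
```

Splits, merges, arrivals and expirations all keep a sequence continuous under this reading, whereas a step that reshuffles elements between clusters does not.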
The guarantee of generating continuous partitions is stated by the Coherence property. The Coherence of a clustering algorithm allows, for example, employing measurements to evaluate partition sequences and support the exploratory analysis of phenomena.
Property 4: Coherence – For any $d$, $S_{t_k}$, and $\Gamma_{t_k} = f(d, S_{t_k})$, the partition sequence $\vec{\Gamma}$ is always continuous.
The time dimension is included in the first three properties proposed to formalize data stream clustering. However, these properties do not address the generation of an infinite sequence of partitions, which contrasts with data set clustering, defined by only one partition. In this sense, the coherence of data-stream clustering algorithms is probably the most important property. The main relevance of the Coherence property lies in the parallel it provides between clustering continuity and function continuity, which makes it possible to evaluate differences in clustering models over time. By measuring such differences, it is possible to better understand the phenomena represented by data streams. However, as shown in the next subsections, Coherence can be incompatible with other properties and is not respected by some important current algorithms. For example, the usage of the k-means algorithm for clustering data streams whose data are pre-organized in the form of micro-clusters, as performed by Birch [23] and CluStream [1], will not guarantee the continuity of partitions and, eventually, may not represent phenomena.
B. Analysis of clustering algorithm properties
The formalization we have proposed is aimed at evaluating and supporting the design of data-stream clustering algorithms. In a similar approach, Kleinberg [8] considered the three previously mentioned data-set clustering properties to prove that no algorithm respects them simultaneously.
Based on Kleinberg’s study, we have observed a similar impossibility result for obtaining a data-stream clustering algorithm that satisfies the first three properties proposed. Furthermore, we have also observed that Property 2, Richness in data streams, and Property 4, Coherence, are mutually exclusive, because the latter limits the possibilities to produce partitions. In order to prove it, we show that if a data-stream clustering function $f$ complies with Richness, then it necessarily generates sequences of partitions $\vec{\Gamma}$ in which at least one pair of consecutive partitions does not respect the relation reach; and, also, if $f$ generates only continuous sequences of partitions, then it does not comply with Richness. Theorem 1 shows the impossibility of designing a data-stream clustering algorithm with the properties of Coherence and Richness in data streams.
Theorem 1: There is no data-stream clustering function $f$ that complies with Properties 2 and 4.
First part: suppose $f$ complies with Richness in data streams; then $f$ generates a partition sequence $\vec{\Gamma}$ containing consecutive partitions that do not reach each other.
We will show that there is a sequence $\vec{\Gamma}$ with partitions that do not reach each other, regardless of changes in both distances $d$ and the sequence of elements $S_t$. A partition $\Gamma_t$ is unreachable when there is no partition preceding it such that the relation $\Gamma_{t-1} \triangleleft \Gamma_t$ is respected, that is, the set of all possible partitions at instant $t - 1$ is empty.
It suffices to provide an example in order to prove that there is an unreachable $\Gamma_t$. Take a partition at time instant $t$, $\Gamma_t = \{\{a \ldots\}\}$, where elements $a \ldots$ and $b \ldots$, $\forall j$, in $\Gamma_{t-1} = \{\{a \ldots\}\}$ are in different subsets.
It is known that $\{\{a \ldots\}\} \triangleleft \{\{\ldots\}\}$ is not valid, because, according to the definition of the relation reach, no subset of the first partition is contained in a subset of the second one, and reciprocally. It is also known that, by definition, any $f$ that complies with Richness in data streams is supposed to generate such a sequence of partitions.
Second part: if $f$ generates only continuous partition sequences, then $f$ does not comply with Richness in data streams. Consider the following partition $\Gamma_{t-1} = \{\{a \ldots\}\}$; then $f$ cannot generate $\Gamma_t = \{\{a \ldots\}\}$ in the next time instant $t$, and, similarly, if we take $\Gamma_{t-1} = \{\{a \ldots\}\}$, then $f$ will not generate $\Gamma_t = \{\{a \ldots\}\}$.
Therefore, no $f$ that generates continuous partition sequences may comply with Property 2. ∎
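The conflict stated by Theorem 1 can be illustrated with a concrete pair of partitions. The partitions below are hypothetical and chosen for illustration; they are not necessarily the ones used in the proof. The sketch assumes both partitions cover the same elements, so no expired elements need to be discarded before applying the reach relation.

```python
def reaches(gamma, upsilon):
    """Reach relation for two partitions over the same elements."""
    return all(any(a <= b or b <= a for b in upsilon) for a in gamma)

def partition(*blocks):
    """Build a hashable partition from plain sets."""
    return frozenset(frozenset(block) for block in blocks)

# Hypothetical partitions of the same four elements.
before = partition({"a1", "a2"}, {"b1", "b2"})
after = partition({"a1", "b1"}, {"a2", "b2"})
split = partition({"a1"}, {"a2"}, {"b1", "b2"})

neither_direction = not reaches(before, after) and not reaches(after, before)
split_is_reachable = reaches(before, split)
```

A function able to output any partition at consecutive instants would have to be able to output `before` followed by `after`, which breaks continuity; a function that only produces continuous sequences can move from `before` to `split` but never to `after`, so it cannot be rich.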
C. Analysis of data stream clustering algorithms
The use of properties for the analysis of clustering algorithms is incipient. However, such properties allow the understanding of theoretical principles for the design and selection of algorithms, taking into account utility and economic factors inherent to the application domain.
The properties we have introduced are the first to represent the inherent characteristics of data streams. In summary, these properties are Time-space Scale Invariance, Richness in Data Streams, Time-space Consistency and Coherence. Table I summarizes a comparison, based on the proposed properties, among the most relevant data stream clustering algorithms: Birch [23], WaveCluster [16], CluStream [1], Olindda [17] and Starvation Wta [21]. Observe that none of the algorithms complies with the property of Richness.
Table I
Evaluation of properties for data stream clustering
Algorithm     T-S Scale Invariance   T-S Consistency   Coherence
Birch         N                      N                 N
WaveCluster   N                      —                 N
CluStream     N                      N                 N
Olindda       Y                      N                 N
Wta           Y                      Y                 Y
(1) T-S means Time-space.
(2) Value ’Y’ means yes and ’N’ means no.
(3) The symbol ‘—’ represents that it was not possible to achieve an evaluation. We omit further details on verifying algorithm properties due to lack of space.
Usually, to verify whether algorithms comply with such properties, one can prove a theorem or present a counterexample. However, in some cases neither option is possible. On the other hand, there are other ways to check whether an algorithm respects a given property. For example, when an algorithm limits the number of groups, it does not comply with the property of Richness. An algorithm does not comply with Property 1, i.e., Time-space Scale Invariance, if it uses any threshold for accepting elements in clusters, as there is always a scalar constant that, multiplied by the distances among elements, will modify the partition produced. Still, an algorithm does not comply with Property 1 and Property 4 if it does not consider the order of data during clustering.
Another analytic option considers the evidence previously established for algorithms used to cluster traditional data sets, such as k-means and hierarchical algorithms (e.g., Single-linkage) [1], [23]. For example, algorithms that use k-means in the clustering process do not comply with Property 4 because k-means does not guarantee continuity in sequences of partitions. Birch and CluStream, two of the most commonly considered data stream clustering algorithms, are among them.
IV. Conclusions and Future Work
Despite the fundamental differences between data set and data stream clustering, many studies have overlooked the infinite, dynamic and transient nature of the latter. In this paper, we formally tackled the problem of data-stream clustering as an infinite sequence of partitions. This approach is an extension of Kleinberg’s properties. Besides adapting Kleinberg’s properties to data streams, we also proposed a new property referred to as Coherence, which deals with the infinite and sequential nature of data streams. This new property was proven to be incompatible with Richness, evidencing the trade-off between both properties when designing data-stream clustering algorithms. The scarcity of related studies in this theoretical branch indicates that this is a seminal study and also that there are plenty of possibilities for developing a clustering theory.
Acknowledgments
This paper is based upon work supported by FAPESP – São Paulo Research Foundation, Brazil, under grants no. 2006/05939-0 and 2011/19459-8, CAPES – Brazilian Federal Agency for Support and Evaluation of Graduate Education, under grant no. PDEE-4443-08-0, and CNPq – National Council for Scientific and Technological Development, under grant no. 304338/2008-7. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of FAPESP, CAPES and CNPq.
References
[1] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases, volume 29, pages 81–92, Berlin, Germany, 2003. VLDB Endowment.
[2] B. Allcock, I. Foster, V. Nefedova, A. Chervenak, B. Drach, et al. High-performance remote access to climate simulation data: A challenge problem for data grid technologies. In SC2001 Conference, pages 1334–
[3] S. Ben-David and M. Ackerman. Measures of clustering quality: A working set of axioms for clustering. In Advances in Neural Information Processing Systems 21, pages 121–128. 2009.
[4] G. Carlsson and F. Mémoli. Characterization, Stability and Convergence of Hierarchical Clustering Methods. Journal of Machine Learning Research, 11:1425–1470,
[5] G. Carlsson and F. Mémoli. Persistent clustering and a theorem of J. Kleinberg. ArXiv e-prints, 1:17, 2008.
[6] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering Data Streams: Theory and Practice. IEEE Transactions on Knowledge and Data
[7] D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, volume 30, pages 180–191, Toronto, Canada, 2004. VLDB
[8] J. Kleinberg. An impossibility theorem for clustering. In Proceedings of Advances in Neural Information Processing Systems 15, pages 446–453. The MIT Press, 2002.
[9] P. Lindstrom, S. J. Delany, and B. Mac Namee. Handling Concept Drift in a Text Data Stream Constrained by High Labelling Cost. In Proceedings of the Twenty-Third International Florida Artificial Intelligence Research Society Conference, page 52, Daytona Beach, USA, 2010.
[10] M. Masud, Q. Chen, J. Gao, L. Khan, J. Han, and B. Thuraisingham. Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space. Machine Learning and Knowledge Discovery in Databases,
[11] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. Streaming-data algorithms for high-quality clustering. In Proceedings of the 18th International Conference on Data Engineering, pages 685–694, San
[12] N. G. Pavlidis, D. K. Tasoulis, N. M. Adams, and D. J. Hand. [lambda]-Perceptron: An adaptive classifier for data streams. Pattern Recognition, 44(44):78–96, 2011.
[13] P. P. Rodrigues, J. Gama, and J. P. Pedroso. Hierarchical clustering of time-series data streams. IEEE Transactions on Knowledge and Data Engineering, pages 615–
[14] H. L. Royden. Real Analysis. Macmillan, New York, USA, 2nd edition, 1968.
[15] D. Ryabko. Clustering processes. In Proceedings of the 27th International Conference on Machine Learning, pages 919–926, Haifa, Israel, 2010.
[16] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the International Conference on Very Large Data Bases, pages 428–439, New York, USA, 1998. Citeseer.
[17] Eduardo Spinosa, Carvalho, and J. Gama. OLINDDA: a cluster-based approach for detecting novelty and concept drift in data streams. In Proceedings of the 2007 ACM Symposium on Applied Computing, pages 448–452, New York, USA, 2007. ACM.
[18] G. W. Swenson Jr. and K. I. Kellermann. An Intercontinental Array–A Next-Generation Radio Telescope.
[19] T. Tyson, R. Pike, M. Stein, and A. Szalay. Managing and Mining the LSST data sets. Technical report, The LSST Collaboration, Tucson, USA, 2002.
[20] R. Xu and D. Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678,
[21] S. Young, I. Arel, T. P. Karnowski, and D. Rose. A Fast and Stable Incremental Clustering Algorithm. In 2010 Seventh International Conference on Information Technology, pages 204–209, Shanghai, China, 2010. IEEE.
[22] R. B. Zadeh. Towards a Principled Theory of Clustering. Unpublished. Available at: ˜rezab/papers/principled.pdf.,
[23] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. Proceedings of the International Conference on Management of Data, 25(2):103–114, 1996.