Formalization of Data Stream Clustering Properties and Analysis of
Algorithms
Marcelo Keese Albertini and Rodrigo Fernandes de Mello
Department of Computer Sciences
Institute of Mathematics and Computer Sciences
University of Sao Paulo
Av. Trabalhador São-carlense 400, São Carlos, SP, Brazil
{albertini,mello}@icmc.usp.br
Abstract: The understanding of several phenomena requires unbounded data collections, called data streams. These phenomena often present unstable behavior and are studied by means of unsupervised induction processes based on data clustering. Currently, clustering processes have shown serious limitations in their applications to data streams due to the demands imposed by behavioral changes and unlimited data collection. However, despite the key distinctions between traditional data sets, which are finite and unordered, and data streams, which are essentially infinite sequences, studies have overlooked the dynamic and transient nature of streams, limiting the appropriate understanding of phenomena. The lack of a theoretical analysis for the problem of data stream clustering led us to propose, in this paper, a formalization based on Set Theory. This formalization made it possible to identify and propose basic properties for the design and comparison of data stream clustering algorithms. It is expected to be a starting point to understand the foundations of unsupervised induction based on clustering and, mainly, the modeling of phenomena.
Keywords: Artificial Intelligence; Machine Learning; Data Mining; Unsupervised Learning; Data Clustering; Data Streams; Set Theory
I. Introduction
The literature from several research fields has described two main types of phenomena that produce endless sequences of data, also referred to as data streams [11], [12]. The first type is characterized by the need for data storage space and fast computation. In this situation, data is stored in secondary memory, which presents low transfer rates, and is usually accessed in a contiguous manner. The second type is even more computationally demanding: data is collected at high rates and discarded shortly afterwards. In this type, clustering models must be continuously obtained throughout the endless data-gathering process, whose dynamical properties, i.e., behavior, are expected to evolve over time [7], [12].
Data streams are frequently found in computationally intensive environments, such as climate and weather analysis [2], text mining [10], [9], genomic analysis, and advanced scientific experiments [18], [19].
The typical clustering process has been designed to approach finite and unordered data sets and, consequently, does not meet the requirements of data streams [20]. There are major distinctions between the clustering requirements for data sets and for data streams. Firstly, a data stream must be accessed and processed sequentially, as it cannot be completely stored in memory. Secondly, the open-ended nature of data streams demands continuous and automatic analysis. Thirdly, while stable phenomena can be accurately represented by bounded data sets and are suitable for traditional data clustering, data streams usually represent unstable phenomena [7], whose characteristics tend to change over the collection. This tendency indicates the transient nature of data streams and demands a continuous re-evaluation of clustering models. Finally, the role of domain experts when clustering data streams is different from their role when clustering data sets. In the latter, specialists are often required to empirically extract, select and analyze data features in order to define the clustering algorithm and the validation criterion. Conversely, in data streams, the fast and continuous production of large amounts of data restricts human intervention, due to the limited capability of specialists to make well-founded decisions under such constraints.
In order to better understand data set clustering, Kleinberg [8] formalized three properties, which allow the analysis of algorithm capabilities independently of the target application. The formalization of the data-set clustering problem is an important step towards its deep understanding. Recently, several studies concerning the usage and proposition of properties for related problems have been developed [5], [4], [22]. These studies have broadened the possibilities to analyze clustering algorithms, although very few of them have been conducted in the context of data streams.
The lack of formal studies on the problem motivated our formalization of data stream clustering as an extension of Kleinberg’s approach to data sets [8]. In such a definition, data stream clustering is described in terms of an infinite sequence of partitions, which are modified over time. The endless and changing nature of data streams requires properties different from those proposed by Kleinberg for finite and unordered data sets. Data streams, as infinite and ordered sequences, need clustering properties that represent evolution over time and changes in behavior.
We have extended Kleinberg’s properties to represent clustering partitions that evolve according to the data stream behavior. In addition, we introduce the Coherence property, which states that partitions must evolve continuously over time in order to preserve the meaning of the clustering process. However, we observed that this property is incompatible with Kleinberg’s Richness, which states that a data-set clustering function must be capable of generating any partition. This conflict indicates that trade-off analyses are needed when deciding which properties to demand from algorithms. Additionally, we noticed that it is difficult to find an algorithm that complies with Richness in a data stream context.
The remainder of this paper is organized as follows: Section II introduces studies on data set clustering; Section III describes our formalization approach to data streams; Section IV draws conclusions and presents ideas for future work; and, finally, references are listed.
II. Related Work
Kleinberg [8] considered concepts of Set Theory [14] to describe clustering algorithms in general terms, i.e., without relying on a specific algorithm, objective function for optimization, or statistical model. This analysis is important to understand the functioning of algorithms beforehand, i.e., without the need for experimental trials, and also to provide design guidelines for new approaches.
The concepts involved in Kleinberg’s formalization are illustrated in Figure 1. This formalization assumes data elements are organized in a set $E$, whose measurements of dissimilarity between elements are provided in a square matrix. This matrix, $d$, is described in the form of a function $d: E \times E \to \mathbb{R}^+$, and is applied to two elements $i, j \in E$ to obtain a dissimilarity value $d(i, j)$ in $[0, \infty)$. The construction of $d$ determines how a clustering algorithm can organize the elements in $E$. A clustering is essentially an organization of the elements of $E$ into subsets, so that each element is contained in one, and only one, subset. This structure is known as a partition of a set. Partitions are denoted by uppercase Greek letters, such as $\Gamma$ (Gamma) and $\Upsilon$ (Upsilon). A clustering algorithm is, therefore, a function $f(d)$, which considers a distance matrix $d$ for mapping the elements in $E$ into a partition belonging to the universe of partitions $U$.
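[Figure 1: a set $E$ of elements $i$ and $j$, the dissimilarity function $d: E \times E \to \mathbb{R}^+$ with the example value $d(\spadesuit, \clubsuit) = 9$, and the clustering function $f: d \to U$ with $f(d) = \Gamma$.]
Figure 1. Illustration of concepts of data set clustering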
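As a rough illustration of the formalization above, the sketch below represents $E$ as a Python list, $d$ as a function over pairs of elements, and a partition as a frozenset of frozensets. The clustering function `link_below`, its threshold `r`, and the one-dimensional coordinates are hypothetical choices made only for this example; the formalization itself deliberately avoids committing to any specific algorithm.

```python
from itertools import combinations

def is_partition(blocks, elements):
    """True if every element of `elements` lies in exactly one non-empty block."""
    covered = [e for block in blocks for e in block]
    return all(blocks) and sorted(covered) == sorted(elements)

def link_below(elements, d, r):
    """Toy clustering function f(d): link any two elements with d(i, j) < r."""
    blocks = [{e} for e in elements]
    for i, j in combinations(elements, 2):
        if d(i, j) < r:
            bi = next(b for b in blocks if i in b)
            bj = next(b for b in blocks if j in b)
            if bi is not bj:
                blocks.remove(bj)   # merge block of j into block of i
                bi |= bj
    return frozenset(frozenset(b) for b in blocks)

E = ["a", "b", "c", "e"]
position = {"a": 0.0, "b": 1.0, "c": 9.0, "e": 10.0}  # made-up 1-D coordinates

def d(i, j):
    """Dissimilarity d: E x E -> [0, inf), here an absolute difference."""
    return abs(position[i] - position[j])

gamma = link_below(E, d, r=2.0)
print(sorted(sorted(block) for block in gamma))  # [['a', 'b'], ['c', 'e']]
```

The helper `is_partition` encodes the requirement that each element belongs to one, and only one, subset.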
By using this formalization, Kleinberg proposed properties for data set clustering, introduced in the form of statements of principles or self-evident conditions for a successful clustering algorithm. The author initially proposed three properties: Scale-Invariance, Richness and Consistency.
Scale-Invariance refers to the ability of algorithms to abstract the measurement scale of elements, i.e., for any distance function $d$ and constant $\alpha > 0$, we have $f(d) = f(\alpha \cdot d)$. This property reflects the expectation that the magnitude of the scale of elements should not change the partition; for example, converting measurements from centimeters to inches should not modify the partitions obtained.
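To make the property concrete, the following sketch checks $f(d) = f(\alpha \cdot d)$ empirically for single-linkage clustering under two stopping rules. The function `single_linkage`, the parameters `k` and `r`, and the sample coordinates are assumptions introduced for this example. Stopping at `k` clusters depends only on the ordering of the distances, so rescaling leaves the partition unchanged, whereas stopping at a fixed distance `r` does not survive rescaling. A passing check on one data set is only evidence, not a proof of the property.

```python
from itertools import combinations

def single_linkage(elements, d, k=None, r=None):
    """Merge the closest pairs first; stop at k clusters or at the first link >= r."""
    parent = {e: e for e in elements}

    def find(e):
        while parent[e] != e:
            e = parent[e]
        return e

    n_clusters = len(elements)
    for i, j in sorted(combinations(elements, 2), key=lambda pair: d(*pair)):
        if k is not None and n_clusters <= k:
            break
        if r is not None and d(i, j) >= r:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            n_clusters -= 1
    groups = {}
    for e in elements:
        groups.setdefault(find(e), set()).add(e)
    return frozenset(frozenset(g) for g in groups.values())

def scaled(d, alpha):
    """Return the dissimilarity function alpha * d."""
    return lambda i, j: alpha * d(i, j)

E = ["a", "b", "c", "e"]
position = {"a": 0.0, "b": 1.0, "c": 9.0, "e": 10.0}  # made-up coordinates
d = lambda i, j: abs(position[i] - position[j])

same_with_k = single_linkage(E, d, k=2) == single_linkage(E, scaled(d, 5.0), k=2)
same_with_r = single_linkage(E, d, r=2.0) == single_linkage(E, scaled(d, 5.0), r=2.0)
print(same_with_k, same_with_r)  # True False
```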
Richness refers to the ability of an algorithm to induce all possible partitions for a set of elements. When Richness holds, for any partition $\Gamma$ of $E$, there exists a distance matrix $d$ for which $f(d)$ results in $\Gamma$. It applies the concept of surjection from Set Theory [14], which defines the image of a surjective function as being equal to its codomain. This property assumes it is possible to arrange elements by modifying their relative distances in order to obtain all partitions. A clustering algorithm complies with Richness if and only if it permits obtaining all partitions of the elements in $E$; otherwise, some partitions are impossible to find. Ensuring the possibility of finding all partitions for a data set, as required by Richness, appears, at first, desirable for an algorithm applied to data whose nature is unknown. However, this property may be difficult to comply with [8].
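The size of the universe of partitions that Richness refers to can be inspected directly for a small set. The sketch below enumerates every partition of a four-element set and compares it with the range of a hypothetical clustering function that always returns exactly two clusters; the helper `all_partitions` and the restriction to two blocks are assumptions made for illustration. Because such a function can never leave the two-block subset, it cannot be surjective onto the whole universe.

```python
def all_partitions(elements):
    """Yield every partition of `elements` as a frozenset of frozensets."""
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for smaller in all_partitions(rest):
        # `first` forms a new block on its own ...
        yield smaller | {frozenset([first])}
        # ... or joins one of the existing blocks
        for block in smaller:
            yield (smaller - {block}) | {block | {first}}

E = ["a", "b", "c", "e"]
universe = set(all_partitions(E))
two_block_range = {p for p in universe if len(p) == 2}
print(len(universe), len(two_block_range))  # 15 7
```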
Consistency refers to the ability of algorithms to maintain the same partition when distances among elements within the same group are reduced while distances among elements of different groups are increased. For example, consider a partition $\Gamma$ generated by a clustering function $f$, according to a distance function $d$. Also consider changing $d$ into $d'$ in a way that increases the distances among elements of distinct subsets of $\Gamma$ and reduces the distances among elements of the same subset. A consistent clustering function $f$ applied to $d'$ always generates a partition $\Gamma'$ equivalent to $\Gamma$, in the sense that $f(d') = f(d)$.
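The transformation from $d$ to $d'$ described above can be checked mechanically. The sketch below only tests whether a candidate $d'$ is an admissible change with respect to a partition $\Gamma$ (distances within a block do not grow and distances between blocks do not shrink); verifying Consistency itself would additionally require comparing $f(d)$ and $f(d')$ for every admissible $d'$. The names `is_consistent_change` and `distance_from`, and the coordinates, are hypothetical.

```python
from itertools import combinations

def is_consistent_change(gamma, d, d_prime):
    """True if d_prime shrinks (or keeps) within-block distances of gamma
    and grows (or keeps) between-block distances, relative to d."""
    block_of = {e: block for block in gamma for e in block}
    for i, j in combinations(sorted(block_of), 2):
        if block_of[i] == block_of[j]:
            if d_prime(i, j) > d(i, j):
                return False
        elif d_prime(i, j) < d(i, j):
            return False
    return True

def distance_from(position):
    """Build a 1-D dissimilarity function from made-up coordinates."""
    return lambda i, j: abs(position[i] - position[j])

gamma = frozenset({frozenset({"a", "b"}), frozenset({"c", "e"})})
d = distance_from({"a": 0.0, "b": 1.0, "c": 9.0, "e": 10.0})
tighter = distance_from({"a": 0.0, "b": 0.5, "c": 20.0, "e": 20.5})
looser = distance_from({"a": 0.0, "b": 3.0, "c": 9.0, "e": 10.0})

print(is_consistent_change(gamma, d, tighter))  # True
print(is_consistent_change(gamma, d, looser))   # False
```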
Kleinberg’s properties described above have proven too demanding for a clustering function to comply with, as they are mutually incompatible. Kleinberg [8] proved, in an impossibility theorem, that no clustering function satisfies all three properties, i.e., at most two of them can be satisfied by any algorithm. Although the practical consequences of the impossibility theorem are limited, mainly because small adaptations to the properties avoid the impossibility, Kleinberg’s approach has motivated several studies to further understand data clustering [5], [4], [22]. Among those studies, very few are relevant to data streams. To the best of our knowledge, the only exception is the paper by Ryabko [15]. It deals with the problem of clustering stationary stochastic processes, in which the author claims that two elements must be associated with the same cluster if and only if they are generated by the same probability distribution. Nonetheless, the usefulness of this result is limited in the context of data streams, in which the behavior of sequences changes over time.
III. Clustering data streams
Data streams differ from other types of data traditionally considered in Machine Learning. They are infinite sequences with unknown and unstable behavior [6]. These characteristics contrast with traditional data sets, which are finite, have no particular order, and are characterized by stable behavior. These distinctions have motivated the adaptation of Kleinberg’s data set properties to data streams.
In this context, a data stream is defined as an infinite and ordered set of elements, that is, a sequence $S = (s_{t_{-\infty}}, \ldots, s_{t_{-k}}, \ldots, s_{t_0})$, in which elements are indexed by $t \in \mathbb{R}$. The data collection is performed at time instants $t$ whose order is given by integers $k \in (0, \infty)$. The complete sequence is represented by $S$, while the subsequence of data collected up to a time instant $t$ is $S_t$. Every element $s_t$ in this sequence consists of a vector of values $v$, i.e., the data features obtained. The elements in $S_t$ are comparable by a dissimilarity function $d: S \times S \to \mathbb{R}^+$.
The main goal of a data-stream clustering algorithm $f(d, S)$ is to generate a sequence of partitions $\bar{\Gamma}_{t_0} = \{\Gamma_{t_{-\infty}}, \ldots, \Gamma_{t_0}\}$ for elements in $S_{t_0}$, in which $\Gamma_{t_0} = f(d, S_{t_0})$. The sequence of partitions is created from the first data-stream element $s_{t_{-\infty}}$ until the most recent element $s_{t_0}$. Although $S$ is infinite, partitions in $\bar{\Gamma}$ have a finite number of non-intersecting subsets. Moreover, as a result of the practical requirements of data stream analysis, every partition in $\bar{\Gamma}$ may use only a finite subset of $S$. Therefore, the design of $f$ can consider the option of removing a convenient subset of elements $R$ if they are represented by other elements or have expired.
In summary, the main difference between data stream clustering and data set clustering originates from the infinite and ordered nature of streams, which demands a sequence of partitions over time, instead of a single one.
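A minimal sketch of this idea, under strong simplifying assumptions, is given next: elements are one-dimensional values tagged with their arrival index, only the most recent `window_size` elements are kept (older ones play the role of the removed subset $R$), and a new partition is produced after every arrival. The functions `gap_partition` and `stream_partitions`, the threshold `r`, and the sample values are hypothetical and do not correspond to any of the algorithms analyzed in this paper.

```python
from collections import deque

def gap_partition(window, r):
    """Partition (t, value) items: sorted neighbours closer than r share a block."""
    items = sorted(window, key=lambda item: item[1])
    blocks, current = [], [items[0]]
    for previous, item in zip(items, items[1:]):
        if item[1] - previous[1] < r:
            current.append(item)
        else:
            blocks.append(current)
            current = [item]
    blocks.append(current)
    return frozenset(frozenset(block) for block in blocks)

def stream_partitions(stream, r, window_size):
    """Yield one partition per arriving element, using only the last window_size items."""
    window = deque(maxlen=window_size)  # older items expire: they play the role of R
    for t, value in enumerate(stream):
        window.append((t, value))
        yield gap_partition(window, r)

partitions = list(stream_partitions([0.0, 0.4, 5.0, 5.3, 0.2], r=1.0, window_size=3))
for t, gamma in enumerate(partitions):
    print(t, sorted(sorted(block) for block in gamma))
```

Because `stream_partitions` is a generator, it can also consume an unbounded iterator; each yielded partition covers a finite subset of the stream.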
A. Data-stream clustering properties
The differences between the data-set clustering problem and the data stream one motivated the adaptation of Kleinberg’s properties. We observed that these properties do not ensure that clustering partitions evolve smoothly according to the data stream behavior. Therefore, we introduce the property of Coherence, which states that a coherent algorithm for data-stream clustering creates partition sequences in which elements do not drastically change from one cluster to another. This property ensures that clustering algorithms maintain continuity between consecutive partitions.
Firstly, the Time-space Scale Invariance property, equivalent to Kleinberg’s property of Scale-Invariance, suggests that algorithms should produce the same partition if the measurement scale of space or time is transformed by a multiplicative constant.
Property 1: Time-space Scale Invariance – Consider the multiplication of distances $d$ by a positive constant $\alpha$, $\alpha \cdot d$, and of the time index of each element $s_t \in S$ by another positive constant $\beta$, $s_{\beta t}$. The sequence of partitions $f(d, s_t)$ is equal to $f(\alpha \cdot d, s_{\beta t})$.
Similarly, the properties of Richness in Data Streams and Time-space Consistency are adaptations of Richness and Consistency for data sets. Both properties consider a notion of temporal proximity and, therefore, differ from the properties for data set clustering in that they require a reference point in time to conduct comparisons among partitions.
Property 2: Richness in Data Streams – Consider a reference point in time $t_0$, a sequence $S_{t_0} = (s_{-\infty}, \ldots, s_{t_0})$ observed until $t_0$, and a matrix of dissimilarity $d$ among elements in $S_t$. The clustering function $f$ complies with Richness in Data Streams at instant $t_0$ if, through arbitrary transformations of $d$ and of the order of elements in $S_t$, it is capable of obtaining all possible partitions for $(s_{-\infty}, \ldots, s_{t_0})$.
The Time-space Consistency property extends Consistency by including the notion of temporal proximity. It states that if elements in the same cluster become temporally and spatially closer to each other and, at the same time, elements of different clusters become farther apart, then the partition sequence is maintained.
Property 3: Time-space Consistency – Consider that $d'$ and $S'_t$ are transformations of the dissimilarity matrix $d$ and of the observation sequence $S_t$ such that the intervals between occurrences of elements within a cluster become shorter and those between elements of different clusters become longer. A clustering function $f$ complies with Time-space Consistency if and only if $f(d, S_t) = f(d', S'_t)$.
The proposed properties are directly related to those defined for data-set clustering. Nonetheless, the data stream properties based on Kleinberg’s study do not oblige a clustering algorithm to obtain a logical sequence of partitions for data streams. For practical purposes, the guarantee that an algorithm will not randomly assign samples to distinct clusters in consecutive partitions is desirable. For example, in the context of concept drift [13], the evaluation of clusters over time is required. However, if consecutive partitions do not share a sense of continuity, the evaluation becomes meaningless.
The continuity of partitions (Definition 2) is formalized by a relation named reach, represented by $\triangleleft$, according to Definition 1. It is inspired by the concept of refinement used by Kleinberg [8], which states that a partition $\Gamma$ is a refinement of $\Upsilon$ if and only if each subset in $\Gamma$ either belongs to $\Upsilon$ or is contained in one of its subsets.
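Definition 1: A partition $\Gamma$ reaches another partition $\Upsilon$ if, for every subset $A \in \Gamma$, there is a subset $B \in \Upsilon$ such that $(A \setminus R_\Gamma) \subseteq (B \setminus R_\Upsilon)$ or $(B \setminus R_\Upsilon) \subseteq (A \setminus R_\Gamma)$, given that the operation $\setminus$ is defined by $B \setminus A = \{s \in B \mid s \notin A\}$, $R_\Gamma$ is the subset of elements in $\Gamma$ that do not belong to $\Upsilon$, and $R_\Upsilon$ is the subset of elements in $\Upsilon$ that are not in $\Gamma$. The relation reach is denoted by $\Gamma \triangleleft \Upsilon$.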
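Definition 1 translates almost literally into set operations. The following sketch is one possible reading of it, with partitions stored as frozensets of frozensets; the function names and the example partitions are hypothetical.

```python
def elements_of(partition):
    """Union of all blocks of a partition."""
    return set().union(*partition)

def reaches(gamma, upsilon):
    """Literal reading of Definition 1: does gamma reach upsilon?"""
    r_gamma = elements_of(gamma) - elements_of(upsilon)    # elements only in gamma
    r_upsilon = elements_of(upsilon) - elements_of(gamma)  # elements only in upsilon
    for a in gamma:
        a_kept = a - r_gamma
        if not any(a_kept <= (b - r_upsilon) or (b - r_upsilon) <= a_kept
                   for b in upsilon):
            return False
    return True

def partition(*blocks):
    """Helper to write partitions compactly."""
    return frozenset(frozenset(block) for block in blocks)

before = partition({"a1", "a2"}, {"b1", "b2"})
merged = partition({"a1", "a2", "b1", "b2"})
swapped = partition({"a1", "b2"}, {"a2", "b1"})
shifted = partition({"a1"}, {"b1", "c1"})  # a2 and b2 expired, c1 arrived

print(reaches(before, merged))   # True
print(reaches(before, swapped))  # False
print(reaches(before, shifted))  # True
```

Merging clusters, or dropping expired elements while new ones arrive, preserves the relation, whereas exchanging elements between clusters breaks it.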
Continuity (see Definition 2) is relevant to the analysis of the capability of clustering algorithms to capture the behavior evolution of data streams.
Definition 2: A partition sequence $\bar{\Gamma}_t$ is continuous if and only if, for all consecutive partitions $\Gamma_{t-i-1}$ and $\Gamma_{t-i}$, in which $i \geq 0$, the relation $\Gamma_{t-i-1} \triangleleft \Gamma_{t-i}$ is true.
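For a finite prefix of a partition sequence, Definition 2 can be checked pairwise. In the sketch below, `reaches` restates the relation reach by intersecting every block with the elements shared by both partitions, which is equivalent to removing $R_\Gamma$ and $R_\Upsilon$; `is_continuous` and the example sequences are hypothetical.

```python
def reaches(gamma, upsilon):
    """Relation reach, restricted to the elements both partitions share."""
    shared = set().union(*gamma) & set().union(*upsilon)
    return all(
        any((a & shared) <= (b & shared) or (b & shared) <= (a & shared)
            for b in upsilon)
        for a in gamma
    )

def is_continuous(sequence):
    """Definition 2 on a finite prefix: every partition must reach its successor."""
    return all(reaches(prev, nxt) for prev, nxt in zip(sequence, sequence[1:]))

def partition(*blocks):
    """Helper to write partitions compactly."""
    return frozenset(frozenset(block) for block in blocks)

smooth = [
    partition({"a1"}, {"b1"}),
    partition({"a1", "a2"}, {"b1"}),
    partition({"a1", "a2"}, {"b1", "b2"}),
]
abrupt = smooth + [partition({"a1", "b2"}, {"a2", "b1"})]

print(is_continuous(smooth))  # True
print(is_continuous(abrupt))  # False
```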
The guarantee of generating continuous partitions is stated by the Coherence property. The Coherence of a clustering algorithm allows, for example, employing measurements to evaluate partition sequences and support the exploratory analysis of phenomena.
Property 4: Coherence – For any $d$, $S_t$, and $\Gamma_t = f(d, S_t)$, the partition sequence $\bar{\Gamma}_t$ is always continuous.
The time dimension is included in the first three properties proposed to formalize data stream clustering. However, these properties do not address the generation of an infinite sequence of partitions, which contrasts with data set clustering, defined by only one partition. In this sense, the coherence of data-stream clustering algorithms is probably the most important property. The main relevance of the Coherence property lies in the fact that it provides a parallel between clustering continuity and function continuity, making it possible to evaluate differences in clustering models over time. By measuring such differences, it is possible to better understand the phenomena represented by data streams. However, as shown in the next subsections, Coherence can be incompatible with other properties and is not respected by some important current algorithms. For example, the usage of the k-means algorithm for clustering data streams whose data are pre-organized in the form of micro-clusters, as performed by Birch [23] and CluStream [1], will not guarantee the continuity of partitions and, eventually, may not represent the behavior of phenomena.
B. Analysis of clustering algorithm properties
The formalization we have proposed is aimed at evaluating and supporting the design of data-stream clustering algorithms. In a similar approach, Kleinberg [8] considered the three previously mentioned data-set clustering properties to prove that no algorithm respects them simultaneously.
Based on Kleinberg’s study, we have observed a similar impossibility theorem for obtaining a data-stream clustering algorithm that satisfies the first three properties proposed. Furthermore, we have also observed that Properties 2, Richness in Data Streams, and 4, Coherence, are mutually exclusive, because the latter limits the possibilities to produce partitions. In order to prove it, we show that if a data-stream clustering function $f$ complies with Richness, then it necessarily generates sequences of partitions $\bar{\Gamma}$ in which at least one pair of consecutive partitions does not respect the relation reach; and, also, if $f$ generates only continuous sequences of partitions, then it does not comply with Richness. Theorem 1 shows the impossibility of designing a data-stream clustering algorithm with the properties of Coherence and Richness in Data Streams.
Theorem 1: There is no data-stream clustering function $f$ that complies with both Properties 2 and 4.
Proof:
First part: suppose $f$ complies with Richness in Data Streams; then $f$ generates a partition sequence $\bar{\Gamma}_t$ containing consecutive partitions that do not reach each other.
We will show that there is a sequence $\bar{\Gamma}_t$ with partitions that do not reach each other, regardless of changes in both the distances $d$ and the sequence of elements $S_t$. A partition $\Gamma_t$ is unreachable when no preceding partition $\Gamma_{t-1}$ satisfies the relation $\Gamma_{t-1} \triangleleft \Gamma_t$, that is, when the set of possible partitions at instant $t - 1$ that reach it is empty.
It suffices to provide an example in order to prove that there is an unreachable $\Gamma_t$. Take a partition at time instant $t$, $\Gamma_t = \{\{a_1, b_2\}, \{a_2, b_1\}\}$, where the elements $a_i$, $\forall i$, and $b_j$, $\forall j$, are in different clusters in $\Gamma_{t-1} = \{\{a_1, a_2\}, \{b_1, b_2\}\}$.
The relation $\{\{a_1, a_2\}, \{b_1, b_2\}\} \triangleleft \{\{a_1, b_2\}, \{a_2, b_1\}\}$ is not valid because, according to the definition of the relation reach, no subset of the first partition is contained in a subset of the second one, and vice versa. By definition, any $f$ that complies with Richness in Data Streams is supposed to generate such a sequence of partitions.
Second part: if $f$ generates only continuous partition sequences, then $f$ does not comply with Richness in Data Streams. Consider the partition $\Gamma_{t-1} = \{\{a_1, a_2\}, \{b_1, b_2\}\}$; then $f$ cannot generate $\Gamma_t = \{\{a_1, b_2\}, \{b_1, a_2\}\}$ at the next time instant $t$, and, similarly, if we take $\Gamma_{t-1} = \{\{a_1, b_2\}, \{b_1, a_2\}\}$, then $f$ will not generate $\Gamma_t = \{\{a_1, a_2\}, \{b_1, b_2\}\}$.
Therefore, no $f$ that generates only continuous partition sequences can comply with Property 2.
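The example used in the proof can be verified by exhaustive enumeration. The sketch below lists all partitions of $\{a_1, a_2, b_1, b_2\}$ and keeps those reached from $\Gamma_{t-1} = \{\{a_1, a_2\}, \{b_1, b_2\}\}$. Since both partitions cover the same elements, $R_\Gamma$ and $R_\Upsilon$ are empty and the relation reach reduces to a block-inclusion test; the helper names are hypothetical.

```python
def all_partitions(elements):
    """Yield every partition of `elements` as a frozenset of frozensets."""
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for smaller in all_partitions(rest):
        yield smaller | {frozenset([first])}
        for block in smaller:
            yield (smaller - {block}) | {block | {first}}

def reaches(gamma, upsilon):
    """Relation reach when both partitions cover the same elements (empty R sets)."""
    return all(any(a <= b or b <= a for b in upsilon) for a in gamma)

E = ["a1", "a2", "b1", "b2"]
previous = frozenset({frozenset({"a1", "a2"}), frozenset({"b1", "b2"})})

universe = set(all_partitions(E))
reachable = {p for p in universe if reaches(previous, p)}
unreachable = universe - reachable

print(len(universe), len(reachable))  # 15 13
for p in unreachable:
    print(sorted(sorted(block) for block in p))
```

Under these assumptions, two of the fifteen partitions cannot follow $\Gamma_{t-1}$ in a continuous sequence, so a coherent $f$ cannot produce every partition at instant $t$, which is the conflict with Richness stated by Theorem 1.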
C. Analysis of data stream clustering algorithms
The use of properties for the analysis of clustering algorithms is incipient. However, such properties allow the understanding of theoretical principles for the design and selection of algorithms, taking into account utility and economic factors inherent to the application domain [3].
The properties we have introduced are the first to represent the inherent characteristics of data streams. In summary, these properties are Time-space Scale Invariance, Richness in Data Streams, Time-space Consistency and Coherence. We present a comparison among the most relevant data stream clustering algorithms, which is summarized in Table I (see footnote 1) based on the properties proposed. Among the algorithms are Birch [23], WaveCluster [16], CluStream [1], Olindda [17] and Starvation Wta [21]. Observe that none of the algorithms complies with the property of Richness.
Table I
Evaluation of properties for data stream clustering algorithms

Algorithm     TS Scale Invariance   TS Consistency   Coherence
Birch         N                     N                N
WaveCluster   N                     —                N
CluStream     N                     N                N
Olindda       Y                     N                N
Wta           Y                     Y                Y

(1) TS means Time-space.
(2) Value ’Y’ means yes and ’N’ means no.
Footnote 1: The symbol ‘—’ indicates that it was not possible to reach an evaluation. We omit further details on verifying algorithm properties due to lack of space.

Usually, to verify whether algorithms comply with such properties, one can prove a theorem or present a counterexample. However, in some cases neither option is possible. On the other hand, there are other ways to check whether an algorithm respects a given property. For example, when an algorithm limits the number of groups, it does not comply with the property of Richness. An algorithm does not comply with Property 1, i.e., Time-space Scale Invariance, if it considers any threshold for accepting elements in clusters, as there is always a scalar constant that, multiplied by the distances among elements, will modify the partition produced. Likewise, an algorithm does not comply with Property 1 and Property 4 if it does not consider the order of data during clustering.
Another analytic option considers the evidence previously established for algorithms used to cluster traditional data sets, such as k-means and hierarchical algorithms (e.g., Single-linkage) [1], [23]. For example, algorithms that use k-means in the clustering process do not comply with Property 4 because k-means does not guarantee continuity in sequences of partitions. Among such algorithms, Birch and CluStream are some of the most commonly considered data stream clustering algorithms.
IV. Conclusions and Future Work
Despite the fundamental differences between data set and data stream clustering, many studies have overlooked the infinite, dynamic and transient nature of the latter. In this paper, we formally tackled the problem of data-stream clustering as an infinite sequence of partitions. This approach is an extension of Kleinberg’s properties. Besides adapting Kleinberg’s properties to data streams, we also proposed a new property referred to as Coherence, which deals with the infinite and sequential properties of data streams. This new property was proven to be incompatible with Richness, evidencing the trade-off between both properties when designing data-stream clustering algorithms. The existence of few related studies in this theoretical branch indicates that this is a seminal study and also that there are plenty of possibilities towards developing a clustering theory.
Acknowledgments
This paper is based upon work supported by FAPESP (São Paulo Research Foundation), Brazil, under grants no. 2006/059390 and 2011/194598; CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education) under grant no. PDEE4443080; and CNPq (National Council for Scientific and Technological Development) under grant no. 304338/20087. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of FAPESP, CAPES and CNPq.
References
[1] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases, volume 29, pages 81–92, Berlin, Germany, 2003. VLDB Endowment.
[2] B. Allcock, I. Foster, V. Nefedova, A. Chervenak, E. Deelman, C. Kesselman, J. Lee, A. Sim, A. Shoshani, B. Drach, et al. High-performance remote access to climate simulation data: A challenge problem for data grid technologies. In SC2001 Conference, pages 1334–1356, 2001.
[3] S. Ben-David and M. Ackerman. Measures of clustering quality: A working set of axioms for clustering. In Advances in Neural Information Processing Systems 21, pages 121–128. 2009.
[4] G. Carlsson and F. Mémoli. Characterization, Stability and Convergence of Hierarchical Clustering Methods. Journal of Machine Learning Research, 11:1425–1470, 2010.
[5] G. Carlsson and F. Mémoli. Persistent clustering and a theorem of J. Kleinberg. ArXiv e-prints, 1:17, 2008. http://adsabs.harvard.edu/abs/2008arXiv0808.2241C.
[6] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering Data Streams: Theory and Practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
[7] D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, volume 30, pages 180–191, Toronto, Canada, 2004. VLDB Endowment.
[8] J. Kleinberg. An impossibility theorem for clustering. In Proceedings of Advances in Neural Information Processing Systems 15, pages 446–453. The MIT Press, 2002.
[9] P. Lindstrom, S. J. Delany, and B. Mac Namee. Handling Concept Drift in a Text Data Stream Constrained by High Labelling Cost. In Proceedings of the Twenty-Third International Florida Artificial Intelligence Research Society Conference, page 52, Daytona Beach, USA, 2010.
[10] M. Masud, Q. Chen, J. Gao, L. Khan, J. Han, and B. Thuraisingham. Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space. Machine Learning and Knowledge Discovery in Databases, 6322:337–352, 2010.
[11] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. Streaming-data algorithms for high-quality clustering. In Proceedings of the 18th International Conference on Data Engineering, pages 685–694, San Jose, USA, 2002. IEEE.
[12] N. G. Pavlidis, D. K. Tasoulis, N. M. Adams, and D. J. Hand. λ-Perceptron: An adaptive classifier for data streams. Pattern Recognition, 44(44):78–96, 2011.
[13] P. P. Rodrigues, J. Gama, and J. P. Pedroso. Hierarchical clustering of time-series data streams. IEEE Transactions on Knowledge and Data Engineering, pages 615–627, 2007.
[14] H. L. Royden. Real Analysis. Macmillan, New York, USA, 2nd edition, 1968.
[15] D. Ryabko. Clustering processes. In Proceedings of the 27th International Conference on Machine Learning, pages 919–926, Haifa, Israel, 2010.
[16] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the International Conference on Very Large Data Bases, pages 428–439, New York, USA, 1998. Citeseer.
[17] E. Spinosa, A. C. P. F. de Carvalho, and J. Gama. OLINDDA: a cluster-based approach for detecting novelty and concept drift in data streams. In Proceedings of the 2007 ACM Symposium on Applied Computing, pages 448–452, New York, USA, 2007. ACM.
[18] G. W. Swenson Jr. and K. I. Kellermann. An Intercontinental Array – A Next-Generation Radio Telescope. Science, 188(4195):1263, 1975.
[19] T. Tyson, R. Pike, M. Stein, and A. Szalay. Managing and Mining the LSST data sets. Technical report, The LSST Collaboration, Tucson, USA, 2002.
[20] R. Xu and D. Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, 2005.
[21] S. Young, I. Arel, T. P. Karnowski, and D. Rose. A Fast and Stable Incremental Clustering Algorithm. In 2010 Seventh International Conference on Information Technology, pages 204–209, Shanghai, China, 2010. IEEE.
[22] R. B. Zadeh. Towards a Principled Theory of Clustering. Unpublished. Available at: http://www.stanford.edu/~rezab/papers/principled.pdf, 2010.
[23] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. Proceedings of the International Conference on Management of Data, 25(2):103–114, 1996.