Shape and Size Parameters in Clustering Algorithms, Dr. Rudolf Kruse

Nov 24, 2013

Shape and Size Parameters
in Clustering Algorithms
Prof. Dr. Rudolf Kruse, PD Dr. Christian Borgelt
University of Magdeburg,
kruse@iws.cs.uni-magdeburg.de
Overview
Prototype-based Clustering: Fundamentals and Algorithms
Describing Clusters with Prototypes
Distance-based (Fuzzy) Clustering
Learning Vector Quantization
Expectation Maximization Clustering
Shape and Size Parameters in Clustering
Regularization: Constraining Shape and Size
Online Adaptation: Exponential Decay
Application: Clustering Document Collections
The Vector Space Model and Cosine Similarity
Index Term Selection and Document Encoding
Experimental Results
Summary
Clustering Fundamentals
General Objective of Clustering:
Group the given data points in such a way that
data points from the same group/cluster are as similar as possible, and
data points from different groups/clusters are as dissimilar as possible.
Clusters may simply be subsets of the data set or may be described by prototypes.
“Hard” and “Soft” Clustering:
Grouping of the data points into groups/clusters may be hard:
each data point is assigned (exclusively) to one cluster,
or it may be softened by assigning
(fuzzy) membership degrees or
(posterior) probabilities.
In this case the data point may belong to different degrees to several clusters.
Prototype-based Clustering
Each group/cluster is described by a prototype comprising information about
the location of the group/cluster (its “center”),
the size of the group/cluster (its “reference radius”),
the shape (and orientation) of the group/cluster, and
the (relative) weight of the group/cluster.
The (degree of) membership of a data point to a given group/cluster or the
probability that it belongs to this group/cluster depends on
the distance of the data point to the prototype
(taking into account the prototype properties) and
a similarity function or probability density function.
The latter is usually specified as a radial function on the distance.
It is the same for all clusters (they differ only by the prototype properties).
Distance Functions
Most frequently used is the Minkowski family of distance functions.
The distance functions from this family are
translation invariant,
rotation invariant only for k = 2,
not scale invariant.
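The formula of the Minkowski family is not preserved in this transcript. The sketch below assumes the usual definition, d_k(x, y) = (sum_i |x_i - y_i|^k)^(1/k), with the maximum distance as the limit for k going to infinity. The function names are illustrative only.

```python
def minkowski(x, y, k):
    """Minkowski distance of order k >= 1 between two equal-length sequences."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def maximum_distance(x, y):
    """Limiting case k -> infinity (maximum distance)."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))        # city block (Manhattan) distance
print(minkowski(x, y, 2))        # Euclidean distance
print(maximum_distance(x, y))    # maximum distance

# Not scale invariant: doubling all coordinates doubles the distance.
x2, y2 = (0.0, 0.0), (6.0, 8.0)
print(minkowski(x2, y2, 2) / minkowski(x, y, 2))
```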
Distance Functions
Illustration of distance functions from the Minkowski family
[Figure: circles defined by these special distance measures]
Cluster-Specific Distance Functions
The similarity of a data point to a prototype depends on their distance.
If the cluster prototype is a simple cluster center, a general distance
measure can be defined on the data space.
In this case the Euclidean distance is most often used due to its rotation
invariance. It leads to (hyper-)spherical clusters.
However, more flexible clustering approaches (with size and shape
parameters) use cluster-specific distance functions.
The most common approach is to use a Mahalanobis distance with a
cluster-specific covariance matrix.
The covariance matrix comprises shape and size parameters.
The Euclidean distance is a special case that results when the covariance
matrix is the identity matrix.
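A minimal sketch of a cluster-specific Mahalanobis distance, restricted to two dimensions so that the matrix inverse can be written out by hand. It assumes the standard definition d(x, mu)^2 = (x - mu)^T Sigma^-1 (x - mu); the function name and the example values are illustrative.

```python
import math


def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance in 2D; cov = ((a, b), (b, c)) must be positive definite."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (dx, dy) * inverse(cov) * (dx, dy)^T with the 2x2 inverse written out
    d2 = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(d2)


identity = ((1.0, 0.0), (0.0, 1.0))
stretched = ((4.0, 0.0), (0.0, 1.0))   # cluster elongated along the first axis

print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))    # same as Euclidean
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), stretched))   # two units along the long axis
```

With the identity matrix the result coincides with the Euclidean distance; with the stretched matrix a point two units away along the long axis is only one "standard deviation" away.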
Interpretation of a Covariance Matrix
A univariate normal distribution has the density function
A multivariate normal distribution has the density function
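The two density functions are not preserved in this transcript. In their standard textbook form they read as follows; the symbols (mean \mu, variance \sigma^2, covariance matrix \Sigma, dimension m) are notation chosen here and may differ from the original slides.

```latex
f(x; \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

f(\vec{x}; \vec{\mu}, \Sigma)
  = \frac{1}{\sqrt{(2\pi)^m \, |\Sigma|}}
    \exp\!\left( -\frac{1}{2}
      (\vec{x} - \vec{\mu})^\top \Sigma^{-1} (\vec{x} - \vec{\mu}) \right)
```

The exponent of the multivariate density is minus one half of the squared Mahalanobis distance, which is why the covariance matrix plays the role of the variance.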
Variance and Standard Deviation

Univariate Normal/Gaussian Distribution
The variance/standard deviation provides information about the height of
the mode and the width of the curve.
Interpretation of a Covariance Matrix
The variance/standard deviation relates the spread of the distribution to
the spread of a standard normal distribution.
The covariance matrix relates the spread of the distribution to the spread
of a multivariate standard normal distribution.
Example: bivariate normal distribution
Question: Is there a multivariate analog of standard deviation?
Cholesky Decomposition

Intuitively: Compute an analog of standard deviation.

Let S be a symmetric, positive definite matrix (e.g. a covariance matrix).
Cholesky decomposition serves to compute a “square root”
of S.
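A minimal sketch of the decomposition, assuming the usual convention S = L L^T with a lower triangular factor L (the slides may use a different but equivalent convention). The function name and the example matrix are illustrative.

```python
import math


def cholesky(s):
    """Return lower triangular L with L * L^T == s for a symmetric positive definite s."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Subtract the contribution of the columns computed so far.
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low


cov = [[4.0, 2.0], [2.0, 3.0]]
factor = cholesky(cov)
print(factor)

# Check: multiplying the factor with its transpose reproduces the matrix.
rebuilt = [[sum(factor[i][k] * factor[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(rebuilt)
```

In one dimension the factor is simply the square root of the variance, that is, the standard deviation.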
Eigenvalue Decomposition

Also yields an analog of standard deviation.

Computationally more expensive than Cholesky decomposition.

Let S be a symmetric, positive definite matrix (e.g. a covariance matrix).
Eigenvalue Decomposition
Special Case: Two Dimensions
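The formulas of the two-dimensional case are not preserved in this transcript. The sketch below uses the well-known closed form for a symmetric 2x2 matrix ((a, b), (b, c)); the function name and the parameter names are chosen here for illustration.

```python
import math


def eigen_2d(a, b, c):
    """Eigenvalues (larger first) and rotation angle of the matrix ((a, b), (b, c))."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    phi = 0.5 * math.atan2(2.0 * b, a - c)   # direction of the first principal axis
    return mean + radius, mean - radius, phi


lam1, lam2, phi = eigen_2d(2.0, 1.0, 2.0)
print(lam1, lam2, math.degrees(phi))

# The square roots of the eigenvalues are the standard deviations
# along the principal axes of the (hyper-)ellipsoid.
print(math.sqrt(lam1), math.sqrt(lam2))
```

The eigenvalues describe size and shape (lengths of the principal axes), while the angle describes the orientation of the cluster.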
Prototype Properties
Radial Functions
The membership of a data point to a cluster is specified by a radial function
which is applied to the distance of the data point to the cluster prototype.
Radial Functions
Illustration of the influence of the parameter a on the shape of the radial function.
Note: all functions have the same value at r = 1 (independent of a).
This makes it possible to define a “reference radius”.
The Gaussian function is generally steeper than the Cauchy function,
which leads to a more convenient limiting behavior in clustering.
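The definitions of the radial functions are not preserved in this transcript. The sketch below assumes a generalized Cauchy function 1 / (r^a + b) and a generalized Gaussian function exp(-r^a / 2); both forms and the default parameter values are assumptions made for illustration.

```python
import math


def cauchy(r, a=2.0, b=0.0):
    """Generalized Cauchy function (assumed form); undefined at r = 0 if b = 0."""
    return 1.0 / (r ** a + b)


def gauss(r, a=2.0):
    """Generalized Gaussian function (assumed form)."""
    return math.exp(-0.5 * r ** a)


# At r = 1 the value does not depend on the exponent a.
for a in (1.0, 2.0, 4.0):
    print(a, cauchy(1.0, a), gauss(1.0, a))

# Far from the center the Gaussian function decays much faster.
print(cauchy(5.0), gauss(5.0))
```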
Radial Functions: Normalization

Membership Normalization
Objective Functions
Most prototype-based clustering algorithms rely on an objective
function that defines the clustering objective and is to be optimized.
Common principles underlying objective functions:
Distance-based Clustering
Determine the cluster parameters in such a way that the sum of the
distances to the cluster prototypes is minimized (distances may be
weighted with the memberships).
Probability-based Clustering
Determine the cluster parameters in such a way that the probability
of the given data set given the cluster model is maximized
(maximum (log-)likelihood approach).
Since the probability density functions are usually radial functions
defined on distances, the two approaches are very similar.
Fuzzy Clustering

Fuzzy Clustering: Alternating Optimization
The objective function J cannot be minimized directly.
Therefore: Alternating Optimization
(very similar in style to the expectation maximization algorithm)
Optimize membership degrees for fixed cluster parameters.
Optimize cluster parameters for fixed membership degrees.
(Update formulae are derived by differentiating the objective
function J.)
Iterate until convergence (checked, e.g., by change of cluster center).
Update of Membership Degrees:
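The update formulas are not preserved in this transcript. As an illustration of the alternating scheme, here is a minimal sketch of standard fuzzy c-means on one-dimensional data, assuming squared Euclidean distances and a fuzzifier of 2. It is not the authors' implementation; all names and the small constant that guards against division by zero are choices made here.

```python
def fuzzy_c_means_1d(data, centers, fuzzifier=2.0, eps=1e-9, max_iter=100):
    """Alternating optimization for fuzzy c-means on 1D data (illustrative sketch)."""
    centers = list(centers)
    expo = 1.0 / (fuzzifier - 1.0)
    u = []
    for _ in range(max_iter):
        # Step 1: update membership degrees for fixed cluster centers.
        u = []
        for x in data:
            # Inverse squared distances; the constant avoids division by zero.
            inv = [(1.0 / ((x - c) ** 2 + 1e-12)) ** expo for c in centers]
            total = sum(inv)
            u.append([v / total for v in inv])   # memberships sum to 1
        # Step 2: update cluster centers for fixed membership degrees.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** fuzzifier for row in u]
            new_centers.append(sum(w * x for w, x in zip(weights, data)) / sum(weights))
        # Convergence check via the change of the cluster centers.
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return centers, u


data = [0.0, 0.2, 0.4, 9.6, 9.8, 10.0]
centers, memberships = fuzzy_c_means_1d(data, [1.0, 8.0])
print(centers)
print(memberships[0])   # first data point belongs almost entirely to the first cluster
```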
Standard Fuzzy Clustering Algorithms

Size Parameters in Fuzzy Clustering Algorithms
(i.e., simply remove the normalization to a unit determinant).
Such an approach is indeed feasible and leads to good results in practical tests.

Drawback: This update rule cannot be derived from the objective function.
Reason: The larger the determinant of the covariance matrix, the smaller
the distance of the data points to the cluster center.
→ The objective function has no minimum: it can always be decreased
further by inflating the covariance matrices.
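The stated reason can be checked numerically. The sketch below assumes a diagonal covariance matrix, for which the squared Mahalanobis distance is a sum of squared differences divided by the variances; the names and values are illustrative.

```python
def squared_distance(x, center, variances):
    """Squared Mahalanobis distance for a diagonal covariance matrix."""
    return sum((a - c) ** 2 / v for a, c, v in zip(x, center, variances))


x, center = (3.0, 4.0), (0.0, 0.0)
for scale in (1.0, 10.0, 100.0):
    variances = (scale, scale)      # determinant = scale ** 2
    print(scale, squared_distance(x, center, variances))
```

Each increase of the variances shrinks every distance, so a sum of (weighted) distances has no minimum unless the determinant is fixed or constrained.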
(Fuzzy) Learning Vector Quantization

Competitive learning algorithm for artificial neural networks.

Applicable to classified as well as unclassified data.
Here: restriction to unsupervised learning, i.e. clustering

General idea: Iteratively update a set of c so-called reference vectors.
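The update rule itself is not preserved in this transcript. The sketch below shows the classical crisp, online variant of the idea: the reference vector closest to a data point "wins" and is moved a fraction of the way toward that point. The learning rate and all names are illustrative assumptions; the fuzzy variant in the lecture updates several reference vectors with weights.

```python
def lvq_step(refs, x, rate):
    """Move the closest reference vector a fraction `rate` toward the data point x."""
    dists = [sum((a - b) ** 2 for a, b in zip(r, x)) for r in refs]
    winner = dists.index(min(dists))
    refs[winner] = [a + rate * (b - a) for a, b in zip(refs[winner], x)]
    return winner


refs = [[0.0, 0.0], [10.0, 10.0]]
for point in ([2.0, 0.0], [9.0, 11.0], [1.0, 1.0]):
    w = lvq_step(refs, point, rate=0.5)
    print(point, "-> winner", w, refs)
```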
Size and Shape in (Fuzzy) Learning Vector Quantization

Introduce a covariance matrix for each reference vector to describe size
and shape information.

In order to derive an update rule, reconsider the reference vectors:
(Fuzzy) Learning Vector Quantization

Problem: Updating all parameters after each data point is too expensive.
Solution: Bundle the update for several patterns.

Problem: Differing weight sums make it difficult to choose a learning
rate.
Solution: Exploit similarity of LVQ to fuzzy clustering.
Update a reference vector according to
Mixture Models

Mixture of Gaussians: Expectation Maximization

Fuzzy Maximum Likelihood Estimation

Shape Constraints

Size Constraints

Weight Constraints

Shape Constraints: Iris Data

Size Constraints: Wine Data

Clustering Document Collections: Vector Space Model
Clustering Document Collections: Measuring Similarity
Each document is represented by a numeric vector of unit length.
The similarity of two documents can be computed as the scalar product
of the two vectors representing the documents, i.e., as the cosine of
the angle between them.
Note: for normalized vectors (unit length) the scalar product of two
vectors is not much different in behavior from the (squared) Euclidean
distance, since for unit vectors the squared Euclidean distance equals
2 minus twice the scalar product.
Consequently, we may just as well work with Euclidean distances, which
have the advantage that the introduction of shape and size parameters is
easier (Mahalanobis distance).
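The relation between the scalar product and the squared Euclidean distance of unit vectors follows from |x - y|^2 = |x|^2 + |y|^2 - 2 x·y = 2 - 2 x·y. A small numerical check, with made-up term weights as example data:

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Two hypothetical documents encoded as term weight vectors.
x = normalize([3.0, 1.0, 0.0, 2.0])
y = normalize([1.0, 0.0, 4.0, 1.0])

cosine = dot(x, y)
from_distance = 1.0 - 0.5 * squared_euclidean(x, y)
print(cosine, from_distance)   # the two values agree up to rounding
```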
Clustering Document Collections: Index Term Selection
Using all possible index terms leads to extremely large vector spaces.
Therefore: select keywords based on their entropy.
The entropy measures how well a term is suited to separate documents.
Greedy procedure of keyword selection:
(additional objective: achieve a good coverage of the documents)
Repeat: Go to the next unmarked document, select the term with the highest
relative entropy, and mark all documents containing this term.
When all documents are marked, unmark all documents and start over.
Stop when a user-defined number of terms have been selected.
Experience: document coverage is more important than high entropy.
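The greedy procedure described above can be sketched as follows. The entropy formula is not preserved in this transcript, so the sketch takes the scores as a precomputed input; the function name, the data layout (one set of terms per document) and the tie-breaking rule are assumptions made for illustration.

```python
def select_terms(docs, score, n_terms):
    """Greedy keyword selection.

    docs:    list of sets of terms, one set per document
    score:   dict mapping each term to its (precomputed) entropy-like score
    n_terms: user-defined number of terms to select
    """
    selected = []
    marked = [False] * len(docs)
    while len(selected) < n_terms:
        progress = False
        for i, doc in enumerate(docs):
            if marked[i]:
                continue
            candidates = [t for t in doc if t not in selected]
            if not candidates:
                marked[i] = True             # nothing left to pick from this document
                continue
            # Highest score wins; ties are broken by the term itself.
            best = max(candidates, key=lambda t: (score[t], t))
            selected.append(best)
            progress = True
            for j, other in enumerate(docs):
                if best in other:
                    marked[j] = True         # mark all documents containing the term
            if len(selected) == n_terms:
                return selected
        if not progress:
            break                            # every available term is already selected
        marked = [False] * len(docs)         # all documents marked: start over
    return selected


docs = [{"bank", "loan"}, {"bank", "goal"}, {"goal", "team"}]
score = {"bank": 0.9, "loan": 0.5, "goal": 0.8, "team": 0.4}
print(select_terms(docs, score, 3))
```

Marking all documents that contain a selected term is what enforces coverage: the next term is taken from a document that is not yet represented by any keyword.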
Experiments: Data Set

http://www.pedal.rdg.ac.uk/banksearchdataset/index.htm
Collection of about 11,000 web pages (11 categories, 4 major themes)
Experiments: Keyword Selection etc.
After stemming (Porter stemmer) and stop word filtering: 163,860 words
Remove terms that
are shorter than 4 characters,
occur in fewer than 15 documents,
occur in more than 11,000/12 ≈ 917 documents.
Resulting word set contains 10,626 words.
Apply the described greedy index term selection scheme to select 400
words.
Clustering (hard c-means, fuzzy c-means, vector quantization) was then
executed on the 20, 50, 100, ..., 400 most frequent words in this
subset.
Clustering tasks as in [Sinka and Corne 2002].
Test system: Pentium 4C 2.6 GHz with 1 GB of main memory
S.u.S.E. Linux 9.1, gcc version 3.3.3
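The three term filters listed above can be sketched as a single pass over the document frequencies. The function name, the parameter names and the example counts are hypothetical; only the thresholds (4 characters, 15 documents, one twelfth of the collection) come from the slide.

```python
def filter_terms(doc_freq, n_docs, min_len=4, min_docs=15, max_fraction=1.0 / 12.0):
    """Keep terms that pass all three filters.

    doc_freq maps each (stemmed) term to the number of documents containing it.
    """
    max_docs = n_docs * max_fraction
    return sorted(t for t, df in doc_freq.items()
                  if len(t) >= min_len and min_docs <= df <= max_docs)


# Made-up document frequencies for a collection of 11,000 pages.
doc_freq = {"the": 9000, "bank": 500, "soccer": 14, "page": 2000, "insur": 15}
print(filter_terms(doc_freq, 11000))
```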
Commercial Banks versus Soccer
top row: normalized centers, fixed uniform variances
bottom row: free centers, adaptable variances
black: reclassification accuracy, grey: execution time

Building Companies versus Insurance Agencies
top row: normalized centers, fixed uniform variances
bottom row: free centers, adaptable variances
black: reclassification accuracy, grey: execution time

All Four Major Themes (four clusters)
top row: normalized centers, fixed uniform variances
bottom row: free centers, adaptable variances
black: reclassification accuracy, grey: execution time
Conclusions
Shape constraints help to avoid degenerate cluster shapes
(i.e. very long and thin (hyper-)ellipsoids).
Size constraints help to avoid collapsing clusters
(i.e. the contraction of a cluster to a point).
Weight constraints help to avoid too unevenly populated clusters.
Consequence: considerably improved behavior of clustering algorithms
(more robust, less sensitive to initialization).
Experience: in some domains clustering with adaptable shape and size
is almost impossible without constraints.
A clustering program written in C that contains all described methods can be
retrieved free of charge (distributed under the LGPL) at
http://fuzzy.cs.uni-magdeburg.de