Supervised vs. unsupervised learning


Oct 14, 2013 (3 years and 10 months ago)


Machine Learning for NLP: Unsupervised learning techniques
Saturnino Luz
Dept. of Computer Science, Trinity College Dublin, Ireland
ESSLLI'07, Dublin, Ireland

Supervised vs. unsupervised learning
• So far we have seen supervised learning (of classification):
  – learning based on a training set where the labelling of instances represents the target (categorisation) function
  – the classifier implements an approximation of the target function
  – outcome: a classification decision
• Unsupervised learning:
  – learning based on unannotated instances
  – outcome: a grouping of objects (instances and groups of instances)

Applications
• Exploratory data analysis (data mining): clustering can reveal patterns of association in the data
• Information visualisation: natural ways of displaying association patterns
  – dendrograms, self-organising maps, etc.
• Information retrieval: keyword [Sparck Jones and Jackson, 1970] and document [van Rijsbergen, 1979] clustering
• Improving language models
• Corpus analysis (homogeneity)
• Object and character recognition
• Dimensionality reduction by term extraction in text categorisation

Data representation
• As before, vector-based representation is a popular choice. E.g.:

               lecture  we  examined  clustering  groups  ...
lecture    = ⟨    2,     2,     1,        2,        0,    ... ⟩
we         = ⟨    2,     2,     1,        2,        0,    ... ⟩
examined   = ⟨    1,     1,     1,        2,        0,    ... ⟩
clustering = ⟨    2,     2,     1,        3,        1,    ... ⟩
groups     = ⟨    0,     0,     0,        1,        1,    ... ⟩
...

Figure 1: Co-occurrence vector representation for words
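
A minimal sketch of how co-occurrence vectors of this kind could be built. It assumes that two words co-occur when they appear in the same sentence and that each sentence adds at most 1 to a pair's count; the slide does not say how the counts in Figure 1 were obtained. The toy sentences and the name `cooccurrence_vectors` are illustrative only, so the resulting counts need not match the figure.

```python
from collections import defaultdict
from itertools import product


def cooccurrence_vectors(sentences, vocabulary):
    """Map each vocabulary word to its vector of co-occurrence counts.

    Assumption: two words co-occur when they appear in the same sentence;
    each sentence contributes at most 1 to any pair's count.
    """
    counts = defaultdict(int)
    vocab = set(vocabulary)
    for sentence in sentences:
        present = set(sentence.lower().split()) & vocab
        for w1, w2 in product(present, repeat=2):
            counts[w1, w2] += 1
    # One row per word, with columns in vocabulary order.
    return {w: [counts[w, v] for v in vocabulary] for w in vocabulary}


vocabulary = ["lecture", "we", "examined", "clustering", "groups"]
sentences = [  # hypothetical toy corpus
    "in this lecture we examined clustering",
    "clustering groups similar words",
    "we continue the lecture on clustering",
]
vectors = cooccurrence_vectors(sentences, vocabulary)
for word in vocabulary:
    print(word, vectors[word])
```

Each row can then be treated as a point in a space with one dimension per vocabulary word, which is the representation the clustering algorithms in the rest of the lecture operate on.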

Types of unsupervised learning
• Clustering algorithms are the main technique for unsupervised learning
• A taxonomy [Jain et al., 1999]:
  – Partitional clustering:
    ∗ k-means, Expectation Maximisation (EM), graph theoretic, mode seeking
  – Hierarchical clustering:
    ∗ single-link
    ∗ complete-link
    ∗ average-link
  – Agglomerative vs. divisive

Distance and dissimilarity measures
• Given instances $a$, $b$ and $c$ represented as real-valued vectors, a distance between $a$ and $b$ is a function $d(a, b)$ satisfying:

  $d(a, b) \geq 0$  (1)
  $d(a, a) = 0$  (2)
  $d(a, b) = d(b, a)$  (3)
  $d(a, b) \leq d(a, c) + d(b, c)$  (4)

• When (4) doesn't hold, $d$ is called a dissimilarity
• Euclidean distance, $d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{|\vec{x}|} (x_i - y_i)^2}$, is commonly used.
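
As an illustration of the difference between a distance and a dissimilarity, the sketch below implements Euclidean distance and checks property (4) by brute force on a few points. The squared Euclidean distance is used here as an example of a measure that satisfies (1) to (3) but not (4); it is not mentioned on the slide, and the helper names are hypothetical.

```python
import math
from itertools import product


def euclidean(x, y):
    """Euclidean distance between two equal-length real-valued vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean distance: a dissimilarity, not a distance."""
    return euclidean(x, y) ** 2


def satisfies_triangle_inequality(d, points, tol=1e-12):
    """Check property (4) for every triple drawn from `points`."""
    return all(d(a, b) <= d(a, c) + d(b, c) + tol
               for a, b, c in product(points, repeat=3))


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(euclidean((0.0, 0.0), (3.0, 4.0)))                         # 5.0
print(satisfies_triangle_inequality(euclidean, points))          # True
print(satisfies_triangle_inequality(squared_euclidean, points))  # False
```

For the three collinear points, the squared measure gives 4 for the outer pair but only 1 + 1 via the middle point, so (4) fails.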

Hierarchical clustering
• Input: objects represented as vectors
• Output: a hierarchy of associations represented as a “dendrogram”

[Figure: scatter plot of the objects a to g on axes x0 and x1, and the corresponding dendrogram over a dissimilarity axis]

(If you know R, see hclusters.R in ronaldo.cs.tcd.ie/esslli07/practicals/)

A simple agglomerative clustering algorithm
Algorithm 1: Simple agglomerative hierarchical clustering

hclust($D$: set of instances): tree
  var: $C$,  /* set of clusters */
       $M$   /* matrix containing distances between pairs of clusters */
  for each $d \in D$ do
    make $d$ a leaf node in $C$
  done
  for each pair $a, b \in C$ do
    $M_{a,b} \leftarrow d(a, b)$
  done
  while (not all instances in one cluster) do
    Find the most similar pair of clusters in $M$
    Merge these two clusters into one cluster.
    Update $M$ to reflect the merge operation.
  done
  return $C$
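
Algorithm 1 can be sketched in Python as follows. This is one possible reading rather than the lecture's own implementation (the practical materials use R). It assumes instances are given as a dictionary from labels to vectors, that the tree is returned as nested tuples of labels, and that the "update M" step combines the two old distances with a `linkage` function (`min` for single-link, `max` for complete-link). All of these are choices made for this sketch.

```python
import math
from itertools import combinations


def hclust(data, d=math.dist, linkage=min):
    """Cluster `data` (label -> vector) bottom-up; return a nested-tuple tree.

    Assumption of this sketch: `linkage=min` gives single-link and
    `linkage=max` complete-link behaviour when M is updated after a merge.
    """
    labels = list(data)
    # C: every instance starts as a leaf, keyed by an integer cluster id.
    clusters = dict(enumerate(labels))
    # M: distance between each pair of clusters, keyed by (smaller, larger) id.
    M = {(i, j): d(data[labels[i]], data[labels[j]])
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # The most similar pair is the one with the smallest distance.
        a, b = min(M, key=M.get)
        merged = (clusters.pop(a), clusters.pop(b))
        # Update M: replace the rows of a and b by one row for the new cluster.
        del M[(a, b)]
        for c in clusters:
            dist_a = M.pop((min(a, c), max(a, c)))
            dist_b = M.pop((min(b, c), max(b, c)))
            M[(c, next_id)] = linkage(dist_a, dist_b)
        clusters[next_id] = merged
        next_id += 1
    (tree,) = clusters.values()
    return tree


points = {"a": (0.0,), "b": (1.0,), "c": (5.0,), "d": (6.5,)}
print(hclust(points))               # single-link
print(hclust(points, linkage=max))  # complete-link
```

On this small example both linkages first merge a with b, then c with d, and finally join the two pairs. Storing M as a dictionary keeps the sketch short; a practical implementation would more likely use an array and a priority queue.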

Similarity
• Results vary depending on how you define similarity.
• The definition determines the type of clustering algorithm:
  – In single-link clustering, similarity is defined in terms of the minimum distance between any pair of instances, one from each cluster:

    $\mathrm{sim}_s(c_1, c_2) = \frac{1}{1 + \min_{x_1 \in c_1, x_2 \in c_2} d(x_1, x_2)}$  (5)

  – in complete-link, in terms of the maximum distance between any such pair of instances:

    $\mathrm{sim}_c(c_1, c_2) = \frac{1}{1 + \max_{x_1 \in c_1, x_2 \in c_2} d(x_1, x_2)}$  (6)

  – and in average-link, in terms of the mean distance:

    $\mathrm{sim}_a(c_1, c_2) = \frac{1}{1 + \frac{1}{|c_1||c_2|} \sum_{x_1 \in c_1} \sum_{x_2 \in c_2} d(x_1, x_2)}$  (7)
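
Equations (5) to (7) translate directly into code. The sketch below assumes clusters are lists of equal-length tuples and uses Euclidean distance (`math.dist`) for $d$; the function names are illustrative.

```python
import math
from itertools import product


def single_link_sim(c1, c2, d=math.dist):
    """Equation (5): based on the closest cross-cluster pair."""
    return 1 / (1 + min(d(x1, x2) for x1, x2 in product(c1, c2)))


def complete_link_sim(c1, c2, d=math.dist):
    """Equation (6): based on the farthest cross-cluster pair."""
    return 1 / (1 + max(d(x1, x2) for x1, x2 in product(c1, c2)))


def average_link_sim(c1, c2, d=math.dist):
    """Equation (7): based on the mean cross-cluster distance."""
    total = sum(d(x1, x2) for x1, x2 in product(c1, c2))
    return 1 / (1 + total / (len(c1) * len(c2)))


c1 = [(0.0, 0.0), (1.0, 0.0)]
c2 = [(3.0, 0.0), (5.0, 0.0)]
print(single_link_sim(c1, c2))    # 1 / (1 + 2)
print(complete_link_sim(c1, c2))  # 1 / (1 + 5)
print(average_link_sim(c1, c2))   # 1 / (1 + 3.5)
```

Since the minimum distance never exceeds the mean, and the mean never exceeds the maximum, the three similarities are always ordered complete-link ≤ average-link ≤ single-link for the same pair of clusters.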

How do the different definitions affect clustering?
• Single-link tends to produce “straggly” or elongated clusters, whereas complete-link tends to produce more compact groups [Manning and Schütze, 1999]:

[Figure: two panels, "Single link" and "Complete-link", each plotting the points d1 to d8 on axes running from 0 to 4 and from 0 to 3]

Why are elongated clusters sometimes a bad thing?
• Noise data in the vicinity of clusters might lead to incorrect merging:

[Figure: scatter plot of points labelled A and B, with a row of points marked *]

Examples: clustering of RCV1 documents
• Dendrogram for single-link clustering of 30 RCV1 documents (Manning, Raghavan & Schütze, in press):

[Figure: dendrogram with an axis running from 1.0 to 0.0; leaf labels, in order:]
Ag trade reform.
Back-to-school spending is up
Lloyd’s CEO questioned
Lloyd’s chief / U.S. grilling
Viag stays positive
Chrysler / Latin America
Ohio Blue Cross
Japanese prime minister / Mexico
CompuServe reports loss
Sprint / Internet access service
Planet Hollywood
Trocadero: tripling of revenues
German unions split
War hero Colin Powell
War hero Colin Powell
Oil prices slip
Chains may raise prices
Clinton signs law
Lawsuit against tobacco companies
suits against tobacco firms
Indiana tobacco lawsuit
Most active stocks
Mexican markets
Hog prices tumble
NYSE closing averages
British FTSE index
Fed holds interest rates steady
Fed to keep interest rates steady
Fed keeps interest rates steady
Fed keeps interest rates steady

Examples: clustering of RCV1 documents
• Dendrogram for complete-link clustering of 30 RCV1 documents:

[Figure: dendrogram with an axis running from 1.0 to 0.0; leaf labels, in order:]
NYSE closing averages
Hog prices tumble
Oil prices slip
Ag trade reform.
Chrysler / Latin America
Japanese prime minister / Mexico
Fed holds interest rates steady
Fed to keep interest rates steady
Fed keeps interest rates steady
Fed keeps interest rates steady
Mexican markets
British FTSE index
War hero Colin Powell
War hero Colin Powell
Lloyd’s CEO questioned
Lloyd’s chief / U.S. grilling
Ohio Blue Cross
Lawsuit against tobacco companies
suits against tobacco firms
Indiana tobacco lawsuit
Viag stays positive
Most active stocks
CompuServe reports loss
Sprint / Internet access service
Planet Hollywood
Trocadero: tripling of revenues
Back-to-school spending is up
German unions split
Chains may raise prices
Clinton signs law

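k-means clustering
Algorithm 2: K-means clustering

k-means($X = \{\vec{d}_1, \ldots, \vec{d}_n\} \subseteq \mathbb{R}^m$, $k$): $2^{\mathbb{R}}$
  $C : 2^{\mathbb{R}}$  /* a set of clusters */
  $d : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$  /* distance function */
  $\mu : 2^{\mathbb{R}} \to \mathbb{R}$  /* $\mu$ computes the mean of a cluster */
  select $C$ with $k$ initial centres $\vec{f}_1, \ldots, \vec{f}_k$
  while stopping criterion not true do
    for all clusters $c_j \in C$ do
      $c_j \leftarrow \{\vec{d}_i \mid \forall \vec{f}_l : d(\vec{d}_i, \vec{f}_j) \leq d(\vec{d}_i, \vec{f}_l)\}$
    done
    for all means $\vec{f}_j$ do
      $\vec{f}_j \leftarrow \mu(c_j)$
    done
  done
  return $C$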
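
A minimal Python sketch of Algorithm 2. Several details are left open by the pseudocode and are filled in here as assumptions: the initial centres are $k$ randomly sampled instances, ties go to the lowest-numbered centre, an empty cluster keeps its previous centre, and the stopping criterion is that the centres no longer change (or that `max_iter` is reached).

```python
import math
import random


def mean(cluster):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))


def k_means(X, k, d=math.dist, max_iter=100, seed=0):
    """Return (clusters, centres) for the instances in the list X.

    Assumptions of this sketch: random initial centres drawn from X,
    ties resolved in favour of the lowest-numbered centre, and
    "centres unchanged" as the stopping criterion.
    """
    centres = random.Random(seed).sample(X, k)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each instance joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda j: d(x, centres[j]))
            clusters[nearest].append(x)
        # Update step: recompute each centre as its cluster mean;
        # an empty cluster keeps its previous centre.
        new_centres = [mean(c) if c else centres[j]
                       for j, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return clusters, centres


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
     (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
clusters, centres = k_means(X, 2)
for centre, cluster in zip(centres, clusters):
    print(centre, cluster)
```

On this well-separated toy data the two groups of three points are recovered; which of them ends up as cluster 1 or cluster 2 depends on the random initialisation.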

k-means characteristics
• Need to select the number of clusters in advance
• Might converge to a local minimum
• But...
  – it is more efficient (lower computational complexity) than hierarchical clustering
• K-means can be seen as a specialisation of the expectation maximisation (EM) algorithm
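
One way to make the last point concrete, under an assumption the slide does not state: take a Gaussian mixture with $k$ components, equal mixing weights and a shared spherical covariance $\sigma^2 I$. The E-step then computes soft assignments $r_{ij}$ of instance $\vec{d}_i$ to centre $\vec{f}_j$, and the M-step recomputes each centre as a weighted mean:

```latex
r_{ij} = \frac{\exp\left(-\lVert \vec{d}_i - \vec{f}_j \rVert^2 / 2\sigma^2\right)}
              {\sum_{l=1}^{k} \exp\left(-\lVert \vec{d}_i - \vec{f}_l \rVert^2 / 2\sigma^2\right)},
\qquad
\vec{f}_j \leftarrow \frac{\sum_{i=1}^{n} r_{ij}\, \vec{d}_i}{\sum_{i=1}^{n} r_{ij}}
```

As $\sigma^2 \to 0$, $r_{ij}$ tends to 1 for the centre nearest to $\vec{d}_i$ (when that centre is unique) and to 0 for all others, so the E-step reduces to the assignment step of k-means with Euclidean distance and the M-step reduces to taking the plain mean of each cluster.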

Another Example: term extraction for TC
Sample co-occurrence matrix for a subset of REUTERS-21578:

usair      20201041000102030001011000000021413
voting     2100201000000200010001010000000200
buyout     008102000000110000030000000101000
stake      1216200011001200002010010121000100
santa      000073000000010000000000200020010
merger     4120348013020124100112000200105431
ownership  100000600000000000000000000000100
rospatch   000101050000000000000000000000000
rexnord    000103005000000000000000100000000
designs    000000000500100000000002000011000
pie        100002000050004000000000000001201
...

Extraction with k-means
K-means ($k = 5$) clustering of the words in the co-occurrence matrix of the previous slide:

Cluster  Elements
1        stake
2        usair, merger, twa
3        acquisition
4        acquire
5        voting, buyout, santa, ownership, rospatch, rexnord, designs, pie, recommend, definitive, piedmont, consent, boards, dome, obtain, leveraged, comply, phoenix, core, manufactures, midnight, islands, axp, attractive, undisclosed, interested, trans

Term extraction by hierarchical clustering
Hierarchical clustering (complete-link) of the words in the previous slide; dendrogram leaf labels, in order:
stake
acquisition
merger
acquire
undisclosed
usair
twa
voting
pie
piedmont
definitive
buyout
leveraged
trans
ownership
boards
designs
manufactures
recommend
islands
rospatch
obtain
comply
phoenix
attractive
consent
axp
dome
core
santa
interested
rexnord
midnight
[Dendrogram axis: dissimilarity, from 0 to 80]

Unsupervised word-sense disambiguation
Concordances for the word “bank”:

RIV  ybe? Then he ran down along the bank, toward a narrow, muddy path.
FIN  four bundles of small notes the bank cashier got it into his head
RIV  ross the bridge and on the other bank you only hear the stream, the
RIV  beneath the house, where a steep bank of earth is compacted between
FIN  op but is really the branch of a bank. As I set foot inside, despite
FIN  raffic police also belong to the bank. More foolhardy than entering
FIN  require a number. If you open a bank account, the teller identifies
RIV  circular movement, skirting the bank of the River Jordan, then turn
...

Dublin

Ireland
Notes
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
......................................................................................................
..................................................................................................
19-1
Sample run of 2-means clustering
• k-means clusters the lines into the following groups:
  1  fin1, riv3, fin4, fin6, fin7, fin9, fin10,
     riv15, riv16, fin19, fin20, fin22, fin23,
     fin24, fin25, fin26, riv27, fin28, fin29,
     fin32, fin33, fin34
  2  riv2, riv5, riv8, riv11, riv12, riv13,
     riv14, riv17, riv18, riv21, riv30, riv31,
     riv35

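A grouping of this kind can be obtained with a few lines of code. The following is a minimal sketch of k-means (Lloyd's algorithm), assuming Euclidean distance and that each concordance line has already been turned into a numeric vector. It is not the implementation used for the run above: the function names, the toy 2-D points and the choice of the first k points as initial centroids are all assumptions made for illustration.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, max_iter=100):
    """Plain k-means on lists of numbers.

    Initial centroids are the first k points, an arbitrary choice made
    here for reproducibility; random restarts are common in practice.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical toy data standing in for the concordance-line vectors.
points = [[0, 0], [5, 5], [0, 1], [5, 6], [1, 0], [6, 5]]
assignment, centroids = kmeans(points, k=2)
```

With k = 2 the toy points near the origin end up in one cluster and the points near (5, 5) in the other. On real data the result depends on the initial centroids.
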
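Hierarchical clustering (single-link) of senses of the word “bank”
[Figure: dendrogram of the 35 concordance lines; axis: dissimilarity, with ticks at 3.6, 3.8, 4.0, 4.2 and 4.4. Leaf labels, in extraction order: riv27, fin26, fin20, riv3, fin29, fin28, fin24, fin23, fin7, fin6, riv5, fin33, riv30, fin25, fin22, riv16, riv12, fin10, fin4, fin34, riv15, fin32, riv14, riv11, riv13, riv21, riv17, fin9, riv35, riv18, riv8, riv2, riv31, fin1, fin19]
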
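The merge heights plotted in a single-link dendrogram can be computed as sketched below: start with one cluster per item and repeatedly merge the two clusters whose closest members are nearest to each other. This is a naive illustration assuming Euclidean distance, not the procedure used to produce the figure. The function names and the one-dimensional toy points are hypothetical.

```python
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def single_link(points):
    """Naive agglomerative clustering with single-link dissimilarity.

    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    Clusters are tuples of point indices; height is the smallest
    distance between any member of a and any member of b.
    """
    def link(pair):
        a, b = pair
        return min(euclidean(points[i], points[j]) for i in a for j in b)

    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-link distance.
        a, b = min(combinations(clusters, 2), key=link)
        merges.append((a, b, link((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical toy data: four points on a line.
points = [[0.0], [1.0], [3.0], [7.0]]
merges = single_link(points)
heights = [h for _, _, h in merges]
```

Each tuple in `merges` corresponds to one junction of the dendrogram, and its height is the dissimilarity at which the junction is drawn. Cutting the tree at a given height yields a flat clustering.
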
Further topics
• Efficient clustering algorithms
• Cluster labelling
• Cluster evaluation
• Expectation Maximisation (EM) clustering and applications
• Clustering and information visualisation: SOM and ANNs

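As a small illustration of cluster evaluation, the groups reported in the sample 2-means run can be scored against the riv/fin tags of the concordance lines. The sketch below uses purity, one common external measure; the slides do not say which measure, if any, was applied, so the choice and the helper name `purity` are assumptions.

```python
from collections import Counter

# The two groups reported in the sample 2-means run. The riv/fin prefix
# of each identifier is the sense tag of that concordance line.
groups = {
    1: "fin1 riv3 fin4 fin6 fin7 fin9 fin10 riv15 riv16 fin19 fin20 fin22 "
       "fin23 fin24 fin25 fin26 riv27 fin28 fin29 fin32 fin33 fin34".split(),
    2: "riv2 riv5 riv8 riv11 riv12 riv13 riv14 riv17 riv18 riv21 riv30 "
       "riv31 riv35".split(),
}

def purity(groups):
    """Fraction of items that carry the majority tag of their cluster."""
    total = sum(len(items) for items in groups.values())
    majority = sum(
        Counter(item[:3] for item in items).most_common(1)[0][1]
        for items in groups.values()
    )
    return majority / total

print(f"purity = {purity(groups):.3f}")  # prints "purity = 0.886"
```

Group 1 holds 18 fin lines out of 22 and group 2 holds 13 riv lines out of 13, so purity is 31/35. Purity rewards many small clusters, so it is usually reported together with the number of clusters.
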
References
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, 1999. URL citeseer.ist.psu.edu/jain99data.html.
Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts, 1999.
K. Sparck Jones and D. M. Jackson. The use of automatically-obtained keyword classifications for information retrieval. Information Storage and Retrieval, 5:175–201, 1970.
C. J. van Rijsbergen. Information Retrieval. Butterworths, 1979. URL http://www.dcs.gla.ac.uk/Keith/.