Clustering Word Senses

Eneko Agirre, Oier Lopez de Lacalle
IXA NLP group
http://ixa.si.ehu.es


University of the Basque Country, GWNC 2004

Introduction: motivation

- The desired granularity of word sense distinctions is controversial
- Fine-grained word senses are unnecessary for some applications
  - MT: channel (tv, strait) → kanal
- The Senseval-2 WSD competition also provides coarse-grained senses
- The desired sense groupings depend on the application:
  - MT: same translation (language-pair dependent)
  - IR: some related senses: metonymic, diathesis, specialization
  - Dialogue (deeper NLP): in principle, all word senses, in order to do proper inferences
- WSD needs to be tuned, multiple senses returned
- → Clustering of word senses


Introduction: a sample word

- Channel has 7 senses and 4 coarse-grained senses (Senseval-2)

Mnemonic  Channel definitions
water     4. channel -- (a deep and relatively narrow body of water that allows the best passage for vessels)
passage   2. channel -- (a passage for water (or other fluids) to flow through)
body      6. duct, epithelial duct, canal, channel -- (a bodily passage or tube lined with epithelial cells and conveying a secretion or other substance)
groove    3. groove, channel -- (a long narrow furrow cut either by a natural process (such as erosion) or by a tool)
tv        7. channel, television channel, TV channel -- (a television station and its programs)
signals   1. channel, transmission channel -- (a path over which electrical signals can pass)
comms     5. channel, communication channel, line -- ((often plural) a means of communication or access)


Introduction

- Work presented here: test the quality of 4 clustering methods
  - 2 based on distributional similarity
  - Confusion matrix of Senseval-2 systems
  - Translation equivalences
- Result: hierarchical clusters
- Clustering algorithms: CLUTO toolkit
- Evaluation: Senseval-2 coarse-grained senses



Clustering toolkit used

- CLUTO (Karypis 2001)
- Possible inputs:
  - context vector for each word sense (corpora)
  - similarity matrix (built from any source)
- A number of clustering parameters
- Output: hierarchical or flat clusters
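CLUTO itself is a standalone command-line toolkit; as an illustration of what it does with a similarity-matrix input, here is a minimal pure-Python sketch of average-link agglomerative clustering. This is only an analogy for intuition: CLUTO's actual algorithms, criterion functions, and parameters differ.

```python
# Minimal average-link agglomerative clustering over a sense-similarity
# matrix, illustrating the kind of input/output a toolkit like CLUTO
# handles. A sketch for intuition, not CLUTO's actual algorithm.

def agglomerative(sim, k):
    """Merge the two most similar clusters until k clusters remain.
    sim[i][j] is the similarity between senses i and j."""
    clusters = [[i] for i in range(len(sim))]

    def cluster_sim(a, b):
        # Average pairwise similarity between two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the most similar pair of clusters and merge it.
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: cluster_sim(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy 4-sense similarity matrix: senses 0/1 are similar, 2/3 are similar.
sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]
print(agglomerative(sim, 2))  # → [[0, 1], [2, 3]]
```

Recording the merge order instead of stopping at k clusters would yield the hierarchical output used in the experiments.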


Distributional similarity methods

- Hypothesis: two word senses are similar if they are used in similar contexts

1. Clustering directly over the examples
2. Clustering over similarity among topic signatures

1. Clustering directly from examples

1. Take examples from tagged data (Senseval-2) OR retrieve sense-examples from the web
   - E.g. if we want examples of the first sense of channel, use examples of its monosemous synonym: transmission channel
   - We use: synonyms, hypernyms, all hyponyms, siblings
   - 1000 snippets for each monosemous term from Google
   - Resource freely available (contact us)
2. Cluster the examples as if they were documents


2. Clustering over similarity among TS

1. Retrieve the examples
2. Build topic signatures: a vector of the words in the context of the word sense, with high weights for distinguishing words:

   1. sense: channel, transmission_channel "a path over which electrical signals can pass"
   medium (3110.34), optic (2790.34), transmission (2547.13), electronic (1553.85), channel (1352.44), mass (1191.12), fiber (1070.28), public (831.41), fibre (716.95), communication (631.38), technology (368.66), system (363.39), datum (308.50) ...

3. Build the similarity matrix of the TS
4. Cluster



3. Confusion matrix method

- Hypothesis: sense A is similar to sense B if many WSD algorithms tag occurrences of A as B
- Implemented using the results from all Senseval-2 systems

4. Translation similarity method

- Hypothesis: two word senses are similar if they are translated in the same way in a number of languages (Resnik & Yarowsky, 2000)
- Similarity matrix kindly provided by Chugur & Gonzalo (2002)


Experiment and results: by method

- Best results for distributional similarity: topic signatures from web data

Method                    purity
Random                    0.748
Confusion Matrices        0.768
Multilingual Similarity   0.799
TS Senseval               0.744 (worst) to 0.806 (best)
TS Web                    0.764 (worst) to 0.840 (best)










Word by word

noun        #senses   #clusters   #senseval     #web   purity
art               4           2         275    23391    0.750
authority         7           4         262   108560    0.571
bar              13          10         360    75792    0.769
bum               4           3         115    25655    1.000
chair             4           2         201    38057    0.750
channel           7           4         181    46493    0.714
child             4           2         189    70416    0.750
circuit           6           4         247    33754    0.833
day               9           5         427   223899    1.000
facility          5           2         172    17878    1.000
fatigue           4           3         116     8596    1.000
feeling           6           4         153    14569    1.000
hearth            3           2          93    10813    0.667
mouth             8           5         171     1585    0.833
nation            4           2         101     1149    1.000
nature            5           3         137    44242    0.600
post              8           5         201    55049    0.625
restraint         6           4         134    49905    0.667
sense             5           4         158    13688    0.800
stress            5           3         112    14528    0.800


Conclusions

- Meaningful hierarchical clusters
  - For all WordNet nominal synsets (soon)
  - Using web data and distributional similarity
  - All data freely available (MEANING)

But...

- Are the clusters useful for the detection of relations (homonymy, metonymy, metaphor, ...) among word senses? Which clusters?
- Are the clusters useful for applications?
  - WSD (ongoing work)
  - MT, IR, CLIR, Dialogue
  - Which clusters?








Thank you!


An example of a Topic signature

- http://ixa3.si.ehu.es/cgi-bin/signatureak/signaturecgi.cgi
- Source: web examples using monosemous relatives

1. sense: channel, transmission_channel "a path over which electrical signals can pass"
medium (3110.34), optic (2790.34), transmission (2547.13), electronic (1553.85), channel (1352.44), mass (1191.12), fiber (1070.28), public (831.41), fibre (716.95), communication (631.38), technology (368.66), system (363.39), datum (308.50)

5. sense: channel, communication_channel, line "(often plural) a means of communication or access"
service (3360.26), postal (2503.25), communication (1868.81), mail (1402.33), communicate (1086.16), us (651.30), channel (479.36), communicating (340.82), united (196.55), protocol (170.02), music (165.93), london (162.61), drama (160.95)

7. sense: channel, television_channel, TV_channel "a television station and its programs"
station (24288.54), television (13759.75), tv (13226.62), broadcast (1773.82), local (1115.18), radio (646.33), newspaper (333.57), affiliated (301.73), programming (283.02), pb (257.88), own (233.25), independent (230.88)






Experiment and results: an example

- Sample cluster built for channel:

[figure: hierarchical cluster of the senses of channel]
Entropy: 0.286, Purity: 0.714.
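The purity and entropy figures quoted throughout can be computed from the cluster assignments and the gold coarse-grained senses. A minimal sketch, on invented toy data rather than the actual channel clustering:

```python
import math

# Purity and entropy of a clustering against gold coarse-grained classes.
# Each cluster is scored by its majority class; entropy measures how mixed
# the clusters are. Toy example, not the actual channel data.

def purity_entropy(clusters, gold):
    """clusters: list of lists of item ids; gold: dict id -> gold class."""
    n = sum(len(c) for c in clusters)
    purity = 0.0
    entropy = 0.0
    for c in clusters:
        counts = {}
        for item in c:
            counts[gold[item]] = counts.get(gold[item], 0) + 1
        purity += max(counts.values())
        # Entropy of this cluster, weighted by its share of the items.
        h = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        entropy += (len(c) / n) * h
    return purity / n, entropy

clusters = [[0, 1, 2], [3, 4]]
gold = {0: "water", 1: "water", 2: "tv", 3: "tv", 4: "tv"}
p, e = purity_entropy(clusters, gold)
print(round(p, 3))  # → 0.8 (4 of 5 items fall in their cluster's majority class)
```

Higher purity and lower entropy mean the induced clusters respect the gold coarse-grained sense groups more closely.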



1. Clustering directly from examples:
Retrieving sense-examples from the web

- Examples of word senses are scarce
- Alternatively, automatically acquire examples from corpora (or the web)
- In this paper we follow the monosemous relatives method (Leacock et al. 1998)
  - E.g. if we want examples of the first sense of channel, use examples of its monosemous synonym: transmission channel
  - We use: synonyms, hypernyms, all hyponyms, siblings
  - 1000 snippets for each monosemous term from Google
  - Heuristics to extract partial or full meaningful sentences
- More details of the method in (Agirre et al. 2001)
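The relative-selection step can be sketched as follows. The miniature sense inventory and helper names here are invented for illustration only; a real implementation would walk WordNet itself and also collect hypernyms, all hyponyms, and siblings of each synset.

```python
# Sketch of the monosemous-relatives selection step: for a target sense,
# gather related lemmas and keep only those that are monosemous (belong
# to exactly one sense overall), so they are safe, unambiguous query
# terms. The tiny inventory below is invented for illustration.

INVENTORY = {
    "channel#1": {"synonyms": ["channel", "transmission channel"],
                  "hypernyms": ["communication"],
                  "hyponyms": ["uplink"]},
    "channel#7": {"synonyms": ["channel", "television channel", "TV channel"],
                  "hypernyms": ["station"],
                  "hyponyms": []},
}

def sense_count(lemma):
    # How many senses list this lemma as a synonym; >1 means polysemous.
    return sum(lemma in s["synonyms"] for s in INVENTORY.values())

def monosemous_relatives(sense):
    entry = INVENTORY[sense]
    candidates = entry["synonyms"] + entry["hypernyms"] + entry["hyponyms"]
    # Drop ambiguous lemmas such as "channel" itself.
    return [c for c in candidates if sense_count(c) <= 1]

print(monosemous_relatives("channel#1"))
# each returned term could then be sent to a search engine for snippets
```

Note how "channel" itself is filtered out: it appears in two senses, so snippets retrieved for it would mix senses.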


2. Clustering over similarity among TS
Building topic signatures

- Given a set of examples for each word sense...
- ... build a vector for each word sense: each word in the vocabulary is a dimension
- Steps:
  1. Get frequencies for each word in context
  2. Use χ² to assign a weight to each word/dimension, in contrast to the other word senses
  3. Filtering step
- More details of the method in (Agirre et al. 2001)
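The weighting step above can be sketched with a standard 2x2 χ² statistic; the word counts below are invented toy frequencies, and the exact weighting and filtering used in the actual system (Agirre et al. 2001) may differ.

```python
# χ² weighting sketch for topic signatures: score each context word by
# how strongly it is associated with one sense's examples versus the
# examples of the other senses. Toy counts, not real corpus data.

def chi2(a, b, c, d):
    """2x2 chi-square: a = word in this sense, b = word in other senses,
    c = other words in this sense, d = other words in other senses."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

def topic_signature(counts_sense, counts_rest):
    """counts_*: dict word -> frequency; returns (word, weight) pairs."""
    total_s = sum(counts_sense.values())
    total_r = sum(counts_rest.values())
    sig = {}
    for w, a in counts_sense.items():
        b = counts_rest.get(w, 0)
        sig[w] = chi2(a, b, total_s - a, total_r - b)
    # Highest-weighted words first, as in the slides' signature listings.
    return sorted(sig.items(), key=lambda kv: -kv[1])

sense1 = {"signal": 30, "path": 20, "channel": 50}  # toy counts
rest = {"tv": 40, "station": 35, "channel": 60}
print(topic_signature(sense1, rest))
```

Words frequent in one sense's contexts but not the others ("signal") get high weights, while words shared across senses ("channel") score near zero, which is exactly the discrimination the signatures need.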



3. Confusion matrix method

- Hypothesis: sense A is similar to sense B if WSD algorithms tag occurrences of A as B
- Implemented using the results from all Senseval-2 systems
- Algorithm to produce the similarity matrix:
  - M = number of systems
  - N(x) = number of occurrences of word sense x
  - n(a,b) = number of times sense a is tagged as b
  - confusion-similarity(a,b) = n(a,b) / (N(a) × M)
- Not symmetric
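The definitions above can be sketched directly; the per-system taggings below are invented toy data, not Senseval-2 output.

```python
# Confusion-similarity sketch: senses a and b are similar when systems
# tag occurrences of a as b. Toy taggings, not Senseval-2 results.

def confusion_similarity(taggings, a, b):
    """taggings: one list per system of (gold, assigned) sense pairs.
    Returns n(a,b) / (N(a) * M), following the slide's definitions."""
    M = len(taggings)
    # N(a): occurrences of gold sense a (same corpus for every system).
    N_a = sum(gold == a for gold, _ in taggings[0])
    # n(a,b): times any system tagged an occurrence of a as b.
    n_ab = sum((gold, assigned) == (a, b)
               for system in taggings for gold, assigned in system)
    return n_ab / (N_a * M)

# Two systems, four occurrences: gold sense 1 twice, gold sense 2 twice.
sys1 = [(1, 1), (1, 2), (2, 2), (2, 2)]
sys2 = [(1, 2), (1, 2), (2, 2), (2, 1)]
print(confusion_similarity([sys1, sys2], 1, 2))  # → 3 / (2 * 2) = 0.75
print(confusion_similarity([sys1, sys2], 2, 1))  # → 1 / (2 * 2) = 0.25
```

The two printed values differ, illustrating the asymmetry noted on the slide; symmetrizing (e.g. averaging the two directions) would be needed before feeding the matrix to a clustering algorithm that expects symmetric input.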




4. Translation similarity method

- Hypothesis: two word senses are similar if they are translated in the same way in a number of languages (Resnik & Yarowsky, 2000)
- Similarity matrix kindly provided by Chugur & Gonzalo (2002)
- Simplified algorithm:
  - L = number of languages (= 4)
  - n(a,b) = number of languages where a and b share a translation
  - similarity(a,b) = n(a,b) / L
- The actual formula is more elaborate
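The simplified formula can be sketched as follows; the translation sets are invented for illustration, and, as the slide says, the actual Chugur & Gonzalo formula is more elaborate.

```python
# Translation-similarity sketch, following the simplified formula
# similarity(a, b) = n(a, b) / L. Translation sets are invented.

def translation_similarity(trans_a, trans_b):
    """trans_x: dict language -> set of translations of sense x.
    n(a, b) counts the languages where the senses share a translation."""
    langs = trans_a.keys() & trans_b.keys()
    shared = sum(bool(trans_a[l] & trans_b[l]) for l in langs)
    return shared / len(langs)

# Toy data: channel's tv sense and strait sense over 4 languages.
tv     = {"eu": {"kanal"}, "es": {"canal", "cadena"},
          "fr": {"chaine"}, "de": {"kanal", "sender"}}
strait = {"eu": {"kanal"}, "es": {"canal"},
          "fr": {"detroit"}, "de": {"kanal"}}
print(translation_similarity(tv, strait))  # shared in eu, es, de → 3/4 = 0.75
```

Unlike the confusion-matrix measure, this similarity is symmetric by construction.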


Previous work on WordNet clustering

Use of the WordNet structure:

- Peters et al. 1998: WordNet hierarchy, try to identify systematic polysemy
- Tomuro 2001: WordNet hierarchy (MDL), try to identify systematic polysemy (60% precision against WordNet cousins, increase in inter-tagger agreement)
  - Our proposal does not look for systematic polysemy. We get individual relations among word senses: e.g. television channel and transmission channel
- Mihalcea & Moldovan 2001: heuristics on WordNet, WSD improvement (polysemy reduction 26%, error 2.1% in SemCor)
  - Provide complementary information


Previous work (continued)

- Resnik & Yarowsky 2000 (also Chugur & Gonzalo 2002): translations across different languages, improving evaluation metrics (very high correlation with Hector sense hierarchies).
  - We only get 80% purity using (Chugur & Gonzalo). Unfortunately the dictionaries are rather different (Senseval-2 results dropped compared to Senseval-1). Difficult to scale to all words.
- Pantel & Lin (2002): induce word senses using soft clustering of word occurrences (overlap with WordNet over 60% precision)
  - Use syntactic dependencies rather than bag-of-words vectors
- Palmer et al. (submitted): criteria for grouping verb senses.
