Statistical NLP, Winter 2009

Discourse Processing

Roger Levy, UCSD

Thanks to Dan Klein, Aria Haghighi, Hannah Rohde, and Andy Kehler


How do we talk about the world?

Isolated-sentence meaning is NOT all there is to understanding linguistic meaning

There is huge importance in understanding how sentences in a discourse are linked to one another and to the world

We’ll crudely lump all that stuff into the term “discourse”

“Discourse”

Today we’ll cover two different topics in this connection:

Coreference resolution: which referring expressions refer to the same things

Coherence relations: what the meaningful relationships are between sentences in a discourse

N.B. Andy Kehler (in our department) is one of THE world authorities on these topics

Why does discourse matter?

There are many theoretically interesting problems that arise in the study of discourse:

How to represent inferred meanings and how they change sentence-by-sentence in a discourse

What information sources people use to resolve “discourse-level” ambiguity

What implications discourse-level information has for processing at other levels

There are also practical applications:

Question Answering (Semantics)

Document Summarization

Automatic Essay Grading

Document Summarization

First Union Corp is continuing to wrestle with severe problems. According to industry insiders at Paine Webber, their president, John R. Georgius, is planning to announce his retirement tomorrow.

→ First Union President John R. Georgius is planning to announce his retirement tomorrow.

We’ll start with reference resolution

Reference Resolution

Noun phrases refer to entities in the world; many pairs of noun phrases co-refer:

John Smith, CFO of Prime Corp. since 1986, saw his pay jump 20% to $1.3 million as the 57-year-old also became the financial services co.’s president.

Kinds of Reference

Referring expressions
  John Smith
  President Smith
  the president
  the company’s new executive
(More common in newswire, generally harder in practice)

Free variables
  Smith saw his pay increase

Bound variables
  The dancer hurt herself.
(More interesting grammatical constraints, more linguistic theory, easier in practice)

Not all NPs are referring!

Every dancer twisted her knee.

(No dancer twisted her knee.)

There are three NPs in each of these sentences; because the first one is non-referential, the other two aren’t either.

Grammatical Constraints

Gender / number
  Jack gave Mary a gift. She was excited.
  Mary gave her mother a gift. She was excited.

Position (cf. binding theory)
  The company’s board polices itself / it.
  Bob thinks Jack sends email to himself / him.

Direction (anaphora vs. cataphora)
  She bought a coat for Amy.
  In her closet, Amy found her lost coat.
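The gender/number constraint is the easiest of these to operationalize. Below is a minimal Python sketch (attribute values are hand-assigned for illustration) of how agreement prunes antecedent candidates for a pronoun:

# Minimal sketch: prune antecedent candidates by gender/number agreement.
# Attribute values are hand-assigned for illustration.

CANDIDATES = [
    {"text": "Jack",   "gender": "M", "number": "SG"},
    {"text": "Mary",   "gender": "F", "number": "SG"},
    {"text": "a gift", "gender": "N", "number": "SG"},
]

def compatible(pronoun, candidate):
    """Keep a candidate only if gender and number agree with the pronoun."""
    return (pronoun["gender"] == candidate["gender"]
            and pronoun["number"] == candidate["number"])

she = {"text": "she", "gender": "F", "number": "SG"}
print([c["text"] for c in CANDIDATES if compatible(she, c)])  # ['Mary']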

Discourse Constraints

Recency

Salience

Focus

Centering Theory [Grosz et al. 86]

Coherence Relations (see end)

Other Constraints

Style / Usage Patterns
  Peter Watters was named CEO. Watters’ promotion came six weeks after his brother, Eric Watters, stepped down.

Semantic Compatibility
  Smith had bought a used car that morning. The used car dealership assured him it was in good condition.

Evaluation

B-CUBED algorithm for evaluation

Precision & recall for entities in a reference chain

Precision: % of elements in a hypothesized reference chain that are in the true reference chain

Recall: % of elements in a true reference chain that are in the hypothesized reference chain

Overall precision & recall are the (weighted) average of per-chain precision & recall

Optimizing chain-chain pairings is a hard problem (in the computational sense)

Greedy matching is done in practice for evaluation
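As a concrete reference, here is a minimal sketch of B-CUBED in its standard per-mention formulation, which avoids the chain-matching step entirely; the chains and expected numbers are a made-up example, and production scorers also handle twinless mentions and alternative weightings:

def b_cubed(gold_chains, pred_chains):
    """B-CUBED precision/recall averaged over mentions.

    Each argument is a list of sets of mention ids (the chains)."""
    gold_of = {m: c for c in gold_chains for m in c}
    pred_of = {m: c for c in pred_chains for m in c}
    mentions = gold_of.keys() & pred_of.keys()
    prec = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions)
    rec  = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions)
    n = len(mentions)
    return prec / n, rec / n

# Gold chains {a,b,c},{d}; hypothesized chains {a,b},{c,d}:
gold = [{"a", "b", "c"}, {"d"}]
pred = [{"a", "b"}, {"c", "d"}]
p, r = b_cubed(gold, pred)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.67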

Two Kinds of Models

Mention Pair models
  Treat coreference chains as a collection of pairwise links
  Make independent pairwise decisions and reconcile them in some way (e.g. clustering or greedy partitioning)

Entity-Mention models
  A cleaner, but less studied, approach
  Posit single underlying entities
  Each mention links to a discourse entity [Pasula et al. 03], [Luo et al. 04]

Mention Pair Models

Most common machine learning approach

Build classifiers over pairs of NPs
  For each NP, pick a preceding NP or NEW
  Or, for each NP, choose link or no-link

Clean up non-transitivity with clustering or graph partitioning algorithms
  E.g.: [Soon et al. 01], [Ng and Cardie 02]
  Some work has done the classification and clustering jointly [McCallum and Wellner 03]

Kind of a hack; results in the 50’s to 60’s on all NPs
  Better numbers on proper names and pronouns
  Better numbers if tested on gold entities
  Failures are mostly because of insufficient knowledge or features for hard common noun cases
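A minimal sketch of the “pick a preceding NP or NEW” variant described above; the hand-set feature weights stand in for a learned classifier:

def pair_score(ante, ana):
    """Toy pairwise score built from a few classic features."""
    score = 0.0
    score += 2.0 if ante["head"] == ana["head"] else 0.0       # string match
    score += 1.0 if ante["gender"] == ana["gender"] else -2.0  # gender agreement
    score += 1.0 if ante["number"] == ana["number"] else -2.0  # number agreement
    score -= 0.1 * (ana["index"] - ante["index"])              # recency penalty
    return score

def resolve(mentions, threshold=0.5):
    """For each mention, link to the best-scoring preceding mention,
    or start a NEW entity if nothing beats the threshold."""
    entity_of = {}
    next_id = 0
    for i, m in enumerate(mentions):
        best, best_score = None, threshold
        for a in mentions[:i]:
            s = pair_score(a, m)
            if s > best_score:
                best, best_score = a, s
        if best is None:
            entity_of[m["index"]] = next_id   # NEW entity
            next_id += 1
        else:
            entity_of[m["index"]] = entity_of[best["index"]]
    return entity_of

mentions = [
    {"index": 0, "head": "Smith", "gender": "M", "number": "SG"},
    {"index": 1, "head": "pay",   "gender": "N", "number": "SG"},
    {"index": 2, "head": "he",    "gender": "M", "number": "SG"},
]
print(resolve(mentions))  # {0: 0, 1: 1, 2: 0}: "he" links back to "Smith"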

Pairwise Features

[Table of pairwise features from Luo et al. 04]

An Entity Mention Model

Example: [Luo et al. 04]

Bell Tree (link vs. start decision list)

Entity centroids, or not?
  Not for [Luo et al. 04]; see [Pasula et al. 03]

Some features work on nearest mention (e.g. recency and distance)

Others work on “canonical” mention (e.g. spelling match)

Lots of pruning; model highly approximate

(Actually ends up being like a greedy-link system in the end)

A Generative Mention Model

[Haghighi and Klein 07]

A generative model in which both document-level factors and recency factors bias next-mention probabilities

We’ll look at this, simple to complex.


Infinite Mixture Model

MUC F1: [score shown on slide]

The Weir Group, whose headquarters is in the U.S., is a large specialized corporation. This power plant, which will be situated in Jiangsu, has a large generation capacity.

Pronouns lumped into their own clusters!

Enriching Mention Model

[Graphical-model figures, building up the enriched model: each mention’s head word W is generated from its entity Z together with latent property variables for Number N (Sing, Plural), Gender G (M, F, N), Entity Type T (PERS, LOC, ORG, MISC), and Mention Type M (Proper, Pronoun, Nominal). Entity parameters give per-entity distributions over properties (e.g. Number: PL 0.6, SING 0.1, ...; Gender: NEUT 0.7, MALE 0.1, ...). Shared pronoun parameters give distributions over pronoun heads conditioned on properties (e.g. W | SING, MALE, PERS: “he” 0.5, “him” 0.3, ...; W | PL, NEUT, ORG: “they” 0.3, “it” 0.2, ...), while non-pronoun heads are drawn from the entity itself. The per-document sequence model repeats the variables Z, W, M, T, G, N for each mention.]
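To make the generative story concrete, here is a toy sketch of drawing a single mention’s head word from an entity; all parameter values and the unseen-combination backoff are invented, not Haghighi & Klein’s:

import random

# Per-entity parameters: distributions over properties and non-pronoun heads.
ENTITY = {
    "number": {"SING": 0.9, "PL": 0.1},
    "gender": {"MALE": 0.8, "NEUT": 0.2},
    "type":   {"PERS": 1.0},
    "heads":  {"Smith": 0.6, "president": 0.4},
}

# Shared pronoun parameters: P(word | number, gender, type).
PRONOUNS = {
    ("SING", "MALE", "PERS"): {"he": 0.5, "him": 0.3, "his": 0.2},
    ("PL", "NEUT", "ORG"):    {"they": 0.3, "them": 0.5, "it": 0.2},
}

def draw(dist):
    """Sample a key from a {value: probability} dict."""
    r, acc = random.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r <= acc:
            return k
    return k  # guard against floating-point rounding

def generate_mention(entity, mention_type):
    """Sample properties N, G, T, then the head word W."""
    n, g, t = draw(entity["number"]), draw(entity["gender"]), draw(entity["type"])
    if mention_type == "PRONOUN":
        # Back off to "it" for property combinations with no pronoun row
        # (an invented fallback, just to keep the sketch total).
        word = draw(PRONOUNS.get((n, g, t), {"it": 1.0}))
    else:  # proper/nominal heads come from the entity's own distribution
        word = draw(entity["heads"])
    return word, (n, g, t)

print(generate_mention(ENTITY, "PRONOUN"))
print(generate_mention(ENTITY, "PROPER"))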

Pronoun Head Model

MUC F1: [score shown on slide]

The Weir Group, whose headquarters is in the U.S., is a large specialized corporation. This power plant, which will be situated in Jiangsu, has a large generation capacity.

“which” should be coreferent with the recent “power plant” entity.

Salience Model

[Figures stepping through an example: each entity carries an activation score. When an entity is mentioned, its activation jumps to 1.0, and other entities’ activations decay (e.g. 1.0 to 0.5) with each subsequent mention. Activations are bucketed into discrete salience values (TOP, HIGH, MED, LOW, NONE), which condition the choice of Mention Type (Proper, Pronoun, Nominal): in the walkthrough, an entity at salience NONE is realized as a PROPER mention, while one at TOP is realized as a PRONOUN. The sequence model adds a salience variable S and an activation list L alongside Z, W, M, T, G, N for each mention.]
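A minimal sketch of the activation and bucketing bookkeeping the figures walk through; the decay rate and bucket boundaries here are assumptions, not the paper’s values:

# Bucket boundaries and decay rate are assumptions for illustration.
BUCKETS = [(1.0, "TOP"), (0.7, "HIGH"), (0.4, "MED"), (0.1, "LOW"), (0.0, "NONE")]

def bucket(activation):
    """Map a continuous activation to a discrete salience value."""
    for floor, name in BUCKETS:
        if activation >= floor:
            return name
    return "NONE"

def mention(activations, entity):
    """Decay all activations, then boost the mentioned entity to 1.0."""
    for e in activations:
        activations[e] *= 0.5          # assumed decay rate
    activations[entity] = 1.0
    return {e: bucket(a) for e, a in activations.items()}

acts = {1: 0.0, 2: 0.0}
print(mention(acts, 1))   # {1: 'TOP', 2: 'NONE'}
print(mention(acts, 2))   # entity 1 decays to 0.5 -> 'MED', entity 2 -> 'TOP'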


Salience Model

MUC F1: [score shown on slide]

The Weir Group, whose headquarters is in the U.S., is a large specialized corporation. This power plant, which will be situated in Jiangsu, has a large generation capacity.

Global Coreference Resolution

Global entities shared across documents

HDP Model

[Graphical-model figures: a global entity distribution is drawn from a Dirichlet Process (DP); each document’s entity distribution is subsampled from the global distribution, giving a hierarchical DP. The per-mention variables Z, W, M, T, G, N, S, L are as in the salience model.]
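The DP piece can be pictured as a Chinese restaurant process over entities; the hierarchical version runs one CRP per document whose new-entity draws come from a global CRP. A toy sketch of the single-level process (alpha is an arbitrary choice):

import random
from collections import Counter

def crp_assign(counts, alpha=1.0):
    """Pick an existing entity proportionally to its mention count,
    or a new entity with probability proportional to alpha."""
    total = sum(counts.values()) + alpha
    r = random.uniform(0, total)
    for entity, c in counts.items():
        r -= c
        if r <= 0:
            return entity
    return max(counts, default=-1) + 1   # new entity id

counts = Counter()
for _ in range(10):
    counts[crp_assign(counts)] += 1
print(counts)   # rich-get-richer: a few entities attract most mentions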

HDP Model

MUC F1: [score shown on slide]

The Weir Group, whose headquarters is in the U.S., is a large specialized corporation. This power plant, which will be situated in Jiangsu, has a large generation capacity.

Coherence relations

Adjacent sentence pairs come in multiple flavors:

John hid Bill’s car keys.
  He was drunk. [Explanation; He=Bill?]
  He was mad. [“Occasion”; He=Bill?]
  ??He likes spinach.

Implications of discourse processing

What we’ll show for the rest of today is work by Hannah Rohde, joint with me & Andy Kehler, on the implications of coherence relations & their processing for syntactic disambiguation

This is psycholinguistics, but it points to the need for very sophisticated models in NLP

Starting point: Winograd 1972:
  The city council denied the demonstrators a permit because…
    they feared violence.
    they advocated violence.

Relative clause attachment ambiguity

(1) Someone shot the servant of the actress who was on the balcony.

Previous work on RC attachment suggests that low attachment in English is preferred (Cuetos & Mitchell 1988; Frazier & Clifton 1996; Carreiras & Clifton 1999; Fernandez 2003; but see also Traxler, Pickering, & Clifton 1998)

RC attachment has primarily been analyzed in terms of syntactically driven biases.

[Attachment diagram: “the servant of the actress” with the RC attaching HIGH (to “the servant”) or LOW (to “the actress”)]

Discourse biases in RC processing

Previous work: discourse context is referential context

RC pragmatic function is to modify or restrict identity of referent

RC attaches to host with more than one referent (Desmet et al. 2002, Zagar et al. 1997, Papadopoulou & Clahsen 2006)

(2) There was a servant who was working for two actresses.
    Someone shot the servant of the actress who was on the balcony.

(3) There were two servants working for a famous actress.
    Someone shot the servant of the actress who was on the balcony.

A different type of discourse bias

Observation #1: RCs can also provide an explanation

(4) The boss fired the employee who always showed up late.
    (cancelable) implicature that the employee’s lateness is the explanation for the boss’s firing

(5) “Atlanta Car Dealer Murdered 2 Employees Because They Kept Asking for Raises” [article headline]

(6) “Boss Killed 2 Employees Who Kept Asking for Raises” [abbreviated news summary headline]

AP News headlines with explanation-providing RC

Biases from implicit causality verbs

Observation #2: IC verbs are biased to explanations
  In story continuations, IC verbs yield more explanations than NonIC verbs (Kehler, Kertz, Rohde, Elman 2008)

(7) IC: John detests Mary. ________________________.
(8) NonIC: John babysits Mary. ______________________.

Observation #3: With an explanation, IC verbs have a next-mention bias
  In sentence completions, IC verbs like detest yield more object next mentions (Caramazza, Grober, Garvey, Yates 1974; Brown & Fish 1983; Au 1986; McKoon, Greene, Ratcliff 1993; inter alia)

(9) IC: John detests Mary because ________________. (typical completion: “she is arrogant” [OBJ])
(10) NonIC: John babysits Mary because _______________. (completions split across he/she/they…, e.g. “She is arrogant and rude”, “Mary’s mother is grateful”)

Proposal: IC biases in RC attachment

#1 Relative clauses can provide explanations
#2 IC verbs create an expectation for an upcoming explanation
#3 Certain IC verbs have a next-mention bias to the object

(11) NonIC: John babysits the children of the musician who …
     (a) is a singer at the club downtown. (low: expected)
     (b) are students at a private school. (high: unexpected)

(12) IC: John detests the children of the musician who …
     (a) is a singer at the club downtown. (low)
     (b) are arrogant and rude. (high: expected)

Null Hypothesis: Verb type will have no effect on attachment

Discourse Hypothesis: IC verbs will increase comprehenders’ expectations for a high-attaching RC

Sentence completion study

- Web-based experiment
- 52 monolingual English-speaking UCSD undergrads
- Instructed to write a natural completion
- 2 judges annotated responses:
  - RC function: ‘only restrict’ vs. ‘restrict AND explain’
  - RC attachment: ‘high’ vs. ‘low’
- Analysis only on trials with unanimous judge agreement

IC: John detests the children of the musician who …
NonIC: John babysits the children of the musician who …

Completion results: RC function

[Plot: significantly more explanation-providing RCs following IC than NonIC verbs]

IC: John detests the children of the musician who …
NonIC: John babysits the children of the musician who …

Completion results: attachment

[Plot: significantly more high-attaching RCs following IC verbs than NonIC]

IC: John detests the children of the musician who …
NonIC: John babysits the children of the musician who …


Summary: sentence completion

Evidence that expectations about upcoming explanation-providing RCs influence RC attachment

Evidence in support of the discourse hypothesis
  - Null Hypothesis: low attachments across the board
  - Discourse Hypothesis: more high-attaching RCs following IC verbs than NonIC verbs

Question: are people using these discourse-level expectations in their online processing?

Online processing

For online effects to emerge, comprehenders must be implicitly aware that:

#1 Relative clauses can provide explanations
#2 IC verbs create an expectation for an upcoming explanation
#3 Certain IC verbs have a next-mention bias to the object

… and must combine these discourse-level biases and expectations to make an online syntactic decision

Online reading study

Null Hypothesis: main effect of attachment height
  - low-attaching RCs easier to process than high-attaching RCs

Discourse Hypothesis: verb type x attachment interaction
  - high-attaching RCs easier in IC condition than in NonIC condition
  - low-attaching RCs harder in IC condition than in NonIC condition

Items crossed verb type with attachment height:
  IC: John detests the children of the musician who …
  NonIC: John babysits the children of the musician who …
  (low) … is generally arrogant and rude.
  (high) … are generally arrogant and rude.

Reading time study

- 58 monolingual English-speaking UCSD undergrads
- DMDX self-paced moving-window software
- Press button to reveal words & answer questions
- Analyses:
  - Reading time
  - Comprehension-question accuracy

IC.low: detests the children of the musician who is generally arrogant
IC.high: detests the children of the musician who are generally arrogant
NonIC.low: babysits the children of the musician who is generally arrogant
NonIC.high: babysits the children of the musician who are generally arrogant
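The results below are reported as residual reading times. As a reference, here is a minimal sketch of the standard transform: regress raw RTs on word length and keep the residuals, so regions of different lengths can be compared. All data points are fabricated for illustration:

import numpy as np

# Residual reading-time transform: fit RT = a + b * length by ordinary
# least squares, then subtract the prediction. Numbers are fabricated.
lengths = np.array([3.0, 5.0, 7.0, 9.0, 4.0, 8.0])        # word lengths
rts     = np.array([310., 345., 400., 430., 330., 415.])  # raw RTs (ms)

A = np.vstack([np.ones_like(lengths), lengths]).T
(a, b), *_ = np.linalg.lstsq(A, rts, rcond=None)

residuals = rts - (a + b * lengths)   # positive = slower than length predicts
print(np.round(residuals, 1))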


Online results: residual reading times

[Plots of residual reading times at the critical and spillover regions for the IC/NonIC x high/low conditions. At the first spillover region: no main effects, but a crossover verb type x attachment interaction (p < 0.03), with IC-high and NonIC-low read faster than IC-low and NonIC-high]

IC.low: detests the children of the musician who is generally arrogant
IC.high: detests the children of the musician who are generally arrogant
NonIC.low: babysits the children of the musician who is generally arrogant
NonIC.high: babysits the children of the musician who are generally arrogant

Summary: online reading

Online results are consistent with offline results: a bias to high attachments emerges following IC verbs

As predicted, high-attaching RCs were read faster than low-attaching RCs in the IC condition, while the reverse was true in the NonIC condition
  - Crossover interaction

Effects persist in comprehension-question accuracy
  - Significant crossover interaction by subjects
  - Low-attaching RCs in IC condition yielded worst accuracy

Summary & Conclusions

3 Observations
#1 Relative clauses can provide explanations
#2 IC verbs create an expectation for an upcoming explanation
#3 Certain IC verbs have a next-mention bias to the object

Do people use discourse-level expectations and biases as they resolve local syntactic ambiguity?
  - YES, in RC processing
  - Where else might comprehenders be using discourse-level expectations…?

Models of sentence processing need to incorporate these types of discourse-driven expectations.