
A Machine Learning
Approach to Coreference
Resolution of Noun Phrases

By W. M. Soon, H. T. Ng, D. C. Y. Lim

Presented by Iman Sen

Outline

- Introduction
- Process Overview
- Pipeline Process to find Markables
- Feature Selection
- The Decision Tree
- Results for MUC-6, MUC-7 & error analysis
- Conclusions

Introduction

- Coreference for general noun phrases from unrestricted text.
- Learns using the decision tree method from a small annotated corpus.
- First learning-based system that performed comparably with the best non-learning systems.


Process Overview

- Markables are the union of all the noun phrases, named entities and nested noun phrases found.
- Find markables using a pipeline of NLP modules.
- Form feature vectors for appropriate pairs of markables. These are the training examples.
- Train the decision tree classifier on these examples.
- For testing, determine pairs of markables in the test document and present them to the classifier. Stop after the first successful coreference.


Pipelined NLP modules

Free text is passed through the following pipeline of NLP modules, whose output is the set of markables:

1) Tokenization & Sentence Segmentation
2) Morphological Processing
3) POS Tagging: a standard HMM-based tagger.
4) NP Identification: HMM-based; uses the POS tags from the previous module.
5) Named Entity Recognition: HMM-based; recognizes organization, person, location, date, time, money, percent.
6) Nested Noun Phrase Extraction: 2 kinds are extracted: prenominals such as ((wage) reduction) and possessive NPs such as ((his) dog). More on this in a bit!
7) Semantic Class Determination
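A minimal sketch of how such a pipeline can be wired together, with each stage adding annotations to a shared document structure. The stage functions below are hypothetical placeholders, not the actual HMM-based modules of the paper:

```python
# A minimal sketch of wiring NLP modules into a pipeline: each stage takes the
# shared document dict and adds its own annotations. The stage functions below
# are hypothetical placeholders, NOT the actual HMM-based modules of the paper.

def tokenize_and_segment(doc):
    # crude stand-in for tokenization & sentence segmentation
    doc["sentences"] = [s.split() for s in doc["text"].split(". ") if s]
    return doc

def pos_tag(doc):
    # placeholder for the HMM-based POS tagger: tag everything as a noun
    doc["pos"] = [[(tok, "NN") for tok in sent] for sent in doc["sentences"]]
    return doc

def identify_nps(doc):
    # placeholder for NP identification: treat capitalized tokens as markables
    doc["markables"] = [tok for sent in doc["sentences"] for tok in sent
                        if tok[:1].isupper()]
    return doc

PIPELINE = [tokenize_and_segment, pos_tag, identify_nps]

def run_pipeline(text):
    doc = {"text": text}
    for stage in PIPELINE:  # each module builds on the previous module's output
        doc = stage(doc)
    return doc["markables"]

print(run_pipeline("Eastern Airlines executives notified union leaders."))
# -> ['Eastern', 'Airlines'] with these toy stages
```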

Determining the Markables for training

Sentence 1

1. (Eastern Airlines)_a2 executives notified (union)_e1 leaders that the carrier wishes to discuss selective ((wage)_c2 reductions)_d2 on (Feb. 3)_b2.

2. ((Eastern Airlines)_5 executives)_6 notified ((union)_7 leaders)_8 that (the carrier)_9 wishes to discuss (selective (wage)_10 reductions)_11 on (Feb. 3)_12.

Sentence 2

1. ((Union)_e2 representatives who could be reached)_f1 said (they)_f2 hadn't decided whether (they)_f3 would respond.

2. ((Union)_13 representatives)_14 who could be reached said (they)_15 hadn't decided whether (they)_16 would respond.

- The first version of each sentence is the manual coreference annotation; the second is the output of the pipeline modules.
- The letters in the first version denote coreference chains.
- We make up pairs (i, j) as training examples.
- We take only those NPs in a coreference chain whose NP boundaries match (shown in blue on the original slide).


Determining the markables for training continued…

+ve examples:
- ((union)_7, (union)_13)

-ve examples:
- ((the carrier)_9, (union)_13)
- ((wage)_10, (union)_13)
- ((selective wage reductions)_11, (union)_13)
- ((Feb. 3)_12, (union)_13)

In general, if a1, a2, a3 is a correctly identified coreference chain, then make up (a1, a2) and (a2, a3) as +ve examples, and for every NP e found in between, say, a2 and a3, make up a -ve example (e, a3) (see the sketch below).

Then a feature vector is generated for each pair.
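A minimal sketch (not the authors' code) of this pair-generation scheme, assuming markables arrive in document order together with their gold chain id (None when the markable is not in any annotated chain):

```python
# Minimal sketch of the pair-generation scheme described above. Markables are
# (text, chain_id) tuples in document order; chain_id is None outside any chain.

def make_training_pairs(markables):
    """Return (+) pairs of adjacent chain members and (-) pairs for every
    markable that intervenes between them."""
    positives, negatives = [], []
    last_seen = {}  # chain_id -> index of the most recent member of that chain
    for j, (text_j, chain_j) in enumerate(markables):
        if chain_j is not None and chain_j in last_seen:
            i = last_seen[chain_j]
            positives.append((markables[i][0], text_j))
            # every markable strictly between i and j yields a -ve example with j
            for k in range(i + 1, j):
                negatives.append((markables[k][0], text_j))
        if chain_j is not None:
            last_seen[chain_j] = j
    return positives, negatives

markables = [("(union)_7", "e"), ("(the carrier)_9", None), ("(wage)_10", None),
             ("(selective wage reductions)_11", None), ("(Feb. 3)_12", None),
             ("(union)_13", "e")]
pos, neg = make_training_pairs(markables)
print(pos)  # [('(union)_7', '(union)_13')]
print(neg)  # the four -ve pairs listed above
```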

Markables for testing

- For testing, every antecedent i before j is tried.
- Start with the immediately preceding i and go backwards.
- Stop when you find the first +ve coreference (sketched below).
- For nested NPs, we avoid the current markable. For example, in ((his) daughter), we do not try to see if “his” corefers to “his daughter”.
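A minimal sketch (not the authors' code) of this closest-first search; extract_features and classify are hypothetical stand-ins for the 12-feature extractor and the trained C5 tree:

```python
# Minimal sketch of the closest-first test-time search: for each markable j,
# walk backwards over the preceding markables and link j to the first candidate
# the classifier accepts.

def resolve(markables, extract_features, classify):
    """Return a list of (antecedent_index, j_index) coreference links."""
    links = []
    for j in range(1, len(markables)):
        # go backwards: nearest preceding markable first
        for i in range(j - 1, -1, -1):
            if classify(extract_features(markables[i], markables[j])):
                links.append((i, j))
                break  # stop at the first +ve coreference decision
    return links

# toy demo: a "classifier" that just checks lower-cased string equality
demo = ["Eastern Airlines", "the carrier", "Eastern Airlines"]
print(resolve(demo,
              extract_features=lambda i, j: (i, j),
              classify=lambda pair: pair[0].lower() == pair[1].lower()))
# -> [(0, 2)]
```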




Feature Selection

The authors selected the following 12 features:

1) Distance Feature (DIST): if i and j are in the same sentence, equal to 0; if one sentence apart, equal to 1; and so on.

2) i-Pronoun Feature (I_PRONOUN): values are true or false. Returns true if i in (i, j) is a pronoun.

3) j-Pronoun Feature (J_PRONOUN): tests if j is a pronoun in (i, j).

4) String Match Feature (STR_MATCH): returns true or false. Removes articles and demonstrative pronouns (such as “that”, “those”, etc.) and tests for a match.

5) Definite NP Feature (DEF_NP): if j starts with “the”, return true, else false.

6) Demonstrative Noun Phrase Feature (DEM_NP): if j starts with “this”, “that”, “these” or “those”, return true, else false.

7) Number Agreement Feature (NUMBER): the morphological root is used to determine whether a noun is singular or plural (if not a pronoun); returns true or false.



Feature Selection continued…

8) Semantic Class Agreement Feature (SEMCLASS): returns true, false or unknown. Classes are “male, female, person, organization, location, date, time, money, percent, object”. Decided by the semantic class module (pick the 1st sense from WordNet); true if the classes are the same or one is a child of the other. For example, male and female are persons; the others are objects. If either class is unknown, compare head nouns, and if they are the same, return true.

9) Gender Agreement Feature (GENDER): derived from “Mr., Mrs.” or “he, she”. If a name is not referred to with one of the above, look it up in a database of common names. The gender of objects is “neutral”. Unknown classes have “unknown” gender. Return true if the genders match.

10) Both Proper Names Feature (PROPER_NAME): look at capitalization and return true or false.

11) Alias Feature (ALIAS): returns true for aliases. For “persons”, last names are compared. For “dates”, day, month and year are extracted. For “organizations”, acronyms are checked.

12) Appositive Feature (APPOSITIVE): if j is in apposition to i, return true. Check for the absence of verbs and proper punctuation (like “,”).
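As a rough illustration (not the authors' implementation), a few of these feature tests can be sketched as simple string checks; real markables would also carry sentence numbers, POS tags and semantic classes from the pipeline:

```python
# Minimal sketch of a few of the 12 feature tests above, operating on plain
# strings. Not the authors' implementation.

PRONOUNS = {"he", "she", "it", "they", "his", "her", "its", "their", "them", "him"}
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}

def dist(sent_i, sent_j):
    # DIST: 0 if same sentence, 1 if one sentence apart, and so on
    return sent_j - sent_i

def is_pronoun(np):
    # I_PRONOUN / J_PRONOUN
    return np.lower() in PRONOUNS

def str_match(np_i, np_j):
    # STR_MATCH: strip articles/demonstratives, then compare
    strip = lambda np: " ".join(w for w in np.lower().split() if w not in DETERMINERS)
    return strip(np_i) == strip(np_j)

def def_np(np_j):
    # DEF_NP: does j start with "the"?
    return np_j.lower().startswith("the ")

print(str_match("the carrier", "that carrier"))              # True
print(def_np("the carrier"), is_pronoun("they"), dist(1, 2))  # True True 1
```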

A Training Example

For each markable pair, a feature vector is derived and this constitutes a training example.

Sentence:

Separately, Clinton transition officials said that Frank Newman, 50, vice chairman and chief financial officer of BankAmerica Corp., is expected to be nominated as assistant Treasury secretary for domestic finance.

Feature vector of the markable pair (i = Frank Newman, j = vice chairman):

DIST         0   i and j are in the same sentence
I_PRONOUN    -   i is not a pronoun
J_PRONOUN    -   j is not a pronoun
STR_MATCH    -   i and j do not match
DEF_NP       -   j is not a definite noun phrase
DEM_NP       -   j is not a demonstrative noun phrase
NUMBER       +   i and j are both singular
SEMCLASS     1   i and j are both persons (unknown is 2)
GENDER       1   i and j are both males
PROPER_NAME  -   only i is a proper name
ALIAS        -   j is not an alias of i
APPOSITIVE   +   j is in apposition to i
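C5 can consume symbolic attribute values like these directly; purely for illustration, here is one hypothetical way to encode such a vector as a fixed-order row (this encoding is an assumption, not the paper's):

```python
# Hypothetical encoding of the symbolic feature values above into a fixed-order
# row: true/false become 1/0, and DIST/SEMCLASS/GENDER keep their integer codes.
# C5 itself handles symbolic values natively, so this is only illustrative.

FEATURES = ["DIST", "I_PRONOUN", "J_PRONOUN", "STR_MATCH", "DEF_NP", "DEM_NP",
            "NUMBER", "SEMCLASS", "GENDER", "PROPER_NAME", "ALIAS", "APPOSITIVE"]

def encode(vector):
    symbol = {"+": 1, "-": 0}
    return [symbol.get(vector[f], vector[f]) for f in FEATURES]

frank_newman_pair = {"DIST": 0, "I_PRONOUN": "-", "J_PRONOUN": "-", "STR_MATCH": "-",
                     "DEF_NP": "-", "DEM_NP": "-", "NUMBER": "+", "SEMCLASS": 1,
                     "GENDER": 1, "PROPER_NAME": "-", "ALIAS": "-", "APPOSITIVE": "+"}
print(encode(frank_newman_pair))
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
```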


The Decision Tree

- The decision tree learning algorithm used is C5, an updated version of C4.5 (Quinlan 1993).
- Basic idea: pick a feature and split the training set into subsets based on the different values of that feature. If a subset consists of instances from the same class (after pruning), stop; otherwise split on a different feature.
- The feature with the greatest information gain is picked as the next feature to split on. Information gain is measured in terms of entropy, and in this case the feature that yields the lowest possible entropy is selected.
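A minimal sketch of this split criterion, using standard ID3/C4.5-style information gain rather than the exact C5 code:

```python
# Information gain = entropy of the labels minus the weighted entropy of the
# subsets obtained by splitting on a feature (standard ID3/C4.5 criterion).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    n = len(examples)
    split = {}
    for x, y in zip(examples, labels):
        split.setdefault(x[feature], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# toy data: STR_MATCH separates the classes perfectly, DIST does not
examples = [{"STR_MATCH": "+", "DIST": 0}, {"STR_MATCH": "-", "DIST": 0},
            {"STR_MATCH": "+", "DIST": 1}, {"STR_MATCH": "-", "DIST": 1}]
labels = ["coref", "not", "coref", "not"]
print(information_gain(examples, labels, "STR_MATCH"))  # 1.0
print(information_gain(examples, labels, "DIST"))       # 0.0
```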


Example:

“(Ms. Washington)’s candidacy is being championed by (several powerful lawmakers) including ((her) boss).”

Feature vector for the pair (i = Ms. Washington, j = her):

(DIST = 0, SEMCLASS = 1, NO. = +, GENDER = 1, PROPER_NAME = -, ALIAS = -, J_PRON = +, DEF_NP = -, DEM_NP = -, STR_MATCH = -, APPOSITIVE = -, I_PRON = -)

Does (Ms. Washington, her) corefer?
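Purely as a demonstration of the mechanics, a vector like this can be handed to an off-the-shelf decision-tree learner and queried. The training rows below are hand-made toy examples, not the MUC data, so the prediction only shows how the query works, not the paper's learned tree:

```python
# Illustrative only: toy training rows (NOT the MUC data) fed to a decision-tree
# learner, then queried with the (Ms. Washington, her) vector from above.
from sklearn.tree import DecisionTreeClassifier

# column order: DIST, SEMCLASS, NUMBER, GENDER, PROPER_NAME, ALIAS,
#               J_PRONOUN, DEF_NP, DEM_NP, STR_MATCH, APPOSITIVE, I_PRONOUN
X_toy = [
    [0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0],  # string match        -> coref
    [1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0],  # compatible pronoun  -> coref
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0],  # apposition          -> coref
    [2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],  # nothing in common   -> not
    [3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # nothing in common   -> not
]
y_toy = ["+", "+", "+", "-", "-"]

clf = DecisionTreeClassifier(criterion="entropy").fit(X_toy, y_toy)

ms_washington_her = [0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(clf.predict([ms_washington_her]))  # ['+'] with this toy data
```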


The Decision Tree

[Diagram of the learned decision tree: the root test is STR_MATCH; the remaining nodes test J_PRONOUN, APPOSITIVE, ALIAS, GENDER, I_PRONOUN, DIST and NUMBER, ending in +/- leaves.]

Note: only 8 out of 12 features are used in the final tree.


Results

- MUC-6: Recall 58.6%, Precision 67.3%, F-measure 62.6%. Pruning set at 20%, min. no. of instances set at 5.
- MUC-7: Recall 56.1%, Precision 65.5%, F-measure 60.4%. Pruning set at 60%, min. no. of instances set at 2.
- Results are about 3rd or 4th amongst the best MUC-6 and MUC-7 systems.
- Errors are inherited from the pipeline NLP modules: POS tagger (96%), Named Entity Recognizer (only 88.9%), and NP identification (about 90%). Overall, in one test of 100 MUC annotated documents, the pipeline achieved about 85% accuracy.
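As a quick sanity check, the reported F-measures are simply the harmonic mean of recall and precision:

```python
# Recomputing the reported F-measures as the harmonic mean of recall and precision.
def f_measure(recall, precision):
    return 2 * recall * precision / (recall + precision)

print(round(f_measure(58.6, 67.3), 1))  # 62.6 (MUC-6)
print(round(f_measure(56.1, 65.5), 1))  # 60.4 (MUC-7)
```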

Error Analysis (on 5 random documents from MUC-6)

The types and frequencies of errors that affect precision:

Types of Errors Causing Spurious Links                        Frequency   %
Prenominal modifier string match                                  16     42.1%
Strings match but noun phrases refer to different entities        11     28.9%
Errors in noun phrase identification                               4     10.5%
Errors in apposition determination                                 5     13.2%
Errors in alias determination                                      2      5.3%






The types and frequencies of errors that affect recall:

Types of Errors Causing Missing Links           Frequency   %
Inadequacy of current surface features              38     63.3%
Errors in noun phrase identification                 7     11.7%
Errors in semantic class determination               7     11.7%
Errors in part-of-speech assignment                  5      8.3%
Errors in apposition determination                   2      3.3%
Errors in tokenization                               1      1.7%


Conclusions

- Very good results (comparatively) for a relatively simple set of features.
- The 3 most important features were STR_MATCH, APPOSITIVE & ALIAS (discovered by training & testing with just these features). In fact, these 3 features alone account for 60.3% and 59.4% of the F-measure for MUC-6 and MUC-7 respectively, which means the other 9 features contribute only 2.3% (for MUC-6) and 1% (for MUC-7).
- Some reasons why it performed better than the only comparable system in MUC (RESOLVE from UMass):
  - Higher recall, using the larger no. of semantic classes.
  - The 3 crucial features (RESOLVE did not have the APPOSITIVE feature).
  - Stopping at the first +ve coreference.