Learning to Extract Relations from the Web using Minimal Supervision

Razvan C. Bunescu

Machine Learning Group

Department of Computer Sciences

University of Texas at Austin

razvan@cs.utexas.edu

Raymond J. Mooney

Machine Learning Group

Department of Computer Sciences

University of Texas at Austin

mooney@cs.utexas.edu

Introduction: Relation Extraction


People are often interested in finding relations between entities:

What proteins interact with IRAK1?

Which companies were acquired by Google?

In which city was Mozart born?

Relation Extraction (RE) is the task of automatically locating predefined types of relations in text documents.


Introduction: Relation Extraction

Relation Examples:

1) Protein Interactions: The phosphorylation of Pellino2 by activated IRAK1 could trigger the translocation of IRAKs from complex I to II.

2) Company Acquisitions: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

3) People Birthplaces: Wolfgang Amadeus Mozart was born to Leopold and Ana Maria Mozart, in the front room of Getreidegasse 9 in Salzburg.

Motivation: Minimal Supervision


Developing an RE system usually requires a significant amount of human effort:

Extraction patterns designed by a human expert [Blaschke et al., 2002].

Extraction patterns learned from a corpus of manually annotated examples [Zelenko et al., 2003; Culotta and Sorensen, 2004].

A different RE approach:

Extraction patterns learned from weak supervision derived from a significantly reduced amount of human supervision.

Relation Extraction with Minimal Supervision


Human supervision: a handful of pairs of entities known to exhibit (+) or not exhibit (-) a particular relation.

Weak supervision: bags of sentences containing the pairs, automatically extracted from a very large corpus.

Use bags of sentences in a Multiple Instance Learning framework [Dietterich et al., 1997] to train a relation extraction model.

Types of Supervision for RE


Single Instance Learning (SIL):

A corpus of positive and negative sentence examples, with the two entity names annotated.

A sentence example is positive iff it explicitly asserts the target relationship between the two annotated entities.

Multiple Instance Learning (MIL):

A corpus of positive and negative bags of sentences.

A bag is positive iff it contains at least one positive sentence example.
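The MIL labeling rule above can be sketched in a few lines. This is only an illustration: the function name `bag_label` and the use of booleans for sentence labels are assumptions, not part of the original slides.

```python
from typing import List


def bag_label(instance_labels: List[bool]) -> bool:
    """MIL rule: a bag is positive iff at least one of its sentences is positive.

    `instance_labels` holds the (usually unobserved) per-sentence labels;
    an empty bag is treated as negative in this sketch.
    """
    return any(instance_labels)


# A positive bag may still contain many sentences that do not assert the relation.
google_youtube = [True, False, True]
yahoo_microsoft = [False, False, False]
print(bag_label(google_youtube), bag_label(yahoo_microsoft))
```

In SIL the per-sentence labels are given directly; in MIL only the value returned by such a rule is observed for each bag.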

RE from Web with Minimal Supervision

+/-   Argument a1      Argument a2
+     Google           YouTube
+     Adobe Systems    Macromedia
+     Viacom           DreamWorks
+     Novartis         Eon Labs
-     Yahoo            Microsoft
-     Pfizer           Teva

Example pairs of named entities for R = Corporate Acquisitions.

Minimal Supervision: Positive bags

Use a search engine to extract bags of sentences containing both entities in a pair.

Google, YouTube

S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.

...

Sn: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.


Minimal Supervision: Negative Bags

Use a search engine to extract bags of sentences containing both entities in a pair.

Yahoo, Microsoft

S1: Yahoo is starting to look more like Microsoft and less like the innovative, unified service that got my loyalty in the first place.

S2: Whatever it is, Yahoo is dashing in front, with Microsoft close behind.

...

Sn: Yahoo and Microsoft teamed up on October 12 to make their instant messaging software compatible.


MIL Background: Domains


Originally introduced to solve a Drug Activity prediction problem in biochemistry [Dietterich et al., 1997].

Each molecule has a limited set of low energy conformations: bags of 3D conformations.

A bag is positive if at least one of the conformations binds to a predefined target.

MUSK dataset [Dietterich et al., 1997]

A bag is positive if the molecule smells “musky”.

Content Based Image Retrieval [Zhang et al., 2002]

Text categorization [Andrews et al., 2003], [Ray & Craven, 2005].

MIL Background: Algorithms


Axis Parallel Rectangles [Dietterich et al., 1997]

Diverse Density [Maron, 1998]

Multiple Instance Logistic Regression [Ray & Craven, 2005]

Multi-Instance SVM kernels of [Gartner et al., 2002]:

Normalized Set Kernel.

Statistic Kernel.

MIL for Relation Extraction


Focus on SVM approaches

Through kernels, can work efficiently with instances that implicitly belong to high-dimensional feature spaces.

Can reuse existing relation extraction kernels.

Multi-Instance kernels of [Gartner et al., 2002] are not appropriate when there are very few bags:

Bags (not instances) are considered as training examples.

The number of support vectors (SVs) is upper bounded by the number of bags.

Very few bags → very few SVs → insufficient capacity.

MIL for Relation Extraction


A simple approach to MIL is to transform it into a standard supervised learning problem:

Apply the bag label to all instances inside the bag.

Train a standard supervised algorithm on the transformed dataset.

Despite class noise, obtains competitive results [Ray & Craven, 2005].

Google, YouTube

S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.

...

Sn: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
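The transformation described above (apply the bag label to every instance) can be sketched as follows. The function name `bags_to_instances` and the `(label, sentences)` tuple layout are illustrative assumptions.

```python
def bags_to_instances(bags):
    """Flatten MIL bags into a standard supervised dataset.

    bags: list of (label, sentences) tuples, with label +1 or -1.
    Returns two parallel lists (sentences, labels) in which every sentence
    inherits the label of its bag. Sentences in positive bags that do not
    actually assert the relation become class noise.
    """
    sentences, labels = [], []
    for label, bag in bags:
        for sentence in bag:
            sentences.append(sentence)
            labels.append(label)
    return sentences, labels


bags = [
    (+1, ["Google has bought YouTube.", "Google's search meets YouTube's video."]),
    (-1, ["Yahoo is dashing in front, with Microsoft close behind."]),
]
X, y = bags_to_instances(bags)
print(y)
```

The second sentence of the positive bag receives a positive label even though it does not state an acquisition, which is exactly the noise the slide refers to.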


SVM Framework with MIL Supervision

minimize: [objective function, shown as an image on the original slide]

subject to: [constraints, shown as an image on the original slide]

The terms of the objective are labeled:

Regularization term

Error on positive bags

Error on negative bags

c_p, c_n > 0 with c_p + c_n = 1 control the relative influence that false negatives vs. false positives have on the objective function.

Want c_p < 0.5 (penalize false negatives less than false positives); used c_p = 0.1.
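The text of the slides names the terms of the optimization problem but does not spell it out. As a sketch only, a soft-margin SVM with separate penalties for errors on positive and negative bags could take the form below. The symbols X_p and X_n (sets of positive and negative bags), the slack variables ξ_x, the trade-off constant C, and the absence of any bag-size normalization are assumptions, not taken from the slides.

```latex
\begin{aligned}
\text{minimize:}\quad
  & J(w, b, \xi) = \tfrac{1}{2}\lVert w \rVert^{2}
    + C \Big( c_p \sum_{X \in \mathcal{X}_p} \sum_{x \in X} \xi_x
            + c_n \sum_{X \in \mathcal{X}_n} \sum_{x \in X} \xi_x \Big) \\
\text{subject to:}\quad
  & w \cdot \phi(x) + b \ge +1 - \xi_x, \quad \forall x \in X \in \mathcal{X}_p \\
  & w \cdot \phi(x) + b \le -1 + \xi_x, \quad \forall x \in X \in \mathcal{X}_n \\
  & \xi_x \ge 0
\end{aligned}
```

Here the first term is the regularization term, the sum over positive bags is the error on positive bags, and the sum over negative bags is the error on negative bags. With c_p = 0.1 and c_n = 0.9, a misclassified sentence from a positive bag costs nine times less than one from a negative bag.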

SVM Framework with MIL Supervision

Dual formulation: kernel between bag instances K(x1, x2) = φ(x1) · φ(x2).

Use SSK, a subsequence kernel customized for relation extraction [Bunescu & Mooney, 2005].

The Subsequence Kernel for Relation Extraction

Implicit features are sequences of words anchored at the two entity names.

s = e1 ... bought ... e2 ... billion ... deal: a word sequence.

x = Google has bought video-sharing website YouTube in a controversial $1.6 billion deal: an example sentence, containing s as a subsequence, with gaps g1 = 1, g2 = 3, g3 = 4, g4 = 0 [Bunescu & Mooney, 2005].

φ_s(x) = the value of feature s in example x.

The Subsequence Kernel for Relation Extraction


K(x1, x2) = φ(x1) · φ(x2) = the number of common “anchored” subsequences between x1 and x2, weighted by their total gap.

Many relations require at least one content word: modify kernel to optionally ignore sequences formed exclusively of stop words and punctuation signs.

Kernel is computed efficiently by a generalized version of the dynamic programming procedure from [Lodhi et al., 2002].

[Bunescu & Mooney, 2005].
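To make the idea of gap-weighted common subsequences concrete, here is a minimal sketch of a generic word-level subsequence kernel in the style of [Lodhi et al., 2002]. It is not the anchored SSK of the slides: it does no entity anchoring and no stop-word filtering, and it enumerates subsequences explicitly instead of using dynamic programming. The function names and the parameters `n` (subsequence length) and `lam` (gap decay) are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(tokens, n, lam):
    """Explicit feature map: each length-n word subsequence is mapped to the
    sum of lam**span over its occurrences, where span counts the tokens from
    the first to the last matched word (gaps included)."""
    feats = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        span = idx[-1] - idx[0] + 1
        feats[tuple(tokens[i] for i in idx)] += lam ** span
    return feats


def subsequence_kernel(x1, x2, n=2, lam=0.5):
    """K(x1, x2) = phi(x1) . phi(x2), computed by explicit enumeration.
    Only practical for short sentences and small n; for illustration."""
    f1 = subsequence_features(x1, n, lam)
    f2 = subsequence_features(x2, n, lam)
    return sum(value * f2[u] for u, value in f1.items() if u in f2)


s1 = "e1 has bought e2".split()
s2 = "e1 bought video website e2".split()
print(subsequence_kernel(s1, s2, n=3))
```

Subsequences with larger gaps contribute less because `lam` is below 1, which mirrors the "weighted by their total gap" wording above.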

Two Types of Bias


The MIL approach to RE differs from other MIL problems in two respects:

The training dataset contains very few bags.

The bags can be very large.

These properties lead to two types of bias:

[Type I] Combinations of words that are correlated to the two relation arguments are given too much weight in the learned model.

[Type II] Words specific to a particular relation instance are given too much weight.

Type I Bias


Google, YouTube

S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.

Overweighted Patterns:

search ... e1 ... video ... e2

e1 ... video ... e2

e1 ... search ... e2

e1 ... search ... e2 ... video

Type II Bias


Google, YouTube

S1: Ever since Google paid $1.65 billion for YouTube in October, plenty of pundits, from Mark Cuban to yours truly, have been waiting for the other shoe to drop.

S2: Google Gobbles Up YouTube for $1.6 BILLION - October 9, 2006

S3: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.

Overweighted Patterns:

e1 ... for ... e2 ... October

e1 ... has ... e2 ... October

A Solution for Type I Bias


Use the SSK approach, with new feature weight: [formula shown as an image on the original slide]

Modify subsequence kernel computations to use word weights γ(w).

Want small γ(w) for words w correlated with either of the two relation arguments.

A Solution for Type I Bias: Word Weights

Use a formula for word weights γ(w) that discounts the effect of correlations of w with either of the two arguments a1 and a2.

[Formula shown as an image on the original slides; its annotated terms are listed below.]

The # of sentences in bag X.

The # of sentences in bag X that contain word w.

The probability that the word w appears in a sentence due only to the presence of X.a1 or X.a2, assuming X.a1 and X.a2 are independent causes for w.

P(w|a) is the probability that w appears in a sentence due to the presence of a.

Estimate P(w|a) using counts from a separate bag of sentences containing a.
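The weight formula itself is not in the text, so the following is only one form consistent with the three quantities described above: subtract from the count of sentences containing w the number expected from the arguments alone, and normalize by that count. Combining the two arguments with a noisy-or follows from the stated independence assumption; the function names, the clipping at zero, and the handling of unseen words are assumptions.

```python
def arg_correlation_prob(p_w_a1, p_w_a2):
    """Probability that w appears due to a1 or a2, treating the two
    arguments as independent causes (noisy-or combination)."""
    return 1.0 - (1.0 - p_w_a1) * (1.0 - p_w_a2)


def word_weight(bag_size, count_w, p_w_a1, p_w_a2):
    """Hypothetical word weight for word w in bag X.

    bag_size: number of sentences in bag X.
    count_w:  number of sentences in bag X that contain w.
    p_w_a1, p_w_a2: P(w|a) estimated from separate bags for each argument.
    Returns the fraction of w's occurrences not explained by the arguments,
    clipped at 0 (the clipping is an assumption of this sketch).
    """
    if count_w == 0:
        return 0.0
    expected_from_args = arg_correlation_prob(p_w_a1, p_w_a2) * bag_size
    return max(0.0, (count_w - expected_from_args) / count_w)


# A word that often co-occurs with either argument anyway gets a small weight.
print(word_weight(bag_size=100, count_w=50, p_w_a1=0.1, p_w_a2=0.2))
print(word_weight(bag_size=100, count_w=50, p_w_a1=0.3, p_w_a2=0.3))
```

Under this sketch a word such as "search", which tends to appear with "Google" regardless of the relation, is discounted, while a word such as "bought" keeps a weight close to 1.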

MIL Relation Extraction Datasets


Given two arguments a1 and a2, submit query string “a1 * * * * * * * a2” to Google.

Download the resulting documents (less than 1000).

Split text into sentences and tokenize using the OpenNLP package.

Keep only sentences containing both a1 and a2.

Replace closest occurrences of a1 and a2 with generic tags ⟨e1⟩ and ⟨e2⟩.
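The last two steps above (filtering and tagging) can be sketched as follows, assuming the sentences have already been downloaded and split. The function names, the ASCII tag strings `[e1]` and `[e2]`, and measuring "closest" by character offsets of exact string matches are all assumptions; the sketch does not handle an argument that is a substring of the other.

```python
import re


def occurrences(text, name):
    """Start offsets of every exact occurrence of `name` in `text`."""
    return [m.start() for m in re.finditer(re.escape(name), text)]


def tag_closest(sentence, a1, a2, tag1="[e1]", tag2="[e2]"):
    """Replace the closest pair of a1/a2 mentions with generic tags.
    Returns None when the sentence does not contain both arguments."""
    pos1, pos2 = occurrences(sentence, a1), occurrences(sentence, a2)
    if not pos1 or not pos2:
        return None
    i, j = min(((i, j) for i in pos1 for j in pos2),
               key=lambda pair: abs(pair[0] - pair[1]))
    # Replace right-to-left so the earlier offset stays valid.
    for start, length, tag in sorted([(i, len(a1), tag1), (j, len(a2), tag2)],
                                     reverse=True):
        sentence = sentence[:start] + tag + sentence[start + length:]
    return sentence


def make_bag(sentences, a1, a2):
    """Keep only sentences mentioning both arguments, with tags applied."""
    tagged = (tag_closest(s, a1, a2) for s in sentences)
    return [s for s in tagged if s is not None]


bag = make_bag(
    ["Google has bought video-sharing website YouTube.",
     "YouTube was founded in 2005."],
    "Google", "YouTube")
print(bag)
```

Replacing the names with generic tags keeps the learned patterns from depending on the particular entities in the training pairs.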

MIL Relation Extraction Datasets

Corporate Acquisitions Dataset

Training Pairs:

+/-   Argument a1      Argument a2           Bag size
+     Google           YouTube               1375
+     Adobe Systems    Macromedia            622
+     Viacom           DreamWorks            323
+     Novartis         Eon Labs              311
-     Yahoo            Microsoft             163
-     Pfizer           Teva                  247

Testing Pairs (all bag sentences manually labeled):

+/-   Argument a1      Argument a2           Bag size
+     Pfizer           Rinat Neuroscience    50 (41)
+     Yahoo            Inktomi               433 (115)
-     Google           Apple                 281
-     Viacom           NBC                   231

MIL Relation Extraction Datasets

Person Birthplace Dataset

Training Pairs:

+/-   Argument a1         Argument a2    Bag size
+     Franz Kafka         Prague         522
+     Andre Agassi        Las Vegas      386
+     Charlie Chaplin     London         292
+     George Gershwin     New York       260
-     Luc Besson          New York       74
-     W. A. Mozart        Vienna         288

Testing Pairs (all bag sentences manually labeled):

+/-   Argument a1         Argument a2    Bag size
+     Luc Besson          Paris          126 (6)
+     Marie Antoinette    Vienna         39 (10)
-     Charlie Chaplin     Hollywood      266
-     George Gershwin     London         104

Experimental Results: Systems


[SSK-MIL] MIL formulation using the original SSK.

[SSK-T1] MIL formulation with the SSK modified to use word weights in order to reduce Type I bias.

[BW-MIL] MIL formulation using a bag-of-words kernel.

[SSK-SIL] SIL formulation using the original subsequence kernel:

Use manually labeled instances from the test bags.

Train on instances from one positive bag and one negative bag, test on instances from the other two bags.

Average results over all four combinations.

Experimental Results: Evaluation

1) Plot Precision vs. Recall (PR) graphs: vary a threshold on the extraction confidence.

2) Report Area Under PR Curve (AUC).
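The evaluation procedure can be sketched as follows: rank the extractions by confidence, record precision and recall at each cutoff, and sum the area under the resulting curve. The step-wise area used here (often called average precision) is one common estimate; the slides do not say which interpolation was used, so that choice, the function names, and the 0/1 label encoding are assumptions. The sketch assumes at least one positive label and ignores ties in the scores.

```python
def pr_points(scores, labels):
    """(precision, recall) after each cutoff, with extractions ranked by
    decreasing confidence. labels are 1 (correct) or 0 (incorrect)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positive = sum(labels)
    points, true_positives = [], 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        points.append((true_positives / k, true_positives / total_positive))
    return points


def area_under_pr(points):
    """Step-wise area under the PR curve: precision at each cutoff times
    the recall gained at that cutoff."""
    area, previous_recall = 0.0, 0.0
    for precision, recall in points:
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area


points = pr_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(points)
print(area_under_pr(points))
```

Lowering the threshold moves along the ranked list: recall can only grow, while precision may rise or fall.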

[Figure: Precision vs. Recall graph, Company Acquisitions]

[Figure: Precision vs. Recall graph, Person Birthplace]

Experimental Results: AUC






Dataset                 SSK-MIL   SSK-T1   BW-MIL   SSK-SIL
Company Acquisitions    76.9%     81.1%    45.8%    80.4%
People Birthplace       72.5%     78.2%    69.2%    73.4%

SSK-T1 is significantly more accurate than SSK-MIL.

SSK-T1 is competitive with SSK-SIL, however:

SSK-T1 supervision: only 6 pairs (4 positive).

SSK-SIL average supervision:

~500 manually labeled sentences (78 positive) for Acquisitions.

~300 manually labeled sentences (22 positive) for Birthplaces.

Applications & Extensions


A “Google Sets” system for relation extraction

Ideally, the user provides only positive pairs.

Likely negative examples are created by pairing the argument entity with other named entities in the same sentence.

Any pair of entities different from the relation pair is likely to be negative: implicit negative evidence.

[Figure: Input / Output lists of entity pairs]

Google           YouTube
Adobe Systems    Macromedia
Viacom           DreamWorks
Novartis         Eon Labs
Pfizer           Rinat Neuroscience
Yahoo            Inktomi
...
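The idea of implicit negative evidence can be sketched as follows, assuming named entities have already been recognized in each sentence. Pairing only the first argument with the other entities is one reading of the slide; the function name and the input layout are assumptions.

```python
def implicit_negative_pairs(positive_pair, sentence_entities):
    """Create likely-negative pairs from a user-supplied positive pair.

    positive_pair: (a1, a2), known to exhibit the relation.
    sentence_entities: one list of named-entity strings per sentence.
    Returns sorted pairs (a1, other) for every entity that shares a
    sentence with a1 and is neither a1 nor a2.
    """
    a1, a2 = positive_pair
    negatives = set()
    for entities in sentence_entities:
        if a1 in entities:
            negatives.update((a1, e) for e in entities if e not in (a1, a2))
    return sorted(negatives)


pairs = implicit_negative_pairs(
    ("Google", "YouTube"),
    [["Google", "YouTube", "Viacom"], ["Yahoo", "Microsoft"]])
print(pairs)
```

Such pairs are only likely negatives, so a learner using them still has to tolerate some label noise.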

Future Work


Investigate methods for reducing Type II bias.

Experiment with other, more sophisticated MIL algorithms.

Explore the effect of Type I and Type II bias when using dependency information in the relation extraction kernel.

Conclusion


Presented a new approach to Relation Extraction, trained using only a handful of pairs of entities known to exhibit or not exhibit the target relationship.

Extended an existing subsequence kernel to resolve problems caused by the minimal supervision provided.

The new MIL approach is competitive with its SIL counterpart that uses significantly more human supervision.