Andreea Bodnari¹, Peter Szolovits¹, Ozlem Uzuner²

¹ MIT, CSAIL, Cambridge, MA, USA
² Department of Information Studies, University at Albany SUNY, Albany, NY, USA

10.16.2012, Rochester, MN

MCORES: a system for noun phrase
coreference resolution for clinical records

2012 SHARPn Summit “Secondary Use”


Medical coreference resolution system (MCORES)


Experimental results


Conclusion


Electronic Medical Records (EMRs) are large information repositories

Clinical information requires processing at two levels:

Lower level: sentence parsing, tokenization

Higher level: coreference resolution, semantic disambiguation

Coreference resolution: a fundamental step in text processing


English medical corpus provided by the i2b2 National Center for Biomedical Computing

De-identified medical discharge summaries

Source: PH & BIDMC

Content: 230 (PH) + 196 (BIDMC) discharge summaries

Annotated concepts and coreference chains


Concept types







Persons

Problems

Treatments

Tests

Pronouns

NP Instance Creation → Feature Generation → Classification → Output Clustering



Markables of the same semantic category are paired together

MCORES creates positive instances only from neighboring markable pairs in a chain

¹ Instance creation akin to McCarthy and Lehnert
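The pairing rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MCORES implementation: the `Markable` record and `create_instances` are hypothetical names, and the choices to label non-coreferent same-category pairs as negatives and to drop non-adjacent coreferent pairs are assumptions the slide does not spell out.

```python
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple


class Markable(NamedTuple):
    text: str
    category: str            # semantic category, e.g. "person" or "problem"
    position: int            # order of appearance in the record
    chain_id: Optional[int]  # gold chain id; None if the markable is in no chain


Instance = Tuple[Markable, Markable, bool]


def create_instances(markables: List[Markable]) -> List[Instance]:
    """Pair same-category markables; only adjacent chain members become positives."""
    ordered = sorted(markables, key=lambda m: m.position)
    instances: List[Instance] = []
    for ante, ana in combinations(ordered, 2):
        if ante.category != ana.category:
            continue  # only markables of the same semantic category are paired
        same_chain = ante.chain_id is not None and ante.chain_id == ana.chain_id
        if not same_chain:
            instances.append((ante, ana, False))  # assumption: other pairs are negatives
            continue
        # Positive only if no other member of the same chain lies between the two.
        adjacent = not any(
            ante.position < m.position < ana.position and m.chain_id == ante.chain_id
            for m in ordered
        )
        if adjacent:
            instances.append((ante, ana, True))
        # Assumption: non-adjacent coreferent pairs are dropped, not labeled negative.
    return instances


markables = [
    Markable("Mr. Smith", "person", 0, 1),
    Markable("the patient", "person", 1, 1),
    Markable("Dr. Jones", "person", 2, None),
    Markable("Mr. Smith", "person", 3, 1),
    Markable("pneumonia", "problem", 4, None),
]
instances = create_instances(markables)
for ante, ana, label in instances:
    print(f"{ante.text!r} -> {ana.text!r}: {'coreferent' if label else 'not coreferent'}")
```

In this toy record the pair ("Mr. Smith", "Mr. Smith") is skipped because "the patient" sits between them in the same chain, so each chain contributes one positive per link rather than one per pair.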




                                         Persons  Problems  Treatments  Tests  Across all categories
Exact textual overlap    Coreferent         3347       984         786    206                   5323
                         Non-coreferent      100        29          21      7                    157
Partial textual overlap  Coreferent          337      1353         764    239                   2693
                         Non-coreferent      711      1217         557    317                   2802
No textual overlap       Coreferent         5461       597         329     56                   6443
                         Non-coreferent     5403     46056       19328   6709                  77496
Total                    Coreferent         9145      2934        1879    501                  14459
                         Non-coreferent     6214     47302       19906   7033                  80455

Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.
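A short arithmetic check of Table 3 makes its implication concrete. The Python sketch below re-derives the Total rows from the three overlap rows and computes the fraction of coreferent instances per overlap type. All counts are transcribed from the table; the names `TABLE3`, `column_totals`, and `coreferent_rate` are illustrative only.

```python
# Counts transcribed from Table 3, in the order Persons, Problems, Treatments, Tests.
TABLE3 = {
    "exact":   {"coreferent": [3347, 984, 786, 206], "non_coreferent": [100, 29, 21, 7]},
    "partial": {"coreferent": [337, 1353, 764, 239], "non_coreferent": [711, 1217, 557, 317]},
    "none":    {"coreferent": [5461, 597, 329, 56],  "non_coreferent": [5403, 46056, 19328, 6709]},
}


def column_totals(label: str) -> list:
    """Sum one label over the three overlap rows, per semantic category."""
    return [sum(col) for col in zip(*(TABLE3[row][label] for row in TABLE3))]


def coreferent_rate(row: str) -> float:
    """Fraction of the instances in one overlap row that are coreferent."""
    pos = sum(TABLE3[row]["coreferent"])
    neg = sum(TABLE3[row]["non_coreferent"])
    return pos / (pos + neg)


coref_totals = column_totals("coreferent")
non_coref_totals = column_totals("non_coreferent")
for row in TABLE3:
    print(f"{row:8s} coreferent rate: {coreferent_rate(row):.3f}")

# Share of all coreferent instances whose markables share no tokens.
share_no_overlap = sum(TABLE3["none"]["coreferent"]) / sum(coref_totals)
print(f"coreferent instances with no textual overlap: {share_no_overlap:.3f}")
```

With these counts, exact-overlap instances are about 97% coreferent and no-overlap instances under 8%, yet the no-overlap row holds roughly 45% of all coreferent instances. A string-matching baseline therefore misses a large part of the coreference links.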

Multi-perspective features

Antecedent perspective

Anaphor perspective

Greedy perspective

Stingy perspective
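The slide names the four perspectives without defining them. One plausible reading, assumed here and not confirmed by the source, is that the antecedent and anaphor perspectives normalize a raw feature value by the size of the antecedent or the anaphor, while the greedy and stingy perspectives keep the larger and the smaller of those two values. The sketch applies that reading to token overlap; `perspectives` and `token_overlap` are hypothetical helpers.

```python
import re
from typing import Dict, Set


def tokens(phrase: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a markable."""
    return set(re.findall(r"[a-z0-9]+", phrase.lower()))


def perspectives(shared: float, ante_size: int, ana_size: int) -> Dict[str, float]:
    """Hypothetical normalization of one raw feature value from four perspectives."""
    ante = shared / ante_size if ante_size else 0.0  # relative to the antecedent
    ana = shared / ana_size if ana_size else 0.0     # relative to the anaphor
    return {
        "antecedent": ante,
        "anaphor": ana,
        "greedy": max(ante, ana),  # assumption: the more permissive view
        "stingy": min(ante, ana),  # assumption: the more conservative view
    }


def token_overlap(antecedent: str, anaphor: str) -> Dict[str, float]:
    a, b = tokens(antecedent), tokens(anaphor)
    return perspectives(len(a & b), len(a), len(b))


print(token_overlap("chest pain", "severe chest pain"))
```

Under this reading, "chest pain" is fully contained in "severe chest pain" from the antecedent's side (1.0) but covers only two of three anaphor tokens, so the greedy and stingy values differ.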




Phrase-level lexical

Sentence-level lexical

Syntactic

Semantic

Miscellaneous




Phrase-level lexical

Token overlap*

Normalized token overlap

Edit distance

Normalized edit distance
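The edit-distance features can be illustrated with a standard Levenshtein distance. The sketch assumes a character-level distance and normalization by the length of the longer markable; the slide does not state either choice, and the function names are illustrative.

```python
def edit_distance(s: str, t: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free when characters match)
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(s: str, t: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(s), len(t))
    return edit_distance(s, t) / longest if longest else 0.0


print(edit_distance("cardiac cath", "cardiac catheterization"))
print(round(normalized_edit_distance("cardiac cath", "cardiac catheterization"), 3))
```

Normalizing keeps the feature comparable across short and long markables: an abbreviation such as "cardiac cath" needs 11 insertions to reach its full form, which is about 0.48 of the longer string.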



Sentence-level lexical

Sentence-level token overlap*

Filtered sentence-level token overlap*

Left and right mention overlap

(stingy and greedy perspectives only)
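Sentence-level token overlap compares the sentences that contain the two markables rather than the markables themselves. In the sketch below, the filtered variant removes a small stop-word list; that list and the helper names are hypothetical, since the slide does not say what the filter removes. The raw count could then be normalized per perspective like the phrase-level overlap.

```python
import re
from typing import Set

# Hypothetical stop-word list; the filter MCORES actually applies is not specified here.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "for", "to", "and", "was", "is", "with"}


def sentence_tokens(sentence: str, filtered: bool = False) -> Set[str]:
    toks = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return toks - STOP_WORDS if filtered else toks


def sentence_token_overlap(sent_ante: str, sent_ana: str, filtered: bool = False) -> int:
    """Number of distinct tokens shared by the sentences containing the two markables."""
    return len(sentence_tokens(sent_ante, filtered) & sentence_tokens(sent_ana, filtered))


s1 = "The patient was started on heparin for the clot."
s2 = "Heparin was continued for the clot."
print(sentence_token_overlap(s1, s2), sentence_token_overlap(s1, s2, filtered=True))
```

Filtering matters here: the two sentences share five tokens, but three of them ("the", "was", "for") carry no content, leaving only "heparin" and "clot".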




* multi-perspective feature


Syntactic


Number agreement

Noun overlap*

Surname match


Semantic


UMLS CUI overlap*

UMLS CUI token overlap*

UMLS semantic type overlap*

Anaphor UMLS semantic type
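The UMLS features let MCORES compare markables by concept rather than by surface form. A real system would map each markable to UMLS concept identifiers (CUIs) and semantic types, for example with a tool such as MetaMap. The sketch below replaces that lookup with a toy dictionary: the identifiers `CUI_MI` and `CUI_CHEST_PAIN` are placeholders, not real CUIs, and the helper names are hypothetical.

```python
from typing import Dict, Set

# Toy stand-in for a UMLS lookup; the identifiers are placeholders, not real CUIs.
TOY_CUIS: Dict[str, Set[str]] = {
    "heart attack": {"CUI_MI"},
    "myocardial infarction": {"CUI_MI"},
    "chest pain": {"CUI_CHEST_PAIN"},
}
TOY_SEMANTIC_TYPES: Dict[str, str] = {
    "CUI_MI": "Disease or Syndrome",
    "CUI_CHEST_PAIN": "Sign or Symptom",
}


def cuis(phrase: str) -> Set[str]:
    return TOY_CUIS.get(phrase.lower(), set())


def cui_overlap(a: str, b: str) -> int:
    """Number of concept identifiers shared by two markables."""
    return len(cuis(a) & cuis(b))


def semantic_type_overlap(a: str, b: str) -> int:
    """Number of semantic types shared by the concepts of two markables."""
    types_a = {TOY_SEMANTIC_TYPES[c] for c in cuis(a)}
    types_b = {TOY_SEMANTIC_TYPES[c] for c in cuis(b)}
    return len(types_a & types_b)


print(cui_overlap("Heart attack", "myocardial infarction"))
print(cui_overlap("heart attack", "chest pain"), semantic_type_overlap("heart attack", "chest pain"))
```

"Heart attack" and "myocardial infarction" share no tokens, so every lexical feature scores them as unrelated; only the concept-level lookup links them.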





Token distance

Mention distance

All-mention distance

Sentence distance

Section match

Section distance
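The distance features above reduce to differences between positions of the antecedent and the anaphor. The sketch assumes that "mention distance" counts markables of the same category and "all-mention distance" counts all markables; the slide does not define them, and the `Mention` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Mention:
    token_index: int     # index of the mention's first token in the record
    category_index: int  # index among markables of the same category (assumption)
    overall_index: int   # index among all markables in the record (assumption)
    sentence_index: int
    section_index: int   # index of the enclosing section
    section_name: str    # normalized section header, e.g. "hospital course"


def distance_features(ante: Mention, ana: Mention) -> Dict[str, Union[int, bool]]:
    """Distance features for an antecedent that precedes the anaphor."""
    return {
        "token_distance": ana.token_index - ante.token_index,
        "mention_distance": ana.category_index - ante.category_index,
        "all_mention_distance": ana.overall_index - ante.overall_index,
        "sentence_distance": ana.sentence_index - ante.sentence_index,
        "section_match": ante.section_name == ana.section_name,
        "section_distance": ana.section_index - ante.section_index,
    }


ante = Mention(10, 2, 5, 1, 0, "history of present illness")
ana = Mention(58, 4, 11, 4, 2, "hospital course")
print(distance_features(ante, ana))
```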




C4.5 decision tree algorithm


Flexible


Readable prediction model


Classifies pairs of markables based on their feature-vector values
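C4.5 grows its tree by choosing, at each node, the feature with the highest gain ratio (information gain divided by split information). The sketch below computes only that split criterion for discrete features; it omits C4.5's threshold search for numeric features such as the distances, its handling of missing values, and pruning. The toy labels and feature values are illustrative.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 split criterion for a discrete feature: information gain / split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Toy pairwise instances: 1 = coreferent, 0 = not coreferent (illustrative values only).
labels = [1, 1, 0, 0]
surname_match = [True, True, False, False]
section_match = [True, False, True, False]
print(gain_ratio(surname_match, labels), gain_ratio(section_match, labels))
```

Dividing by split information penalizes features that fragment the data into many small branches, which plain information gain would favor.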



Classifier makes pairwise predictions only

Pairwise predictions are clustered into coreference chains

Aggressive-merge¹ clustering algorithm

A prediction [M1]-[M2] is merged with all preceding pairwise predictions linked to [M1] or [M2]

¹ Aggressive-merge algorithm proposed by McCarthy and Lehnert
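Because every positive prediction [M1]-[M2] pulls in everything already linked to either markable, aggressive merging amounts to taking the transitive closure of the positive pairs. A union-find structure is one common way to compute that closure; the sketch below is an illustration under that reading, not the MCORES code, and `aggressive_merge` is a hypothetical name.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def aggressive_merge(markables: List[Hashable],
                     positive_pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[List[Hashable]]:
    """Merge the clusters of both markables for every positive pairwise prediction."""
    parent: Dict[Hashable, Hashable] = {m: m for m in markables}

    def find(x: Hashable) -> Hashable:
        # Follow parent links to the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in positive_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # everything already linked to b joins a's chain

    chains: Dict[Hashable, List[Hashable]] = {}
    for m in markables:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


chains = aggressive_merge(["M1", "M2", "M3", "M4", "M5"],
                          [("M1", "M2"), ("M3", "M4"), ("M2", "M3")])
print(chains)
```

The trade-off of this design is that a single false-positive pair joins two otherwise correct chains, since nothing in the merge step checks consistency with earlier negative predictions.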


Feature set evaluation

Perspectives evaluation

Performance evaluation against:

In-house baseline

Third-party systems (RECONCILE ACL09 & BART)



Evaluation metric: unweighted averages of recall, precision, and F-measure over

MUC

B³

CEAF

BLANC
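Two of the four metrics, MUC and B³, are short enough to sketch, together with the unweighted averaging across metrics. The code follows the standard definitions of these two metrics; CEAF and BLANC are omitted, so the average here covers only two metrics. It assumes mentions missing from one side count as singletons, and the toy key and response chains are illustrative.

```python
from typing import Dict, Hashable, List, Set, Tuple

Clusters = List[List[Hashable]]


def _muc_ratio(key: Clusters, response: Clusters) -> float:
    """MUC recall of `response` against `key`; swap the arguments for precision."""
    where: Dict[Hashable, object] = {m: i for i, c in enumerate(response) for m in c}
    num = den = 0
    for cluster in key:
        # Mentions absent from the response form their own singleton partitions.
        parts = {where.get(m, ("singleton", m)) for m in cluster}
        num += len(cluster) - len(parts)
        den += len(cluster) - 1
    return num / den if den else 0.0


def muc(key: Clusters, response: Clusters) -> Tuple[float, float]:
    return _muc_ratio(key, response), _muc_ratio(response, key)


def b_cubed(key: Clusters, response: Clusters) -> Tuple[float, float]:
    """Per-mention B-cubed recall and precision, averaged over mentions."""
    k: Dict[Hashable, Set[Hashable]] = {m: set(c) for c in key for m in c}
    r: Dict[Hashable, Set[Hashable]] = {m: set(c) for c in response for m in c}
    if not k or not r:
        return 0.0, 0.0
    recall = sum(len(k[m] & r.get(m, {m})) / len(k[m]) for m in k) / len(k)
    precision = sum(len(r[m] & k.get(m, {m})) / len(r[m]) for m in r) / len(r)
    return recall, precision


def f1(recall: float, precision: float) -> float:
    return 2 * recall * precision / (recall + precision) if recall + precision else 0.0


key = [["a", "b", "c"], ["d", "e"]]
response = [["a", "b"], ["c", "d", "e"]]
scores = {"MUC": muc(key, response), "B3": b_cubed(key, response)}
f_scores = {name: f1(*rp) for name, rp in scores.items()}
average_f = sum(f_scores.values()) / len(f_scores)  # unweighted average across metrics
print(f_scores, round(average_f, 3))
```

The two metrics disagree even on this small example (MUC F = 2/3, B³ F = 11/15). This is one reason to report an average over several metrics rather than a single score.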




MCORES’ advantage comes from linking markables with no token overlap


Phrase-level sub-MCORES performs similarly to MCORES

The greedy-perspective system performs best among the single-perspective systems

The multi-perspective system performs as well as or better than the single-perspective systems


Error analysis

Persons: MCORES fails to classify person pairs with misspelled names

Problems: false positives due to the difference between new and recurring events

Treatments: false positives due to medications given by different routes of administration

Tests: false positives due to the large number of full-overlap instances that did not corefer








Developed a coreference resolution system for the medical domain (MCORES)

MCORES innovates through a multi-perspective and knowledge-based feature set

MCORES outperforms third-party systems and an in-house baseline, improving coreference resolution on clinical records
