Unabridged Manuscript - Journal of the American Medical Informatics Association


Li, a Medication Event Extraction System

Lancet: a High Precision Medication Event Extraction System for Clinical Text

Authors:

Zuofeng Li, MD, PhD,1 Feifan Liu, PhD,1 Lamont Antieau, PhD,1 Yonggang Cao, PhD,1 Hong Yu, PhD,1,2,*

1 College of Health Sciences, University of Wisconsin-Milwaukee, Wisconsin, USA
2 College of Engineering, University of Wisconsin-Milwaukee, Wisconsin, USA


* To whom correspondence should be addressed:

Hong Yu, PhD
2400 E Hartford Ave
Milwaukee, WI 53211, USA
Email: hongyu@uwm.edu
Phone: (414) 229-3344
Fax: (414) 229-5100




Originality Declaration and License Statements

I, as corresponding author, promise that I and all persons listed as coauthors on this submitted work have read and understand the "Originality of Manuscripts" statement regarding submissions to JAMIA (available at http://jamia.bmj.com/site/about/originalityofmanuscripts.xhtml) and confirm that this submission is a new, original work that has not been previously submitted, published in whole or in part, or simultaneously submitted for publication in another journal. Also, in accordance with the aforementioned policy, we have included as part of the submission any previously published materials that overlap in content with this new original manuscript.

The Corresponding Author has the right to grant on behalf of all authors and does grant on behalf of all authors, an exclusive license (or non-exclusive for government employees) on a worldwide basis to BOTH The American Medical Informatics Association and its publisher for JAMIA, the BMJ Publishing Group Ltd and its Licensees to permit this article (if accepted) to be published in Journal of the American Medical Informatics Association and any other BMJPGL products to exploit all subsidiary rights, as set out in our license (http://group.bmj.com/products/journals/instructions-for-authors/licence-forms).


Abstract

Objective: We present Lancet, a supervised machine-learning system that automatically extracts medication events, consisting of medication names and information pertaining to their prescribed use (dosage, mode, frequency, duration and reason), from lists or narrative text in medical discharge summaries.

Design: The Lancet system incorporates three supervised machine-learning models: a conditional random fields (CRF) model for tagging individual medication names and associated fields, an AdaBoost model with the decision stump algorithm for determining which medication names and fields belong to a single medication event, and a support vector machines (SVM) disambiguation model for identifying the context style (narrative or list).

Measurements: We participated in the third i2b2 shared task on challenges in natural language processing for clinical data: the medication extraction challenge. Using the performance metrics provided by the i2b2 challenge, we report micro F1 (precision/recall) scores at both the horizontal and vertical levels.

Results: Among the top ten teams, the Lancet system achieved the highest precision at 90.4%, with an overall F1 score of 76.4% (horizontal system level with exact match), gains of 11.2% and 12%, respectively, compared to the rule-based baseline system jMerki. By combining the two systems, the hybrid system further increased the F1 score by 3.4%, from 76.4% to 79.0%.

Conclusions: We conclude that supervised machine-learning systems with minimal external knowledge resources can achieve high precision with a competitive overall F1 score. Our Lancet system, based on this learning framework, does not rely on expensive manually curated rules. The system is available online at http://code.google.com/p/lancet/.


I. INTRODUCTION

Medication is an important part of a patient's medical treatment, and nearly all patient records incorporate a significant amount of medication information. The administration of medication at a specific time-point during the patient's medical diagnosis, treatment, or prevention of disease is referred to as a medication event,[1-3] and the written representation of these events typically comprises the name of the medication and any of its associated fields, including but not limited to dosage, mode, frequency, etc.[4] Accurately capturing medication events from patient records is an important step towards large-scale data mining and knowledge discovery,[5] medication surveillance and clinical decision support,[6] and medication reconciliation.[7-10]

In addition to its importance, medication event information (e.g., treatment outcomes, medication reactions and allergy information) is often difficult to extract, as clinical records exhibit a range of different styles and grammatical structures for recording such information.[4] Thus, Informatics for Integrating Biology & the Bedside (i2b2) recognized automatic medication event extraction with natural language processing (NLP) approaches as one of the great challenges in medical informatics. As one of 20 groups that participated in the i2b2 medication extraction challenge, we report in this study on the Lancet system, which we developed for medication event extraction.

II. RELATED WORK

Over the past two decades, several approaches and systems have been developed to extract information from clinical narratives. Earlier work mapped terms appearing in clinical narratives to concepts in external clinical terminologies (e.g., SNOMED).[11] Later systems explored syntactic and semantic parsing and pattern matching (e.g., MedLEE and others) for deeper information extraction.[12,13] Recently, supervised machine-learning approaches have been explored, including those for finding temporal order in discharge summaries and others for identifying the smoking status of patients from medical discharge records.[14,15]

Systems for medication event extraction have been reported previously. Gold et al.[1] developed a rule-based system called MERKI to extract medication names and the corresponding attributes from structured and narrative clinical texts. Cimino et al.[16] explored the MedLEE system to extract medication information from clinical narratives; in their study, medication names and three states of medication events, namely initiation, change and discontinuation, were extracted for the purpose of medication reconciliation. Recently, Xu et al.[4] built an automatic medication extraction system (MedEx) for discharge summaries by leveraging semantic rules and parsing techniques, achieving promising results for extracting medications and related fields.


There are also some commercial systems designed to extract medication information from
medical records, including LifeCode, A
-
Life Medi
cal, FreePharma, etc. Jagannathan et al.
[17]

evaluated the performance of four commercial NLP tools to extract medication information from
discharge summaries and family practice notes. Their analysis
reported

that these tools
performed well
in

recognizin
g medication

names

but
poorly

on

recognizing related information
such as dosage
,
route

and

frequency.


Although the existence of such NLP systems is evidence of the progress that has been made in
this area, most of these systems are not publicly available.

Furthermore, different systems have
been developed for different purposes and have been evaluated against different gold standards
.
This

makes comparing these approaches to one another a challenging task. Therefore, the i2b2
project attempts to provide a
common
purpose

and gold standard
to different NLP systems.
[15]


III. THE I2B2 DATA AND EXTRACTION TASK

A. Medication Event Extraction

The i2b2 challenge defines a medication event as an event incorporating a medication name and any of the following associated fields: dosage, frequency, mode, duration and reason. Table 1 shows the definition released by the i2b2 organizers; the i2b2 medication event definition largely follows from previous work, particularly [13] and [1]. As an example, Figure 1 shows a clinical narrative/list excerpt released by the i2b2 organizers in which medication events were annotated based on the i2b2 annotation guidelines.


[Table 1 about here]

While the challenge was to extract all medication events from both lists and narrative context, its main interest was in the extraction of medication information from narrative medical records, as illustrated in Figure 1.

[Figure 1 about here]


B. Training dataset and annotation

A dataset of 696 un-annotated, de-identified patient discharge summaries from Partners Healthcare (1990-2007) was released by the i2b2 organizers about ten weeks before the competition.[18] The dataset is available at the i2b2 web site.[19] At the same time, the organizers also released the first version of the annotation guideline and 17 discharge summaries (a subset of the 696 discharge summaries) that were annotated by the organizers. Over the next ten weeks, all groups participating in the challenge took part in a discussion over the guideline, which was iteratively refined, and the annotation of the 17 discharge summaries was updated according to the discussion and guideline refinements. Towards the end, the final annotation of the 17 discharge summaries was considered "ground truth" by the i2b2 organizers.

Throughout this process, two of the authors (ZFL and LA) manually and independently annotated 75 and 72 discharge summaries, respectively, randomly selected from the 696 patient discharge summaries; each summary was annotated by only one annotator. This collection of 147 summaries incorporated the 17 "ground truth" summaries. The 17 summaries annotated by ourselves were then measured against the "ground truth" summaries to determine annotation agreement, the results of which are discussed in the error analysis section. In addition, after the competition, 10 summaries were re-annotated to explore the agreement between the two annotators.


Our 147 manually annotated summaries incorporated a total of 5,184 medication entries (2,175 narrative and 3,009 list); 2,742 instances of dosage; 2,042 instances of mode; 2,583 instances of frequency; 223 instances of duration; and 709 instances of reason.


IV. MEDICATION EVENT EXTRACTION SYSTEMS

In this section, we describe Lancet, a supervised machine-learning system for medication event extraction. For performance comparison, we also implemented a rule-based system as a baseline and a hybrid system.


A. The Lancet system

The overall Lancet system is shown in Figure 2. Lancet incorporates three supervised machine-learning (ML) models: 1) the CRF model, a conditional random fields (CRF) model for identifying instances of a medication name (m) and its associated fields: dosage (do), mode (mo), frequency (f), duration (du) and reason (r); 2) the medication relationship model, an AdaBoost classification model with decision stump for associating a medication name with its corresponding fields; and 3) the list/narrative SVM model, a support vector machines (SVM) classifier for distinguishing lists from narratives.

In the following, we first describe data pre-processing; we then describe each of the three ML models and how they are integrated into the final Lancet system.

[Figure 2 about here]


1. Pre-processing

Our pre-processor first converts the text in each discharge summary into lower case. It then applies manually curated pattern-matching rules to recognize discharge summary sub-sections, including history, medication, physical examination, follow up, diagnosis, allergy, family history, etc. For instance, the following regular expressions were used to detect medication-related subsections:

'medications\s+on\s+(admission|discharge|transfer)',
'(discharge|transfer|home|admi\w+|new)\s+(medication|med)s?',
'(prn\s+)?med(ication)?s'

We applied Splitta for sentence boundary detection.[20] The sentence boundary information was used for the list/narrative classification and for the association between a medication name and its medication fields.
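As an illustration, the three subsection patterns above can be exercised directly in Python (the helper function and its name are our own sketch, not part of the released system):

```python
import re

# The three medication-subsection patterns listed above. The pre-processor
# lower-cases the text first, so we lower-case inside the helper as well.
MEDICATION_SUBSECTION_PATTERNS = [
    re.compile(r'medications\s+on\s+(admission|discharge|transfer)'),
    re.compile(r'(discharge|transfer|home|admi\w+|new)\s+(medication|med)s?'),
    re.compile(r'(prn\s+)?med(ication)?s'),
]

def is_medication_heading(line):
    """Return True if a line matches any medication-subsection pattern."""
    line = line.lower()
    return any(p.search(line) for p in MEDICATION_SUBSECTION_PATTERNS)

print(is_medication_heading("MEDICATIONS ON DISCHARGE:"))  # True
print(is_medication_heading("PHYSICAL EXAMINATION:"))      # False
```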

2. A CRF model for medication named entity recognition

Using the 147 annotated discharge summaries, we trained a conditional random fields (CRF) model to recognize the medication name and five fields (do, mo, f, du and r). The model was trained using ABNER, an open-source biomedical named entity recognizer.[21] We applied the default feature set, which comprises the standard bag-of-words, morphology, and n-gram features.
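A CRF tagger of this kind is typically trained on token sequences labeled in a BIO scheme. The sketch below is our own simplification, not ABNER's API; the label names follow the field abbreviations above. It shows how an annotated list entry could be encoded:

```python
# Minimal sketch: convert annotated token spans into BIO labels, the usual
# training representation for a CRF named-entity tagger.

def spans_to_bio(tokens, spans):
    """spans: list of (start_token, end_token_exclusive, label)."""
    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        labels[start] = "B-" + label
        for i in range(start + 1, end):
            labels[i] = "I-" + label
    return labels

tokens = ["persantine", "(", "dipyridamole", ")", "50", "mg", "po", "bid"]
spans = [(0, 1, "m"), (4, 6, "do"), (6, 7, "mo"), (7, 8, "f")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
# e.g. ("50", "B-do"), ("mg", "I-do"), ("po", "B-mo"), ("bid", "B-f")
```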

3. An AdaBoost model for associating a medication name with its corresponding fields

We built a supervised machine-learning classifier to associate a medication with its fields. This two-way classifier attempted to determine whether a medical field was associated with a medication name or not. As the number of potential medication-name-field pairs can be large, we followed a heuristic rule suggested by the i2b2 organizers in which any medication name and field within a distance of two lines (+/- two lines) was considered a candidate medication-name-field pair. The features used to train the model are displayed in Table 2. For implementation, we used AdaBoost.M1 with decision stump in the Weka toolkit, a well-known algorithm that is less susceptible to over-fitting.[22]

[Table 2 about here]
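The "+/- two lines" candidate-generation heuristic can be sketched as follows (the data representation is our own; the actual features fed to AdaBoost are those in Table 2):

```python
# Sketch of the i2b2 "+/- two lines" candidate-generation heuristic.
# A mention is (line_number, text, kind), where kind is "m" for a medication
# name or a field abbreviation ("do", "mo", "f", "du", "r").

def candidate_pairs(mentions, window=2):
    """Pair every medication name with every field within `window` lines."""
    names = [m for m in mentions if m[2] == "m"]
    fields = [m for m in mentions if m[2] != "m"]
    return [(n, f) for n in names for f in fields
            if abs(n[0] - f[0]) <= window]

mentions = [
    (10, "coumadin", "m"),
    (10, "5 mg", "do"),
    (11, "qhs", "f"),
    (14, "prn", "f"),  # four lines away: not a candidate for coumadin
]
print(len(candidate_pairs(mentions)))  # 2
```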

4. A support vector machines (SVM) classifier for distinguishing lists from narrative text

One of the i2b2 competition requirements was to determine whether the text describing a medication is in a list or a narrative format. Using the 147 annotated discharge summaries as training data, we built an SVM classifier (Weka Toolkit[22]) to determine the format of each candidate sentence. We used bag-of-words, bi-gram, tri-gram and subsection features. The hypothesis behind the subsection feature is that medication events in the medication subsection (recognized in Section IV.A.1) are more likely to be in the list format.
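A minimal sketch of the lexical feature extraction (bag of words plus bi-/tri-grams and a subsection flag) is shown below; the feature naming is our own convention, and the actual classifier was trained in Weka:

```python
# Sketch of the word n-gram features used for the list-vs-narrative
# classifier: unigrams (bag of words), bigrams, trigrams, plus a flag for
# sentences inside a recognized medication subsection.

def ngram_features(sentence, in_medication_subsection=False):
    tokens = sentence.lower().split()
    feats = set(tokens)  # bag of words
    for n in (2, 3):     # bigrams and trigrams
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    if in_medication_subsection:
        feats.add("<in_med_subsection>")  # the subsection feature
    return feats

feats = ngram_features("lasix 40 mg po qd", in_medication_subsection=True)
print("40 mg" in feats, "40 mg po" in feats)  # True True
```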

5. The integration

We integrated all three models into the Lancet system. Lancet first detects medication names and fields with the CRF model, then applies the AdaBoost model to determine whether a medication field belongs to a medication name. Finally, the SVM classifier separates lists from narratives.
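The three-stage flow can be summarized in a short sketch, with toy stand-ins for the trained CRF, AdaBoost and SVM models (the function names and data shapes are our own, for illustration only):

```python
# High-level sketch of the three-stage Lancet pipeline. `tag`, `link` and
# `classify` stand in for the trained CRF, AdaBoost and SVM models.

def extract_medication_events(summary, tag, link, classify):
    mentions = tag(summary)                                # 1) CRF: names + fields
    events = [{"name": n, "fields": fs}
              for n, fs in link(mentions)]                 # 2) AdaBoost: association
    for e in events:
        e["format"] = classify(summary, e)                 # 3) SVM: list vs narrative
    return events

# Toy stand-ins, just to show the data flow:
events = extract_medication_events(
    "lasix 40 mg po qd",
    tag=lambda s: [("lasix", "m"), ("40 mg", "do")],
    link=lambda ms: [("lasix", ["40 mg"])],
    classify=lambda s, e: "list",
)
print(events)  # [{'name': 'lasix', 'fields': ['40 mg'], 'format': 'list'}]
```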

B. jMerki: a rule-based baseline system

jMerki is a rule-based system implemented in Java. It integrates the rules of the MERKI system,[1] including rules for dosage, frequency, time and PRN. We added further rules for the i2b2 medication detection task, including regular expressions to detect subheadings in discharge summaries. The system performs dictionary lookup and regular expression matching to identify related fields. We built a medication name dictionary from two external knowledge resources, RxNorm and DrugBank.[1,23] This baseline system cannot recognize list or narrative form, so the Lancet SVM classifier was employed for the performance evaluation.
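The dictionary-lookup step can be sketched as a greedy longest-match scan over the tokens (a toy dictionary is used here; jMerki's dictionary was built from RxNorm and DrugBank):

```python
# Sketch of dictionary lookup for medication names: scan lower-cased tokens,
# trying multi-word names before single-word names so that e.g.
# "potassium chloride" wins over a bare "potassium" entry.

MED_DICT = {"potassium chloride", "kcl", "vancomycin", "vanco", "lasix"}

def find_medications(text):
    tokens = text.lower().split()
    hits, i = [], 0
    while i < len(tokens):
        for n in (2, 1):  # try two-word candidates first
            cand = " ".join(tokens[i:i + n])
            if cand in MED_DICT:
                hits.append(cand)
                i += n
                break
        else:
            i += 1
    return hits

print(find_medications("started on potassium chloride and vanco"))
# ['potassium chloride', 'vanco']
```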


C. The hybrid system

As a post hoc experiment, we built a hybrid system to increase both recall and precision. Specifically, we aligned and matched the jMerki and Lancet systems' outputs. If both jMerki and Lancet detected the same medication name but differed in other content (e.g., dosage), Lancet's output was chosen because it has a higher precision than jMerki. If jMerki and Lancet did not agree on a medication name, then the hybrid system kept both medication entries detected by the two systems; this step increases recall.
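This merge rule reduces to a dictionary union in which Lancet's entries take precedence on shared medication names (a sketch with toy entries; the real alignment operates on the i2b2 entry format):

```python
# Sketch of the hybrid merge rule: prefer Lancet's entry when both systems
# found the same medication name (Lancet has higher precision); keep
# unmatched entries from either system (raising recall).

def merge_outputs(lancet, jmerki):
    merged = dict(jmerki)   # start with jMerki's entries...
    merged.update(lancet)   # ...Lancet wins on shared medication names
    return merged

lancet = {"coumadin": {"do": "5 mg"}}
jmerki = {"coumadin": {"do": "10 mg"}, "lasix": {"do": "40 mg"}}
print(merge_outputs(lancet, jmerki))
# {'coumadin': {'do': '5 mg'}, 'lasix': {'do': '40 mg'}}
```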

V. EVALUATION

A. Metrics

The i2b2 organizers used two sets of evaluation metrics: strict evaluation (exact match) and relaxed evaluation (inexact match), adapted from the evaluations of the question answering track in TREC.[24] For each medication entry, exact match calculates the precision and recall of the instance, whereas inexact match calculates the proportion of system-returned tokens that overlap with the ground truth. Given the aligned system output, two types of evaluation measures were performed: the horizontal level (focusing on medication events) and the vertical level (focusing on medication names and fields).

The organizers also performed evaluation at two different levels of granularity: (a) the patient record level, which is the micro-average over all entries in a single record followed by the macro-average over all records in the system output; and (b) the system level, which is the micro-average over all entries in the system output.

The primary evaluation metric of this competition is the system-level horizontal evaluation. Precision and recall at the horizontal level are calculated for system entries against ground truth entries; for details, please refer to [25].



Similar to the TREC evaluation, the F1-score was reported, which is the harmonic mean of instance precision (IP) and instance recall (IR): F1 = 2(IP*IR)/(IP+IR). IP is the total number of correctly identified instances out of the total number of identified instances. IR is the total number of correctly identified instances out of the total number in the ground truth list.
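These definitions can be checked numerically: the helper below reproduces Lancet's reported horizontal system-level F1 of 76.4% from its precision of 90.4% and recall of 66.1% (both figures reported later in this paper):

```python
def f1(ip, ir):
    """Harmonic mean of instance precision (IP) and instance recall (IR)."""
    return 2 * ip * ir / (ip + ir)

# Lancet's reported precision and recall (horizontal system level, exact match)
print(round(100 * f1(0.904, 0.661), 1))  # 76.4
```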


B. Gold Standard

The gold standard used for the i2b2 evaluation was built as a community effort.[25] The whole dataset incorporated 8,942 instances of medication entries (3,936 narrative and 5,006 list), 4,460 instances of dosage, 3,387 instances of mode, 4,039 instances of frequency, 553 instances of duration and 1,637 instances of reason. We found that the gold standard medication names belonged to 295 categories, representing 50.4% of the total drug categories in DrugBank. These results suggest that the coverage of drugs in the i2b2 challenge task was reasonably broad.


VI. RESULTS

A. Evaluation of Lancet in the i2b2 challenge 2009

Although we report the results of three systems in this study, Lancet was the only system of the three that competed in the i2b2 challenge. Among the top ten systems, Lancet achieved the highest precision in the system-level horizontal evaluation: 90.4% in exact matching and 94.0% in inexact matching (Figures 3A and 3B). The corresponding F1 values were 76.4% and 76.5%; by F1 value, Lancet ranked 10th in exact matching and 9th in inexact matching. For lists, the Lancet system achieved the highest precision, 93.1%, with an F1 of 66.0%, on exact match at the system-level horizontal evaluation (Figure 3C). For narratives, Lancet achieved a precision of 36.6% with an F1 of 38.4%. Lancet ranked 10th for both narratives and lists.[25]

[Figure 3 about here]


B. Comparison of the three systems

We described earlier the three systems we developed: the Lancet system, the rule-based jMerki, and the hybrid system. Table 3 shows the results of all three systems. On the horizontal-level evaluation with exact matching, Lancet outperformed jMerki by 12.0% (system) and 10.4% (patient). The hybrid system further improved the performance by 3.4% (system) and 4.6% (patient), yielding the highest F1 scores of 79% (system) and 77.6% (patient). For recall, both Lancet (66.1%) and jMerki (58.7%) are relatively low; the recall of the hybrid system, on the other hand, increases to 74%.

Similarly, on the vertical-level evaluation, Lancet outperformed jMerki, and the hybrid system outperformed both. The hybrid system achieved good performance (F1 80%-85%) in the dosage, medication, mode and frequency fields, but poor performance (F1 2.4%-21.2%) in the duration and reason fields. In addition, the results show that system-level performance was consistently better than patient-level performance for both horizontal and vertical evaluation.

[Table 3 about here]

C. Error analysis

We first examined annotation inconsistency and then manually analyzed the system output. We found that errors were attributable to data sparseness, multiple medication entries, grammatical errors in clinical notes, and negated events.

1. The challenges in annotation

As described earlier, we annotated the 17 "ground truth" summaries and measured the agreement between our annotation and the annotation by the i2b2 organizers. With the exact match evaluation metrics, the agreement between our annotation and the "ground truth" was an 81.5% F1 score (system level, horizontal). On the vertical level, our annotations showed high agreement for medication (88% F1 score), dosage (85% F1 score), frequency (86% F1 score) and mode (89% F1 score), but low agreement for duration (36% F1 score) and reason (33% F1 score).

We manually examined inconsistent annotations and found instances of ambiguity that gave rise to annotation inconsistency. These included:




(1) Boundary ambiguity. Example:

"Ofloxacin 2000 mg p.o. b.i.d. (both antibiotics to continue for an additional two week course )."

In this example, we annotated "two week course" as duration instead of "for an additional two week course" as in the gold standard; both are semantically correct.

(2) Semantic ambiguity. Example:

"NITROGLYCERIN 1/150 (0.4 MG) 1 TAB SL Q5MIN X 3 doses PRN Chest Pain HOLD IF: SBP less than 100."

In this example, "X 3 doses" performs two functions. In the i2b2 ground truth annotations, it was annotated as "dosage" because it states the number of doses; however, we annotated it as "duration" because, at the same time that it states the number of prescribed doses, it also tells the duration over which this particular medication should be taken.

Similarly, in "She was found to have two 95% stenosis in a long segment of the left SFA and the left distal SFA and anterior tibial vein graft was completely thrombosed. She was successfully treated with stent placement and received heparin and urokinase in the Intensive Care Unit overnight with a turn-over pulses of the left leg Doppler." "anterior tibial vein graft" was annotated as the reason for "heparin and urokinase" in the gold standard, while we considered "stent placement" as the reason.


In addition, we found the ground truth to be imperfect. We correctly annotated "overnight" in the above example as the duration of "heparin and urokinase," while it was missed in the ground truth. In another example, below, we can see that the "+/- 2 lines" rule led to improper annotation in the ground truth:

CC: Hypotension after dialysis
HPI: 56 yo male with h/o ESRD , CAD , CHF ( EF 20-25% ) admitted for hypotension after HD. He was in his USOH until 2 days PTA when he developed stomach upset , diarrhea , dry heaves , and a dry cough. He denied recent travels , and had remote Abx use. At Stodun Hospital , he had 5.5 liters removed and afterwards his BP was 66/30. 1 liter of NS was given and his BP rose to 73/40.




Here, obviously, the reason for "NS" (normal saline) is "Hypotension after dialysis," but due to the "+/- 2 lines" limitation, "his BP" was annotated as the reason, which, strictly speaking, is not correct and is very confusing.

All data were annotated by two of the authors: ZFL is a domain expert and LA is a linguist. During the i2b2 competition, each discharge summary was annotated by one person only. A post hoc annotation of 10 discharge summaries (by ZFL) showed 0.85-0.95 inter-annotator agreement on medication name, dosage, mode and frequency. The agreement on duration and reason was lower, with duration 0.71-0.89 and reason 0.24-0.42. When limited to narrative entries only, the agreement on all fields was 0.12-0.67.


2. Data sparseness

One advantage of supervised machine-learning systems is that they can predict the correct label even when the test data do not appear in the training data. Such robustness is due to the systems' ability to capture contextual information. As described earlier, we annotated a total of 147 records to be used as the training data. This collection of annotated data is in no way complete. Nevertheless, Lancet detected the medication "persantine" in the text "PERSANTINE ( DIPYRIDAMOLE ) 50 MG PO BID" even though "persantine" did not appear in the training data; the reason is that Lancet learned the contextual pattern "<m> <do> <mo> <f>" from the training data. On the other hand, Lancet failed to detect "Persantine" in "Persantine and viability cardiac PET scan 5/19/04" because no such contextual pattern appeared in the training data. As a result, data sparseness hurts the recall of Lancet even though it is a supervised machine-learning system.

We also found that the jMerki lexicon missed 17% of medication names. The missing medication names included general drug names, drug name abbreviations (vanco for vancomycin; kcl for potassium chloride), drug category names (beta-blocker, beta blocker, home medications or hypoglycemics) and drug name combinations (calcium+vim d). These results suggest that at best jMerki could perform with 83% recall. The supervised ML system Lancet, in contrast, could recover some of the medications that would otherwise be missed by the jMerki system.



3. Multiple medication entries

As described earlier, the Lancet system assigned a unique instance from each field to its corresponding medication name. As a result, Lancet always missed multiple medication entries. An example is shown below:

"NPH HUMULIN INSULIN ( INSULIN NPH HUMAN ) 2 UNITS QAM; 3 UNITS QPM SC 2 UNITS QAM 3 UNITS QPM"

In the above example, Lancet correctly detected one entry, "2 UNITS QAM", and associated it with the medication "NPH HUMULIN INSULIN." On the other hand, the system missed three entries: "3 UNITS QPM SC", "2 UNITS QAM" and "3 UNITS QPM." Lancet therefore lost recall. To estimate how much recall Lancet could lose, we examined our gold standard data and found a total of 449 such multiple medication entries out of the total 8,942 medication entries, a ~5% decrease in recall.

4. Medication name misspelling

Clinical texts are typically noisy, with significant grammatical errors. We found that such errors hurt Lancet's performance. For example, the Lancet system failed to detect "Flagy" in "cholangitis ) Ampicillin and Flagy started 0/16 for ?early cholangitis. 3. CV: h/o htn , hyperlipidemia , CE set A B neg ," because "Flagy" was incorrectly spelled. When we manually corrected the spelling to "Flagyl," we found that Lancet was able to correctly extract the medication event.

Misspelled medication names have also led Lancet to fail to detect other, correctly spelled medications. For example, Lancet failed to detect both "levofloxacin" and "flagy" in "on daily levofloxacin and flagy , will complete a 14 day course." After we corrected the misspelling of "flagy", the Lancet system was able to detect both medication events.

5. Negation

Negation occurs in clinical notes. We found that negative medication events generally fall into one of two categories: medications listed as patient allergies, and medications mentioned in the text but not actually taken by the patient. Two examples are shown below:

"DEFINITE ALLERGY ( OR SENSITIVITY ) to ACE INHIBITORS," and

"The patient was placed on heparin instead of Coumadin for Chronicle device lead thrombus with a PPT goal of 60-80."

Currently, the Lancet system does not incorporate negation and scope detection, and as a result, it incorrectly extracted "ACE inhibitors" and "Coumadin," respectively.


D. Follow-up experiments

Based on the results of our error analyses, we performed further post-hoc experiments, exploring negation detection, external medication name dictionaries and other features to improve medication event extraction with our Lancet system. The results are shown in Table 4. The following features and models were explored: "NegPlus" added negative-medication features that we manually annotated for training; "Digital normalization" replaced all digits in the text with placeholders; "Affix" used the nomenclature rules recommended by the World Health Organization (WHO); "Dictionaries" combined five dictionaries in the model training, namely the WHO nomenclature rules, the CORE Problem List Subset of SNOMED CT® released by the National Library of Medicine, RxNorm, DrugBank and a modified common English word dictionary, Linux Word.[26] Finally, "Single-line" expanded the sequence scope from one line to the whole article.

[Table 4 about here]
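The digit-normalization feature amounts to a one-line substitution (the placeholder character below is our own choice; the paper does not specify it):

```python
import re

def normalize_digits(text, placeholder="#"):
    """Replace every digit with a placeholder so that e.g. '40 mg' and
    '80 mg' map to the same pattern, reducing feature sparseness."""
    return re.sub(r"\d", placeholder, text)

print(normalize_digits("coumadin 5 mg po qhs x 10 days"))
# coumadin # mg po qhs x ## days
```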


We noticed that adding negative medication information and affix information increased the precision of the system; in particular, affix features yielded a precision of 91.2%, compared to Lancet's 90.4%. But neither achieved any overall gain, owing to degraded recall. Applying digital normalization slightly increased the recall to 66.5%, but degraded precision limited the overall gain in the F1 score. We found that combining more dictionary resources led to a marginal improvement in the performance of the Lancet system in both recall and precision, increasing the F1 score by 2.75%, from 76.4% to 78.5%. Similarly, changing the multi-line sequence of one article into one single line also increased both recall and precision, yielding an F1 score of 77.8% compared to the Lancet system's 76.4%, with the best precision of 92.5% compared to 90.4%.



VII. DISCUSSION

The agreement between our annotation and the annotation by the i2b2 organizers showed an F1 score of 81.5%, which is much lower than the annotation agreement reported by the i2b2 organizers (an F1 score of 89.7% from comparing two organizers' annotations against the ground truth). We had annotated the 147 patient records throughout the 10 weeks during which the annotation guideline was iteratively updated. We therefore speculate that the inconsistency was at least partially introduced through the guideline refinement process. Since the Lancet system was trained on these 147 records, the annotation inconsistency contributed to errors in the system's performance. In addition, the study on community annotation agreement by the i2b2 organizers shows a mean system-level F-measure of 82.4% (exact) on 251 records, which suggests that "ground truth" annotation on this task is itself very challenging.

The evaluation results (Table 3) consistently show that all our systems performed better at the system level than at the patient level. The results also indicate that Lancet performed relatively better on discharge summaries that incorporate more medication events. Figure 4B shows that most discharge summaries (67%) contain 10-50 medication events and that the number of discharge summaries decreases as the number of medication events increases. In addition, Figure 4C shows that the system achieved its highest average F1 score on discharge summaries containing 90-100 medication events. As shown in Figures 4B and 4C, the more medication events a discharge summary contains, the better Lancet performed. We speculate that neighboring medication names in discharge summaries are useful features for Lancet.


[Figure 4 about here]


The results for list and narrative entries (Section VI-A) showed that the Lancet system performed consistently better on lists than on narratives, a result that it shares with all the participating systems. The results are not surprising because, in the list format, a medication and its related fields are highly structured. In contrast, narratives incorporate complex syntactic and semantic structures that pose a challenge for detecting medication events.

One thing we want to point out is that the lower performance on narrative entries does not suggest that this system is restricted to dealing only with structured text. The concepts of list and narrative were not clearly defined by this i2b2 challenge, and we observed that many medication entries annotated as "list" in the gold standard incorporated text belonging to "narrative" in a broad sense. For example, "COUMADIN with target inr of 2.0 , last target 1.6 , then received 10 MG in evening x 2." is annotated as "list," despite being clearly more difficult than other structured cases.

Table 3 shows that the Lancet system significantly outperformed the rule-based jMerki system, increasing precision from 81.3% to 90.4% and recall from 58.7% to 66.1% (horizontal system level with exact match). This suggests that, on this task, machine-learning-based methods hold an advantage over the rule-based jMerki system in capturing patterns automatically and accurately. In addition, the Lancet system, built on this learning framework, does not rely on expensive manually curated rules.
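As a quick sanity check on these figures, F1 is simply the harmonic mean of precision and recall, so the reported F1 scores follow directly from the precision/recall pairs above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Horizontal system-level results, exact match:
lancet_f1 = f1_score(0.904, 0.661)  # ~0.764, matching the reported 76.4%
jmerki_f1 = f1_score(0.813, 0.587)  # ~0.682, matching the reported 68.2%
print(round(lancet_f1, 3), round(jmerki_f1, 3))  # 0.764 0.682
```

Note that the harmonic mean is dominated by the lower of the two values, which is why Lancet's high precision alone cannot compensate for its lower recall.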


Our error analysis provides evidence of the challenges posed by data sparseness, multiple medication entries, misspellings, and negation, which partially explain the relatively low recall of our system.

Data sparseness is a common problem for any supervised machine-learning system. Although supervised machine-learning systems can be robust, as they learn from multiple features, our results clearly demonstrate that data sparseness contributed to errors in our system's performance. One of our post-hoc experiments shows that performance increased when we incorporated dictionaries as additional features (from 76.4% to 78.5%, as shown in Table 4, p < 0.005). Data sparseness can additionally explain why the hybrid system had improved performance (from 76.4% to 79%, as shown in Table 3, p < 0.005). As more unlabeled data become available, semi-supervised Conditional Random Fields learning could be used to further improve performance.[27]
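For illustration, a dictionary feature of this kind can be sketched as a token-level feature function of the sort fed to a CRF tagger; the lexicon and feature names below are hypothetical stand-ins, not Lancet's actual feature set:

```python
# Illustrative sketch of a dictionary-membership feature for a token-level
# CRF tagger. The tiny lexicon stands in for resources such as DrugBank.
MED_LEXICON = {"coumadin", "lisinopril", "aspirin", "metformin"}

def token_features(tokens, i):
    """Return a feature dict for token i, including a dictionary feature."""
    tok = tokens[i]
    return {
        "word.lower": tok.lower(),
        "in_med_dict": tok.lower() in MED_LEXICON,  # dictionary feature
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "is_digit": tok.isdigit(),
    }

feats = token_features(["continue", "Coumadin", "5", "mg"], 1)
print(feats["in_med_dict"])  # True
```

A CRF can then weight the dictionary feature alongside contextual features, which is one way high-quality lexicons can raise both precision and recall when training data are sparse.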


Multiple medication entries, which we showed earlier account for 5% of medication events, were an additional source of errors. Handling them would require removing the heuristic rule currently used in the Lancet system and allowing multiple instances for each medication field of an event. However, this would also introduce new noise in the form of false positives, and some filtering strategy would need to be employed for our system to benefit from the change. In addition, we speculate that linguistic and rule-based approaches could be explored to improve the detection of multiple medication events, but these must remain for future research.

Our error analysis also concluded that some errors were caused by the misspelling of medication names. In future work, we will explore an automatic misspelling detection and correction tool, for example, the Aspell system (http://aspell.net/). We have shown in several cases that Lancet's output can be corrected after the misspelling is removed.
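As an illustration of the idea (not the Aspell algorithm itself), an unknown token can be matched against a medication lexicon by string similarity using only the Python standard library; the lexicon and similarity cutoff below are toy assumptions:

```python
import difflib

# Toy medication lexicon; a real system would draw on a resource such as DrugBank.
MED_LEXICON = ["coumadin", "lisinopril", "aspirin", "metformin"]

def correct_med_spelling(token, cutoff=0.8):
    """Return the closest lexicon entry, or the token itself if none is close."""
    matches = difflib.get_close_matches(token.lower(), MED_LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else token

print(correct_med_spelling("Coumadn"))   # -> "coumadin"
print(correct_med_spelling("warfarin"))  # no close match -> "warfarin"
```

The cutoff controls the precision/recall trade-off of the correction itself: a lower cutoff recovers more misspellings but risks rewriting valid out-of-lexicon tokens.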

Our error analysis also showed that negation contributed errors. However, in our post-hoc experiments, we found that when we manually labeled negative medication events, precision improved but the F1 score degraded. We speculate that this paradox can be explained by the fact that many negated events lack useful contexts for learning and that negated medications vary from patient to patient, which confuses the learning model, particularly on a small set of training data. In addition, the negation detection system we built has not yet been evaluated, and in future work we will explore state-of-the-art negation systems, including NegEx,[28] to improve negation detection.
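For illustration, a much-simplified NegEx-style trigger check might look as follows; the trigger list and five-token window are toy assumptions, and the full NegEx algorithm of Chapman et al. uses a far richer trigger lexicon with pre- and post-trigger scope rules:

```python
import re

# Toy negation triggers and a fixed 5-token scope window (simplified sketch).
NEG_TRIGGERS = {"no", "not", "denies", "without", "discontinued", "held"}
WINDOW = 5

def is_negated(sentence, medication):
    """True if a negation trigger precedes the medication within the window."""
    tokens = re.findall(r"\w+", sentence.lower())
    if medication.lower() not in tokens:
        return False
    i = tokens.index(medication.lower())
    return any(t in NEG_TRIGGERS for t in tokens[max(0, i - WINDOW):i])

print(is_negated("The patient denies taking aspirin at home.", "aspirin"))  # True
print(is_negated("Continue aspirin 81 mg daily.", "aspirin"))               # False
```

Even this crude window-based scope hints at why negated events are hard for a sequence learner: the decisive evidence often lies several tokens away from the medication name itself.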

As shown in Table 3, the error sources discussed above had a more severe influence on the performance of the duration and reason fields. Our annotation of those two fields achieved the lowest agreement (F1 scores of 36% and 33%, as discussed in the error analysis section). Even the annotation by the i2b2 organizers had much lower agreement on duration (F1 score 61%) and reason (F1 score 68%) than on other fields. We thus speculate that the poor performance on these two categories (duration and reason) is partially due to the lower agreement that can be obtained in annotation. Another reason might be the smaller number of annotated instances for these fields than for the others (only 223 and 209 instances vs. over 2000 instances for other fields). Because the Lancet system was built on a supervised machine-learning method, its performance is sensitive to, and dependent on, the consistency and coverage of the annotated training data. Furthermore, we observed that, compared to other fields, "reason" is more flexible within a medication event: it may contain multiple instances, involve different writing styles, and occur anywhere in the proximity of the medication name, which also makes automatic recognition more challenging.

Despite the different sources of errors mentioned earlier, the Lancet system performed with the highest precision among the top ten teams. We found that most of the top ten systems[29-37] incorporated extensively manually curated patterns and external dictionaries. In contrast, the Lancet system was trained only with the annotated dataset and applied few manually curated rules and no external knowledge resources. We therefore speculate that noise introduced by external resources or rules may hurt precision. On the other hand, we found in our experiments that a high-quality external dictionary increased both recall and precision. More investigation will be possible as the different approaches and systems of other teams are made available in the future.

Our post-hoc experiments showed that affix features based on WHO nomenclature rules did not help the F1 score, although they increased precision. We speculate that while medication-name-related affixes can provide useful evidence for better precision, they may also introduce noise in cases where those affixes are shared by common words. In contrast, digit normalization slightly improved system performance (from 76.4% to 76.6%, as shown in Table 4, p < 0.005), which indicates that normalizing digits can reduce data sparseness to some extent, but that sparseness due to digits is not dominant. In another experiment, instead of treating each single line as a sequence during training, we converted the whole record into a single line, which also brought some performance gain (from 76.4% to 77.8%, as shown in Table 4, p = 0.11). This can be explained by the fact that record-level sequences can capture more useful dependency information for learning.
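The two feature transformations discussed above can be sketched briefly; the placeholder character and the short stem list are illustrative assumptions (the WHO stem list is much larger):

```python
import re

def normalize_digits(text):
    """Map every digit to '#', so that e.g. '10 mg' and '20 mg' share one pattern."""
    return re.sub(r"\d", "#", text)

# A few illustrative WHO-style medication-name stems (the real list is larger).
WHO_SUFFIXES = ("pril", "olol", "statin", "cillin", "azole")

def has_who_affix(token):
    """True if the token ends with one of the listed medication-name stems."""
    return token.lower().endswith(WHO_SUFFIXES)

print(normalize_digits("Lisinopril 10 mg po qd x 2 weeks"))  # "Lisinopril ## mg po qd x # weeks"
print(has_who_affix("lisinopril"))  # True
print(has_who_affix("protocol"))    # False
```

The affix noise mentioned above arises when common English words happen to end with a listed stem, which is one reason such features helped precision but not the F1 score.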

VIII. CONCLUSION AND FURTHER DIRECTIONS

We have presented three systems for medication event extraction from patient discharge summaries: the supervised machine-learning system Lancet, the rule-based system jMerki, and the hybrid system. We applied Lancet to the i2b2 medication event extraction challenge, and the evaluation results showed that it performed with the highest precision (90.4% in exact match and 94.0% in inexact match) among the top ten teams.

Our post-hoc experiments show that Lancet and jMerki have different strengths and that the hybrid system has the best performance, yielding a 79% F1 score (85% precision and 74% recall). Our error analysis showed that errors were introduced in part by inconsistency in annotation and by data sparseness, and we therefore speculate that a large amount of high-quality annotated data may further improve the Lancet system's performance. Another line of future work is to explore semi-supervised conditional random fields learning, with the hope of making full use of a large amount of unlabeled data to further boost the system's performance.

Our current Lancet system incorporates minimal parsing and few external knowledge resources, yet it achieved the best precision among the top ten teams. The automatic learning framework also provides great potential for generalization, given an appropriate amount of training data. We speculate that deeper syntactic and semantic parsing may further improve performance.


ACKNOWLEDGEMENTS

We acknowledge the following grant support: 5R01LM009836, 5R21RR024933, and 5U54DA021519. We also thank Qing Zhang and Shashank Agarwal for valuable discussion.


REFERENCES

1. Gold S, Elhadad N, Zhu X, Cimino JJ, Hripcsak G: Extracting structured medication event information from discharge summaries. AMIA Annu Symp Proc 2008:237-41.

2. Diaz E, Levine HB, Sullivan MC, Sernyak MJ, Hawkins KA, Cramer JA, Woods SW: Use of the Medication Event Monitoring System to estimate medication compliance in patients with schizophrenia. J Psychiatry Neurosci 2001, 26:325-329.

3. de Klerk E, van der Heijde D, Landewé R, van der Tempel H, van der Linden S: The compliance-questionnaire-rheumatology compared with electronic medication event monitoring: a validation study. The Journal of Rheumatology 2003, 30:2469-2475.

4. Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC: MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc 2010, 17:19-24.

5. Mullins IM, Siadaty MS, Lyman J, Scully K, Garrett CT, Miller WG, Muller R, Robson B, Apte C, Weiss S, Rigoutsos I, Platt D, Cohen S, Knaus WA: Data mining and clinical data repositories: Insights from a 667,000 patient data set. Comput Biol Med 2006, 36:1351-1377. doi:10.1016/j.compbiomed.2005.08.003.

6. Kuperman GJ, Bobb A, Payne TH, Avery AJ, Gandhi TK, Burns G, Classen DC, Bates DW: Medication-related clinical decision support in computerized provider order entry systems: a review. J Am Med Inform Assoc 2007, 14:29-40. doi:10.1197/jamia.M2170.

7. Bates DW, Cohen M, Leape LL, Overhage JM, Shabot MM, Sheridan T: Reducing the frequency of errors in medicine using information technology. J Am Med Inform Assoc 2001, 8:299-308.

8. Anderson JG, Jay SJ, Anderson M, Hunt TJ: Evaluating the Impact of Information Technology on Medication Errors: A Simulation. J Am Med Inform Assoc 2003, 10:292-293. doi:10.1197/jamia.M1297.

9. Jha AK, Kuperman GJ, Teich JM, Leape L, Shea B, Rittenberg E, Burdick E, Seger DL, Vander Vliet M, Bates DW: Identifying adverse drug events: development of a computer-based monitor and comparison with chart review and stimulated voluntary report. J Am Med Inform Assoc 1998, 5:305-314.

10. Pronovost P, Weast B, Schwarz M, Wyskiel RM, Prow D, Milanovich SN, Berenholtz S, Dorman T, Lipsett P: Medication reconciliation: a practical tool to reduce the risk of medication errors. J Crit Care 2003, 18:201-5.

11. Sager N, Lyman M, Nhan NT, Tick LJ: Medical language processing: applications to patient data representation and automatic encoding. Methods of Information in Medicine 1995, 34:140.



12. Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB: A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1994, 1:161-174.

13. Evans DA, Brownlow ND, Hersh WR, Campbell EM: Automating concept identification in the electronic medical record: an experiment in extracting dosage information. Proc AMIA Annu Fall Symp 1996:388-92.

14. Bramsen P, Deshpande P, Lee YK, Barzilay R: Finding temporal order in discharge summaries. AMIA Annu Symp Proc 2006:81-85.

15. Uzuner O, Goldstein I, Luo Y, Kohane I: Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc 2008, 15:14-24. doi:10.1197/jamia.M2408.

16. Cimino JJ, Bright TJ, Li J: Medication reconciliation using natural language processing and controlled terminologies. Stud Health Technol Inform 2007, 129:679-83.

17. Jagannathan V, Mullett CJ, Arbogast JG, Halbritter KA, Yellapragada D, Regulapati S, Bandaru P: Assessment of commercial NLP engines for medication information extraction from dictated clinical notes. Int J Med Inform 2009, 78:284-91.

18. Uzuner Ö, Luo Y, Szolovits P: Evaluating the State-of-the-Art in Automatic De-identification. Journal of the American Medical Informatics Association 2007, 14:550-563.

19. i2b2 NLP Research Data Sets [https://www.i2b2.org/NLP/DataSets/Main.php. Accessed 06/20/2010]

20. Gillick D: Sentence Boundary Detection and the Problem with the US. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers 2009:241-244.

21. Settles B: ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics 2005, 21:3191-2.

22. Frank E, Hall M, Trigg L, Holmes G, Witten IH: Data mining in bioinformatics using Weka. Bioinformatics 2004, 20:2479-2481. doi:10.1093/bioinformatics/bth261.

23. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 2008, 36:D901-6.

24. Hersh WR, Bhupatiraju RT, Ross L, Roberts P, Cohen AM, Kraemer DF: Enhancing access to the Bibliome: the TREC 2004 Genomics Track. J Biomed Discov Collab 2006, 1:3. doi:10.1186/1747-5333-1-3.


25. Uzuner Ö, Solti I, Cadag E: Extracting Medication Information from Clinical Text. Journal of the American Medical Informatics Association, in current issue.

26. Linux.words [http://www.ibiblio.org/pub/linux/libs/linux.words.2.lsm. Accessed 02/12/2010]

27. Dietterich TG, Hao G, Ashenfelter A: Gradient tree boosting for training conditional random fields. Journal of Machine Learning Research 2008, 9:2113-2139.

28. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG: A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 2001, 34:301-310. doi:10.1006/jbin.2001.1029.

29. Doan S, Bastarache L, Klimkowski S, Denny JC, Xu H: Vanderbilt's System for Medication Extraction. In 2009.

30. Grouin C, Deleger L, Zweigenbaum P: A Simple Rule-based Medication Extraction System. In Third i2b2 Shared-Task Workshop Proceedings 2009.

31. Hamon T, Grabar N: Concurrent linguistic annotations for identifying medication names and the related information in discharge summaries. In Third i2b2 Shared-Task Workshop Proceedings 2009.

32. Meystre SM, Thibault J, Shen S, Hurdle JF, South BR: Description of the Textractor System for Medications and Reason for their Prescription Extraction from Clinical Narrative Text Documents. In Third i2b2 Shared-Task Workshop Proceedings 2009.

33. Patrick J, Li M: A Cascade Approach to Extract Medication Event (i2b2 challenge 2009). In Third i2b2 Shared-Task Workshop Proceedings 2009.

34. Shooshan SE, Aronson AR, Mork JG, Bodenreider O, Demner-Fushman D, Dogan RI, Lang F, Lu Z, Neveol A, Peters L: NLM's I2b2 Tool System Description. In Third i2b2 Shared-Task Workshop Proceedings 2009.

35. Solt I, Tikk D: Yet another rule-based approach for extracting medication information from discharge summaries. In Third i2b2 Shared-Task Workshop Proceedings 2009.

36. Spasic I, Sarafraz F, Keane JA, Nenadic G: Medication Information Extraction with Linguistic Pattern Matching and Semantic Rules. In Third i2b2 Shared-Task Workshop Proceedings 2009.

37. Yang H: A Linguistic Approach for Medication Extraction from Medical Discharge Summaries. In Third i2b2 Shared-Task Workshop Proceedings 2009.



Table 1 Definitions of medication name and associated fields

Medication: Substances for which the patient is the experiencer, excluding food, water, diet, tobacco, alcohol, illicit drugs, and allergic-reaction-related drugs.

Dosage: The amount of a single medication used in each administration.

Mode/route: Expressions describing the method for administering the medication.

Frequency: Terms, phrases, or abbreviations that describe how often each dose of the medication should be taken.

Duration: Expressions that indicate for how long the medication is to be administered.

Reason: The medical reason for which the medication is stated to be given.


Table 2 Features for the medication relationship model

Same sentence: Whether the medication and field are both in the same sentence, as determined by Splitta.

Same subsection: Whether both elements in a medication-field pair are located in the same subsection of the discharge summary.

Numeral: Whether the value of the medication field contains numerals.

Distance: The number of tokens between a medication name and medication field.

Position: Whether the medication field appears before or after the medication name.

Field type: The type of field, such as duration, reason, etc.

Medication between: The number of other medication names between the pair.


Table 3 Comparison results for the three systems (F1 score, exact match). Significant outperformance is indicated by * (p < 0.05, Wilcoxon rank sum test).

Level        Granularity   Tags               jMerki   Lancet*   Hybrid*
Horizontal   System        Medication event   68.2%    76.4%     79.0%
Horizontal   Patient       Medication event   67.2%    74.2%     77.6%
Vertical     System        Medication name    77.2%    80.2%     83.4%
Vertical     Patient       Medication name    76.6%    79.1%     82.9%
Vertical     System        Dosage             67.9%    80.2%     81.8%
Vertical     Patient       Dosage             66.0%    78.3%     80.6%
Vertical     System        Mode               70.8%    82.1%     85.0%
Vertical     Patient       Mode               68.2%    74.0%     81.9%
Vertical     System        Frequency          66.3%    81.3%     82.4%
Vertical     Patient       Frequency          63.0%    78.8%     80.0%
Vertical     System        Duration           8.9%     18.0%     21.2%
Vertical     Patient       Duration           5.6%     14.0%     16.5%
Vertical     System        Reason             0        n/a†      2.9%
Vertical     Patient       Reason             0        0.4%†     2.4%

†, caused by a programming bug.


Table 4 Post-hoc experimental results (horizontal system level, exact match). Significant outperformance compared with Lancet is indicated by * (p < 0.05, Wilcoxon rank sum test).

System                 Precision   Recall   F1
Lancet                 90.4%       66.1%    76.4%
NegPlus*               90.5%       61.8%    73.4%
Digit normalization    90.2%       66.5%    76.6%
Affix                  91.2%       62.6%    74.2%
Dictionaries           91.0%       69.1%    78.5%
Single-line            92.5%       67.1%    77.8%


FIGURE CAPTIONS

Figure 1 Illustration of medication events in both a narrative and a list. As shown here, each event includes a medication name and any of its related medication fields. Medication-field associations are indicated by a dotted line with an arrow. Different font styles indicate different fields: bold plus underline for medication name; italic for dosage; underline for mode/route; italic plus bold for frequency; bold for duration; and italic plus underline for reason. The bracket pair "[ ]" shows the narrative/list attribute.

Figure 2 Flow chart of the Lancet system

Figure 3 Precision for system-level horizontal evaluation of the top ten systems: A) strict evaluation with exact match; B) relaxed evaluation with inexact match; C) strict evaluation with exact match on list entries only. The dashed line indicates the average of the top ten systems.

Figure 4 Analysis of performance variance among discharge summaries