Appendix

Contents

Datasets
Detailed Methods
Relation to previous work
Figures
References


Datasets

All four synopses were formed using a seeding PubMed search (Appendix Figure 1) that returned publications up to their inception date; thereafter a maintenance search strategy was used. The difference between the seeding and maintenance searches is that the former is more specific whereas the latter is more sensitive. This is for practical reasons: using the maintenance search strategy at database inception would retrieve an unmanageably large number of articles to be screened. When continuously monitoring the literature, however, it is feasible to use a very sensitive strategy. The maintenance search strategy has remained constant for all four databases.


Detailed Methods


We view citation screening as a specific instance of the text classification problem, in which the aim is to induce a model capable of automatically categorizing text documents into one of k categories. Spam filters for e-mail are a common example of text classifiers: they automatically designate incoming e-mails as spam or not.


To use a text classification model, one must first encode documents into a vector space so that they are intelligible to the model. We make use of the standard Bag-of-Words (BoW) encoding scheme, in which each document is mapped to a vector whose ith entry is 1 if word i is present in the document and 0 otherwise. We map the titles, abstracts, and MeSH terms of a given document into separate BoW representations (the latter might be called a Bag-of-MeSH-terms representation). These mappings are referred to as feature spaces.
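
To make the encoding concrete, the following sketch shows one way to build binary BoW feature spaces with scikit-learn's CountVectorizer; the library, the toy citations, and all variable names are illustrative assumptions, not the implementation used in this work.

# Illustrative sketch (not the original implementation): binary Bag-of-Words
# encodings, one feature space each for titles, abstracts, and MeSH terms.
from sklearn.feature_extraction.text import CountVectorizer

# Toy citations as (title, abstract, MeSH terms) tuples, purely hypothetical.
citations = [
    ("Statins and stroke risk", "We assess statin therapy for stroke prevention.", "Stroke; Statins"),
    ("Aspirin in primary prevention", "A randomized trial of low-dose aspirin.", "Aspirin; Primary Prevention"),
]
titles, abstracts, mesh = zip(*citations)

# binary=True yields the 0/1 "word i is present" encoding rather than counts.
title_vec = CountVectorizer(binary=True)
abstract_vec = CountVectorizer(binary=True)
mesh_vec = CountVectorizer(binary=True,
                           tokenizer=lambda s: [t.strip() for t in s.split(";")])

X_title = title_vec.fit_transform(titles)            # one sparse 0/1 matrix
X_abstract = abstract_vec.fit_transform(abstracts)   # per feature space
X_mesh = mesh_vec.fit_transform(mesh)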


For our base classifier, we use the Support Vector Machine (SVM) [1]. SVMs have been shown empirically to be particularly adept at text classification [2]. Briefly, SVMs work by finding a hyperplane (i.e., a high-dimensional generalization of a line) that separates the instances (documents) of the respective classes with the maximum possible margin.


The intuition behind the SVM approach is illustrated in Appendix Figure 2, which shows a simplified, fictitious 2-dimensional classification problem comprising two classes: the plusses (+) and the minuses (-). There are an infinite number of lines that separate the classes, one of which is shown in the left-hand side of the figure, for example. The intuition behind the SVM is that it selects the separating line in a principled way. In particular, it chooses the line that maximizes the distance between itself and the nearest members of the respective classes; this distance is referred to as the margin. In light of this strategy, the separating line the SVM would select in our example is shown on the right-hand side of the figure. In practice, this line is found by solving an objective function expressing this max-margin principle.
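
The toy problem below mirrors the two-class example of Appendix Figure 2: it fits a linear SVM with scikit-learn and reads off the separating line and the margin width. The library and the data points are assumptions made for illustration only.

# Toy two-class problem in the spirit of Appendix Figure 2 (illustrative only).
import numpy as np
from sklearn.svm import SVC

# "Plusses" (+1) and "minuses" (-1) in two dimensions.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("separating line: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
print("margin width: %.2f" % (2.0 / np.linalg.norm(w)))  # the quantity the SVM maximizes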



There are a few properties that make the citation classification problem unique from a data mining/machine learning perspective. First, there is severe class imbalance, i.e., there are far fewer relevant than irrelevant citations. This can pose problems for machine learning algorithms. Second, false negatives (relevant citations misclassified as irrelevant) are costlier than false positives (irrelevant citations misclassified as relevant), and we therefore want to emphasize sensitivity (recall) over specificity. Accordingly, we have tuned our model to this end. In particular, we first build an ensemble of three classifiers, one for each of the three aforementioned feature spaces.


Each of these classifiers is trained with a modified SVM objective function that emphasizes sensitivity by penalizing the model less heavily for misclassifying negative (irrelevant) examples during training. When a prediction for a given document is made, we follow a simple disjunction rule to aggregate predictions: if any of the three classifiers predicts relevant, then we predict that the document is relevant; only when there is unanimous agreement that the document is irrelevant do we exclude it. To further increase sensitivity, we build a committee of these ensembles to reduce the variance caused by the sampling scheme we use.
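
As a rough sketch of this step, asymmetric class weights are one common way to express a sensitivity-emphasizing SVM objective, and the disjunction rule can then be applied across the three per-feature-space classifiers; scikit-learn's LinearSVC, the 10:1 weight ratio, and the helper names below are illustrative assumptions rather than the exact formulation used here.

# Illustrative sketch: cost-sensitive SVMs per feature space plus an OR rule.
from sklearn.svm import LinearSVC

def train_feature_space_classifiers(X_title, X_abstract, X_mesh, y):
    """Train one sensitivity-weighted linear SVM per feature space."""
    clfs = []
    for X in (X_title, X_abstract, X_mesh):
        # Errors on relevant (1) citations are penalized more heavily than
        # errors on irrelevant (0) ones; the 10:1 ratio is arbitrary here.
        clfs.append(LinearSVC(class_weight={1: 10.0, 0: 1.0}).fit(X, y))
    return clfs

def predict_relevant(clfs, x_title, x_abstract, x_mesh):
    """Disjunction rule: relevant if ANY feature-space classifier says so;
    a citation is excluded only if all three agree it is irrelevant."""
    votes = [clf.predict(x)[0] for clf, x in zip(clfs, (x_title, x_abstract, x_mesh))]
    return int(any(v == 1 for v in votes))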


More specifically, we undersample the majority class of irrelevant instances before training the classifiers, i.e., we discard citations designated as irrelevant by the reviewer at random until the number of irrelevant documents in the training set is equal to the number of relevant documents. This simple strategy for dealing with class imbalance has been shown empirically to work well, particularly in terms of improving the induced classifiers' sensitivity [4-7]. Because this introduces randomness (the particular majority instances that are discarded are selected at random), we build a committee of these classifiers and take a majority vote to make a final prediction.
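
A minimal sketch of the undersampling step follows; the function name and the use of NumPy are our own illustrative choices.

# Illustrative sketch: random undersampling of the irrelevant (majority) class
# so the training sample is balanced before a classifier is induced.
import numpy as np

def balanced_undersample(X, y, rng=None):
    """Keep every relevant (1) citation and an equal-sized random subset of irrelevant (0) ones."""
    rng = rng or np.random.default_rng()
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    neg_keep = rng.choice(neg, size=len(pos), replace=False)  # discard the rest at random
    keep = np.concatenate([pos, neg_keep])
    rng.shuffle(keep)
    return X[keep], y[keep]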


This ensemble strategy is known as bagging [7,8] and is an extension of bootstrapping to the case of predictive models [8]. Bagging reduces the variance of predictors. We have found that bagging classifiers induced over balanced bootstrap samples works well in the case of class imbalance, consistent with previous evidence that this strategy is effective [4,5], and we have proposed an explanation based on a probabilistic argument [10]. Specifically, for our task we induce an ensemble of 11 base classifiers over corresponding balanced (i.e., undersampled) bootstrap samples from the original training data. The final classification decision is taken as a majority vote over this committee. We chose an odd number (n=11) to break ties. The exact number is arbitrary, but has worked well in previous work [4,5].
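
The committee construction might look like the following sketch, which reuses the hypothetical balanced_undersample helper from above; the committee size of 11 comes from the text, everything else is assumed.

# Illustrative sketch: a bagging-style committee of 11 base classifiers,
# each trained on its own balanced sample, combined by majority vote.
# Assumes the balanced_undersample helper sketched above.
import numpy as np

N_MEMBERS = 11  # odd, so majority votes cannot tie

def train_committee(X, y, train_one, n_members=N_MEMBERS):
    """train_one(X_bal, y_bal) returns a fitted base classifier
    (in practice X would bundle the three feature-space matrices)."""
    rng = np.random.default_rng()
    return [train_one(*balanced_undersample(X, y, rng)) for _ in range(n_members)]

def committee_predict(committee, predict_one, x):
    """Final decision: majority vote over the members' 0/1 predictions;
    predict_one(member, x) returns a single member's prediction."""
    votes = [predict_one(member, x) for member in committee]
    return int(sum(votes) > len(votes) / 2)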


Each base classifier is itself an aggregate prediction (a Boolean OR; i.e., each base classifier predicts that a document is relevant if any of its members does) over three separate classifiers induced, respectively, over different feature-space representations (titles, abstracts, MeSH terms) of the documents included in independently drawn balanced bootstrap samples. For a schematic depiction of this, see the Figure in the main manuscript.


Relation to previous work


There has been a good deal of research in the machine learning and medical informatics communities investigating techniques for semi-automating citation screening [4-7, 11-18]. These works have largely been 'proof-of-concept' endeavors that have explored the feasibility of automatic classification for the citation screening task (with promising results). By contrast, the present work applies our existing classification methodology prospectively, to demonstrate its utility in reducing the burden on reviewers updating existing systematic reviews.


Most similar to our work here, Cohen et al. conducted a prospective evaluation of a classification system for supporting the systematic review process [14]. Rather than semi-automating the screening process, the authors advocated using data mining for work prioritization. More specifically, they induced a model to rank the retrieved set of potentially relevant citations in order of their likelihood of being included in the review. In this way, reviewers would screen the citations most likely to be included first, thereby discovering the relevant literature earlier in the review process than they would have had they been screening the citations in a random order. Note that in this scenario, reviewers still ultimately screen all of the retrieved citations. This differs from our aim here, as we prospectively evaluate a system that automatically excludes irrelevant literature, i.e., reduces the number of abstracts reviewers must screen for a systematic review.

References

1. Vapnik VN. The Nature of Statistical Learning Theory. Springer Verlag; 2000.

2. Joachims T. Text categorization with support vector machines: learning with many relevant features. ECML. 1998:137-142. Springer.

3. Van Hulse J, Khoshgoftaar TM, Napolitano A. Experimental perspectives on learning from imbalanced data. 2007:935-942. ACM.

4. Small KM, Wallace BC, Brodley CE, Trikalinos TA. The constrained weight space SVM: learning with labeled features. Proc International Conference on Machine Learning (ICML). 2011.

5. Wallace BC, Small KM, Brodley CE, Trikalinos TA. Active learning for biomedical citation screening. Proc Knowledge Discovery and Data Mining (KDD). 2010.

6. Wallace BC, Small KM, Brodley CE, Trikalinos TA. Modeling annotation time to reduce workload in comparative effectiveness reviews. Proc ACM International Health Informatics Symposium (IHI). 2010.

7. Wallace BC, Small KM, Brodley CE, Trikalinos TA. Who should label what? Instance allocation in multiple expert active learning. Proc SIAM International Conference on Data Mining. 2011.

8. Breiman L. Bagging predictors. Machine Learning. 1996;24(2):123-140.

9. Kang P, Cho S. EUS SVMs: ensemble of under-sampled SVMs for data imbalance problems. Neural Information Processing (NIPS). 2006:837-846.

10. Wallace BC, Small K, Brodley CE, Trikalinos TA. Class imbalance, redux. Proc International Conference on Data Mining (ICDM). 2011.

11. Bekhuis T, Demner-Fushman D. Towards automating the initial screening phase of a systematic review. Stud Health Technol Inform. 2010;160(Pt 1):146-150.

12. Cohen AM, Hersh WR, Peterson K, Yen PY. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13:206-219.

13. Cohen AM, Ambert K, McDonagh M. Cross-topic learning for work prioritization in systematic review creation and update. J Am Med Inform Assoc. 2009. Erratum in: J Am Med Inform Assoc. 2009;16(6):898.

14. Cohen AM, Ambert K, McDonagh M. A prospective evaluation of an automated classification system to support evidence-based medicine and systematic review. AMIA Annu Symp Proc. 2010.

15. Frunza O, Inkpen D, Matwin S, Klement W, O'Blenis P. Exploiting the systematic review protocol for classification of medical abstracts. Artif Intell Med. 2011;51(1):17-25.

16. Matwin S, Kouznetsov A, Inkpen D, Frunza O, O'Blenis P. A new algorithm for reducing the workload of experts in performing systematic reviews. J Am Med Inform Assoc. 2010;17(4):446-453.

17. Polavarapu N, Navathe SB, Ramnarayanan R, ul Haque A, Sahay S, Liu Y. Investigation into biomedical literature classification using support vector machines. Proc IEEE Comput Syst Bioinform Conf. 2005:366-374.

18. Yu W, Clyne M, Dolan SM, Yesupriya A, Wulf A, Liu T, Khoury MJ, Gwinn M. GAPscreener: an automatic tool for screening human genetic association literature in PubMed using the support vector machine technique. BMC Bioinformatics. 2008;9:205.