Annotated bibliography


Hohyon Ryu



1. Andrade, M. A., and Bork, P. (2000), “Automated extraction of information in molecular biology”, FEBS Lett, Vol. 476, pp. 12-17.

In the medical field, early keyphrase extraction drew keywords and keyphrases from abstracts using natural language processing. This paper reviews data mining techniques in molecular biology, specifically those that extract information from the scientific literature itself. The review mainly discusses automating keyword extraction in order to find relevant information in the literature.




2. Berrios, D. C. (2000), “Automated indexing for full text information retrieval”, AMIA, pp. 71-75.

This paper claims that concept-based indexing and domain knowledge, rather than indexing of single terms, improve full-text information retrieval in health information search. It reports a method for generating sentence-level (rather than term-level) indexing from identified UMLS concepts, in which query and vector-space models improve the quality of indexing.



3. Blaschke, C., Hirschman, L., and Valencia, A. (2002), “Information extraction in molecular biology”, Brief Bioinform, Vol. 3, pp. 154-165.

Keyword and keyphrase extraction is a subfield of information extraction. This paper reviews the methods and topics of information extraction in molecular biology. The traditional focus of the field has been the detection of protein-protein interactions and the analysis of DNA expression arrays. The paper suggests new focus areas such as document retrieval, protein functional description, and detection of disease-related genes. It also tries to crystallise those developments into general agreement on a set of standard evaluation criteria. In this review the authors introduce the general field of information extraction, outline the status of its applications in molecular biology, and discuss ideas about possible evaluation standards that are needed for the future development of the field.


4. de Bruijn, B., and Martin, J. (2002), “Literature mining in molecular biology”, Proceedings of the EFMI Workshop on Natural Language Processing in Biomedical Applications, Vol. 32, pp. 1-5.

This paper reviews literature mining in molecular biology, covering many studies that have resulted in computer programs to extract various molecular biology findings. It describes the range of techniques that have been applied and divides literature mining into four general subfields: text categorization, named entity tagging, fact extraction, and collection-wide analysis. It focuses on the domain particularities of molecular biology.


5. Dung, N. T. (2007), “Automatic Keyphrase Generation”, National University of Singapore, Singapore.

This study uses a Hidden Markov model (HMM) to extract keywords and keyphrases. The HMM-based model outperformed the KEA algorithm. The study also uses Maximum Entropy and Naïve Bayes models.


6. Hersh, W. R., Hickam, D. H., Haynes, R. B., and McKibbon, K. A. (1994a), “A performance and failure analysis of SAPHIRE with a MEDLINE test collection”, Journal of the American Medical Informatics Association, Vol. 1 No. 1, pp. 51-60.

This study assesses the performance of the SAPHIRE automated information retrieval system. It concludes that the then-current version of the Metathesaurus, as utilized by SAPHIRE, was unable to represent the conceptual content of one-fourth of physician-generated MEDLINE queries. The most likely cause for retrieval of nonrelevant articles was the presence of some or all of the search terms in the article, with frequencies high enough to lead to retrieval. There were significant variations in performance when SAPHIRE's concept-weighting formulas were modified. The relevance of this study to my current project is that it uses the Metathesaurus for information retrieval.




7. Hersh, W., Buckley, C., Leone, T., and Hickam, D. (1994b), “OHSUMED: An interactive retrieval evaluation and new large test collection for research”, Proceedings of the 17th Annual International ACM Special Interest Group in Information Retrieval, pp. 192-201.

This paper introduces the OHSUMED test collection, which plays a pivotal role in medical information retrieval evaluation. As a result of this experiment, a large medical test collection was created, and it is widely used to evaluate the performance of medical search engines.



8. Huang, Y., Lowe, H. J., Klein, D., and Cucina, R. J. (2005), “Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS Specialist Lexicon”, Journal of the American Medical Informatics Association, Vol. 12 No. 3, pp. 275-285.

This study integrated UMLS into a keyword extraction system. It focused on extracting noun phrases with full phrase structures using natural language processing (NLP). UMLS was used to improve noun phrase identification (NPI) within medical documents. The controlled vocabulary increased keyword extraction performance in terms of precision. The overall maximal NPI precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false-positives by 31.1% and false-negatives by 34.3%.
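
The precision and recall figures above follow the standard definitions. As a minimal sketch, assuming extracted noun phrases are compared as exact strings against a gold-standard set (the phrases below are hypothetical and not taken from the paper), the two measures could be computed as follows:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw counts."""
    predicted = true_positives + false_positives   # everything the system returned
    actual = true_positives + false_negatives      # everything in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical gold-standard and system output, for illustration only.
gold = {"left lung", "pleural effusion", "chest", "right lower lobe"}
found = {"left lung", "pleural effusion", "chest", "lower lobe", "the"}

tp = len(gold & found)   # phrases the system got right
fp = len(found - gold)   # phrases returned but not in the gold standard
fn = len(gold - found)   # gold-standard phrases the system missed

precision, recall = precision_recall(tp, fp, fn)
```

Reducing false positives raises precision and reducing false negatives raises recall, which is why the paper reports both reductions alongside the two measures.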



9. Jo, T., Lee, M., and Gatton, T. M. (2006), “Keyword extraction from documents using a neural network model”, International Conference on Hybrid Information Technology.

This study successfully integrated a neural network model into a keyword/keyphrase extraction system. As my system uses neural networks to extract keywords and keyphrases, this study is an essential reference that can help me build the system. The paper proposes a neural network backpropagation model in which a set of factors serves as the features, and feature vectors are generated to select keywords. The paper shows that the neural network backpropagation approach outperforms other keyphrase extraction methods.


10. Krallinger, M., and Valencia, A. (2005), “Text-mining and information-retrieval services for molecular biology”, Genome Biology, Vol. 6 No. 7.

Because the amount of electronically available health information is growing exponentially, improving the performance of health information retrieval technology has been studied throughout the last decade. This paper reviews the health information retrieval technologies of that decade.



11. Krulwich, B., and Burkey, C. (1996), “Learning user information interests through the extraction of semantically significant phrases”, AAAI 1996 Spring Symposium on Machine Learning in Information Access, California.

This study approaches keyphrase extraction using heuristics. Their approach is based on syntactic clues and the use of acronyms. This traditional way of extraction has since been outperformed by extraction methods that use machine learning algorithms.



12. Lindberg, D., Humphreys, B., and McCray, A. (1993), “The Unified Medical Language System”, Methods Inf Med, pp. 281-291.

This paper first introduced the Unified Medical Language System (UMLS). The study was presented by the U.S. National Library of Medicine (NLM). UMLS is a set of controlled vocabularies, including Medical Subject Headings (MeSH). The purpose of the UMLS is to improve the ability of computer programs to "understand" the biomedical meaning in user inquiries and to use this understanding to retrieve and integrate relevant machine-readable information for users. Underlying the UMLS effort is the assumption that timely access to accurate and up-to-date information will improve decision making and ultimately the quality of patient care and research.



13. Mao, W., and Chu, W. W. (2002), “Free-text medical document retrieval via phrase-based vector space model”, AMIA Annual Symposium Proceedings, Vol. 59 No. 7, pp. 489-493.

This study augmented baseline indexing with a phrase-based vector space model (VSM). In their study, a phrase is used as an index unit. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. The document similarity can in turn be derived from phrase similarities. Using OHSUMED as a test collection and UMLS as the knowledge source, the experimental results show that the phrase-based VSM yields a 16% increase in retrieval accuracy compared to the stem-based model.


14. Medelyan, O., and Witten, I. H. (2008), “Domain independent automatic keyphrase indexing with small training sets”, Journal of the American Society for Information Science and Technology, pp. 1026-1040.

This study claimed that the Naïve Bayes algorithm extracts keywords very efficiently. Using four distinct features (TF*IDF, the position of the first occurrence of a term, the length of the candidate phrase, and the node degree), Medelyan and Witten showed that thousands of patterns can be effectively extracted through training.
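
A minimal sketch of how three of the four features could be computed for one candidate phrase. The formulas follow common KEA-style definitions (TF*IDF as relative frequency times -log2 of document frequency over corpus size, first occurrence as a fraction of document length), which is my assumption and not necessarily the exact formulation in the paper. Node degree is left out because it requires a thesaurus. The function name, the toy document, and the corpus counts are hypothetical.

```python
import math


def candidate_features(phrase, doc_tokens, doc_freq, num_docs):
    """Return TF*IDF, first-occurrence and length features for one candidate.

    phrase     -- candidate phrase as a space-separated string
    doc_tokens -- the document as a list of tokens
    doc_freq   -- number of corpus documents containing the phrase (>= 1)
    num_docs   -- total number of documents in the corpus
    """
    words = phrase.split()
    n = len(words)
    # Token offsets at which the phrase starts in the document.
    positions = [i for i in range(len(doc_tokens) - n + 1)
                 if doc_tokens[i:i + n] == words]
    if not positions:
        return None  # the phrase does not occur in this document

    tf = len(positions) / len(doc_tokens)       # relative frequency
    idf = -math.log2(doc_freq / num_docs)       # rarer phrases score higher
    return {
        "tfidf": tf * idf,
        "first_occurrence": positions[0] / len(doc_tokens),
        "length": n,                            # length in words
    }


# Toy document and hypothetical corpus statistics.
tokens = ("keyphrase extraction assigns keyphrases to documents "
          "and keyphrase extraction uses training data").split()
features = candidate_features("keyphrase extraction", tokens,
                              doc_freq=10, num_docs=1000)
```

Feature dictionaries of this kind, one per candidate phrase, would then be passed to a classifier such as Naïve Bayes that is trained on documents with manually assigned keyphrases.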



15. Névéol, A., Shooshan, S. E., Mork, J. G., and Aronson, A. R. (2007), “Fine-grained indexing of the biomedical literature: MeSH subheading attachment for a MEDLINE indexing tool”, AMIA Annual Symposium Proceedings, pp. 553-557.

This paper presented a method to augment the index of biomedical literature by integrating MeSH. Their approach includes natural language processing, post-processing and dictionary-based MeSH recommendations. They found that an augmented index based on medical domain knowledge improved information retrieval performance.



16. Purcell, G. P., and Shortliffe, E. H. (1995), “Contextual models of clinical publications for enhancing retrieval from full-text database”, AMIA, pp. 851-857.

Tools and augmented indexing were also explored in many studies. This study devised Context-Based Searching Tools to improve medical information retrieval performance. Using the hierarchical list of the concepts in a document, they improved the precision of full-text searching from 4.5% to 33.3% without compromising recall.


17. Salton, G. (1968), “A comparison between manual and automatic indexing methods”, Cornell University, Ithaca, NY.

This is one of the most prominent classic works in information retrieval. One focal point of the study is building a well-designed index. In this study, indexing was done by observing the statistical features of the terms in the data corpus. It was an initial step toward automatic indexing, and the paper discusses the effectiveness of automatic indexing, which was a hot issue among its contemporaries.



18. Salton, G. (1991), “Developments in automatic text retrieval”, Science, Vol. 253 No. 5023, pp. 974-980.

This paper reviews the history of text retrieval technology. I will cite this paper for details on the Vector Space Model (VSM). In the VSM, index terms are weighted based on term frequency (TF), inverse document frequency (IDF), and additional features including the length of each document.
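
A minimal sketch of the weighting described above, assuming raw term counts for TF, log(N/df) for IDF, and cosine (unit-length) normalization to account for document length. These particular choices are one common variant and are my assumption, not a formula quoted from the paper. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into length-normalized TF-IDF vectors.

    docs -- list of documents, each a list of tokens
    Returns one {term: weight} dictionary per document.
    """
    num_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: tf[t] * math.log(num_docs / df[t]) for t in tf}
        # Divide by the vector length so long documents are not favored.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors (cosine similarity)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Toy collection of three tokenized documents.
docs = [
    ["gene", "expression", "gene"],
    ["gene", "retrieval"],
    ["retrieval", "index", "index"],
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # the two documents share "gene"
sim_unrelated = cosine(vecs[0], vecs[2])  # no shared terms, so similarity is 0
```

A query can be treated as one more short document, weighted the same way, and documents are then ranked by their cosine similarity to it.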



19. Shah, P. K., Perez-Iratxeta, C., Bork, P., and Andrade, M. A. (2003), “Information extraction from full text scientific articles: Where are the keywords?”, BMC Bioinformatics, Vol. 4 No. 20.

This paper argued that keywords should be extracted from the full text of a document instead of the abstract. This is important for my work because I can give a different weight to each part of a document. They concluded that extraction from the full text is more reliable, since the introduction and discussion sections are also good sources of keywords.


20. Turney, P. D. (2000), “Learning algorithms for keyphrase extraction”, Information Retrieval, Vol. 2 No. 4, pp. 303-336.

This paper compares different learning algorithms for keyphrase extraction. In particular, it adapts supervised machine learning algorithms to the task, using decision trees and genetic algorithms to extract keyphrases. The study shows different machine learning algorithms I could use in my experiment.



21. Zhang, K., Xu, H., Tang, J., and Li, J. (2006), “Keyword extraction using support vector machine”, In WAIM 2006, LNCS 4016, Berlin and Heidelberg, Springer-Verlag, pp. 85-96.

This study uses a Support Vector Machine (SVM) for keyword extraction. Experimental results indicate that the proposed SVM-based method can significantly outperform the baseline methods for keyword extraction. The proposed method has been applied to document classification, a typical text mining process. Experimental results show that the accuracy of document classification can be significantly improved by using the keyword extraction method. This conclusion conforms to the result of my experiment.