Andrade, M. A., and Bork, P. (2000)
“Automated extraction of information in molecular biology”
FEBS Lett 476
In the medical field, initial attempts to extract keyphrases were made by extracting keywords and keyphrases from an abstract using natural language processing. This paper reviews techniques in molecular biology, specifically those that extract information from the scientific literature. In this review paper, the automation of the process of keyword extraction is mainly discussed as a way to find relevant information from the literature.
Berrios, D. C. (2000)
“Automated indexing for full text information retrieval”
Instead of indexing single terms,
based indexing and domain
knowledge improve full text information retrieval in a health information search.
method of generating sentence
(instead of term level)
based on identified
UMLS concepts and query
improve the quality of indexing.
Blaschke, C., Hirschman, L., and Valencia, A. (2002)
“Information extraction in molecular biology”
Keyword and keyphrase extraction is a part of information extraction.
This paper reviews the methods and topics in information extraction in molecular biology. The traditional focus of the field has been the detection of protein-protein interactions and the analysis of DNA expression arrays. The paper suggests new focus areas such as document retrieval, protein functional description,
and detection of disease
. It also tries to crystallise those developments into general agreement on a set of standard evaluation criteria. In this review, the authors introduce the field of information extraction, outline the status of the applications in molecular biology, and discuss some ideas about possible standards for evaluation that are needed for the future development of the field.
de Bruijn, B., and Martin, J. (2002)
“Literature mining in molecular biology”
Proceedings of the EFMI Workshop on Natural Language Processing in Biomedical Applications
This paper reviews literature mining in molecular biology.
resulted in computer programs to extract various molecular biology findings
describes the range of techniques that have been applied in literature mining.
into four general sub
: text categorization, named entity tagging, fact extraction and
It focuses on the domain particularities of molecular biology.
Dung, N. T. (2007)
“Automatic Keyphrase Generation”
National University of
This study utilizes the Hidden Markov model (HMM) to extract keywords and keyphrases. The HMM-based model outperformed the KEA algorithm. This paper also utilizes Maximum Entropy
Hersh, W. R., Hickam, D. H., Haynes, R. B., and McKibbon, K. A. (1994)
“A performance and failure analysis of SAPHIRE with a MEDLINE test collection”
Journal of the American Medical Informatics Association
This study assesses the performance of the SAPHIRE automated information retrieval system. The current version of the Metathesaurus, as utilized by SAPHIRE, was unable to represent the conceptual content of one-fourth of physician-generated MEDLINE queries.
The most likely cause for retrieval of nonrelevant articles was the presence of some or all of the search terms in the article, with frequencies high enough to lead to retrieval. There were significant variations in performance when SAPHIRE's weighting formulas were modified.
The implication of this study for my current project is the fact that it uses
Hersh, W., Buckley, C., Leone, T., and Hickam, D. (1994)
“OHSUMED: An interactive retrieval evaluation and new large test collection for research”
Proceedings of the 17th Annual International ACM Special Interest Group in Information Retrieval, pp. 192
This paper introduces the OHSUMED test collection, which plays a pivotal role in medical information retrieval. As a result of this experiment, a large medical test collection was created, and it is widely used to evaluate the performance of medical search engines.
Huang, Y., Lowe, H. J., Klein, D., and Cucina, R. J. (2005)
“Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS Specialist Lexicon”
Journal of the American Medical Informatics Association,
Huang et al. integrated UMLS into the keyword extraction system. The study also focused on extracting noun phrases with full phrase structures. Natural language processing (NLP) was utilized to improve noun phrase identification within the medical domain.
vocabulary increased keyword extraction performance in terms of precision.
The maximal noun phrase identification (NPI) precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false positives by 31.1% and false negatives by 34.3%.
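For reference, the precision and recall figures quoted above follow the standard definitions. The snippet below is only a minimal sketch of how the two measures are computed for noun phrase identification; the function name and the example phrases are hypothetical and are not taken from Huang et al.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for a set of predicted noun phrases
    against a set of gold-standard noun phrases."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted phrases that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold phrases that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example data, for illustration only.
predicted = {"left lung", "pleural effusion", "no evidence", "chest"}
gold = {"left lung", "pleural effusion", "chest", "right lower lobe", "heart size"}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

A false positive lowers precision and a false negative lowers recall, which is why the reductions in both error types reported above appear as gains in both measures.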
Jo, T., Lee, M., and Gatton, T. M. (2006)
“Keyword extraction from documents using a neural network model”
International Conference on Hybrid Information Technology
neural network model into a keyword/keyphrase extraction
As my system uses neural networks to extract keywords and keyphrases, this study is an essential reference that can help me build the system.
This paper proposes a neural network backpropagation model in which these factors are used as the features and feature vectors are generated to select keywords. This paper shows
that the neural network backpropagation model outperforms other keyphrase extraction methods.
Krallinger, M., and Valencia, A. (2005)
“Text-mining and information-retrieval services for molecular biology”
Due to the exponentially growing amount of health information that is electronically available, enhancing the performance of health information retrieval technology has been studied throughout the last decade. This paper reviews the health information retrieval technologies of
Krulwich, B., and Burkey, C. (1996)
“Learning user information interests through the extraction of semantically significant phrases”
AAAI 1996 Spring Symposium on Machine Learning in Information Access
Krulwich and Burkey carried out keyphrase extraction using heuristics. Their approach is based on syntactic clues and the use of acronyms. This traditional way of extraction has been outperformed by extraction methods that use machine learning algorithms.
Lindberg, D., Humphreys, B., and McCray, A. (1993)
“The Unified Medical Language System”
Methods Inf Med
This paper introduced the Unified Medical Language System (UMLS) in the first place. This system was presented by the U.S. National Library of Medicine (NLM) and is a set of controlled vocabularies, including Medical Subject Headings (MeSH). The purpose of the UMLS is to improve the ability of computer programs to "understand" the biomedical meaning in user inquiries and to use this understanding to retrieve and integrate relevant machine-readable information for users. Underlying the UMLS effort is the assumption that timely access to accurate and up-to-date information will improve decision making and ultimately the quality of patient care and research.
Mao, W., and Chu, W. W. (2002)
“Free-text medical document retrieval via phrase-based vector space model”
AMIA Annual Symposium Proceedings
Vol. 59 No. 7,
Mao and Chu augmented baseline indexing with a phrase-based vector space model. In their study, a phrase is used as an index unit. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. The document similarity can in turn be derived from phrase similarities.
Using OHSUMED as a test collection and UMLS as the knowledge source, the phrase-based VSM yields a 16% increase of retrieval accuracy compared to the
Medelyan, O., and Witten, I. H. (2008)
“Domain independent automatic keyphrase indexing with small training sets”
Journal of American Society for Information Science
Medelyan and Witten claimed that the Naïve Bayes algorithm delivers very efficient keyword extraction performance. Using four distinct features (TF*IDF, the position of the first occurrence of a phrase, the length of the candidate phrase, and the node degree), they showed that thousands of patterns can be effectively extracted through training.
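To make the features above concrete, here is a minimal sketch of how three of them (TF*IDF, first occurrence, and length) could be computed for one candidate phrase. It is not Medelyan and Witten's implementation: the function names, the smoothed IDF formula, and the toy corpus are my own assumptions. Node degree is left out because, as I understand it, it depends on links in a thesaurus, which is not modeled here.

```python
import math


def find_positions(phrase_tokens, doc_tokens):
    """Return every start index where the phrase occurs in the document."""
    n = len(phrase_tokens)
    return [i for i in range(len(doc_tokens) - n + 1)
            if doc_tokens[i:i + n] == phrase_tokens]


def phrase_features(phrase, doc_tokens, corpus):
    """Compute TF*IDF, relative first occurrence, and length (in words)
    for a candidate phrase. `corpus` is a list of token lists."""
    phrase_tokens = phrase.lower().split()
    positions = find_positions(phrase_tokens, doc_tokens)
    if not positions:
        return None  # the phrase is not a candidate for this document
    tf = len(positions) / len(doc_tokens)
    df = sum(1 for doc in corpus if find_positions(phrase_tokens, doc))
    # Assumption: add-one smoothing so the ratio is defined when df == 0.
    idf = math.log((len(corpus) + 1) / (df + 1))
    return {
        "tfidf": tf * idf,
        "first_occurrence": positions[0] / len(doc_tokens),
        "length": len(phrase_tokens),
    }


# Toy corpus, for illustration only.
corpus = [
    "keyphrase extraction assigns keyphrases to documents using keyphrase extraction models".split(),
    "medical search engines rank documents".split(),
    "neural networks learn weights".split(),
]
features = phrase_features("keyphrase extraction", corpus[0], corpus)
print(features)
```

In a supervised setting, one such feature vector per candidate phrase, labeled by whether a human indexer chose the phrase, would be the training input for a classifier such as Naïve Bayes.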
Névéol, A., Shooshan, S. E., Mork, J. G., and Aronson, A. R. (2007)
“Fine-grained indexing of the biomedical literature: MeSH subheading attachment for a MEDLINE indexing tool”
AMIA Annual Symposium Proceedings
, pp. 553
Névéol et al. presented a method to augment the index of biomedical literature by integrating MeSH subheadings
. Their approach includes natural language processing, post-processing and dictionary-based MeSH recommendations. They found that an augmented index based on medical domain knowledge improved information retrieval performance.
Purcell, G. P., and Shortliffe, E. H. (1995)
“Contextual models of clinical publications for enhancing retrieval from full-text databases”
Tools and augmented indexing were also explored in many studies.
Based Searching Tools to improve medical information retrieval performance. Using the
hierarchical list of the concepts in a document, they improved the precision of full-text search from 4.5% to 33.3% without compromising recall.
Salton, G. (1968)
“A comparison between manual and
automatic indexing methods”
This is one of the most prominent classic studies of information retrieval.
One focal point
this study is
building a well
In this study, indexing was done by observing the statistical features of the data corpus terms. It was an initial step of automatic indexing, and this paper discusses the effectiveness of automatic indexing, which was a hot issue for the
Salton, G. (1991)
“Developments in automatic text retrieval”
Vol. 253, No. 5023
This paper reviews the history of text retrieval technology. I will cite this paper to address details of the Vector Space Model (VSM). In the VSM, the index terms are weighted based on the term frequency (TF), inverse document frequency (IDF), and additional features including the length of each document.
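The weighting described above can be sketched in a few lines. This is only an illustration under my own assumptions: raw term counts for TF, a plain logarithmic IDF, and cosine (Euclidean) normalization as the document-length factor. Salton discusses several weighting variants, and the function names and sample documents here are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn a list of token lists into unit-length TF*IDF vectors,
    each represented as a {term: weight} dictionary."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Length normalization, so that long documents are not favored.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors (cosine similarity)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Sample documents, for illustration only.
docs = [
    ["heart", "disease", "heart"],
    ["lung", "disease"],
    ["heart", "attack"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[2]), cosine(vecs[0], vecs[1]))
```

In this toy collection the first document comes out closer to the third than to the second, because "heart" occurs twice in it and therefore carries more weight than "disease".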
Shah, P. K., Perez-Iratxeta, C., Bork, P., and Andrade, M. A. (2003)
“Information extraction from full text scientific articles: Where are the keywords?”
Shah et al. argued that keywords should be extracted from the full text of a document instead of the abstract alone. This information is important as I can give a different weight to each part of a document.
They concluded that extraction from the full text is more reliable since the introduction and discussion sections are also good sources for keywords.
Turney, P. D. (2000)
“Learning algorithms for keyphrase extraction”
This paper compares different learning algorithms for keyphrase extraction. This study especially adapted supervised machine learning algorithms for keyphrase extraction. It used decision trees and genetic algorithms to extract keyphrases. This study shows different machine learning algorithms I could use in my experiment.
Zhang, K., Xu, H., Tang, J., and Li, J. (2006)
“Keyword extraction using support vector machine”
WAIM 2006, LNCS 4016,
Berlin and Heidelberg, Springer
This study utilizes a Support Vector Machine (SVM) for keyword extraction. Experimental results indicate that the proposed SVM-based method can significantly outperform the baseline methods for keyword extraction. The proposed method has been applied to document classification, a typical text mining process. Experimental results show that the accuracy of document classification can be significantly improved by using the keyword extraction method.
This conclusion conforms to the result of my