Selected Bibliography for RANLP 2003 Tutorial
Learning in NLP: When can we reduce or avoid annotation cost?

Ido Dagan, Bar Ilan University


Main textbook (referred to as MS below):

Foundations of Statistical Natural Language Processing by Christopher D. Manning and Hinrich Schütze, MIT Press, 1999 (second printing with corrections, 2000).

The home page for the book, including a continuously updated errata page, which is important to follow:
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=3391


Online access to ACL publications:

ACL Anthology at the ACL website.


References by topic:




POS tagging by HMMs: covered in MS Chapters 9 & 10.



Supervised classification for prepositional phrase attachment

o MS Section 8.3
o Hindle, Donald and Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics 19:103-120.



Decision lists for accent restoration:

o Yarowsky, David. 1994. Decision lists for lexical ambiguity resolution: application to accent restoration in Spanish and French. In Proceedings of the ACL 32nd Annual Meeting.



Word sense disambiguation (for machine translation):

o Dagan, Ido and Alon Itai. Word sense disambiguation using a second language monolingual corpus, Computational Linguistics, 1994, Vol. 20(4), pp. 563-596.



Generalization via distributional similarity

o Distributional similarity models: Clustering vs. nearest neighbors. Lillian Lee and Fernando Pereira. Proceedings of the 37th ACL, pp. 33-40, 1999.
o Dagan, Ido, Lillian Lee and Fernando Pereira. Similarity-based models of word cooccurrence probabilities, Machine Learning, 1999, Vol. 34(1-3), special issue on Natural Language Learning, pp. 43-69.



Word sense disambiguation using Bayes

o MS Section 7.2.1.
o Gale, William, Kenneth Church and David Yarowsky. 1992. A method for disambiguating word senses in a large corpus. Computers and the Humanities 26:415-439.



Selective sampling

o Argamon-Engelson, Shlomo and Ido Dagan. Committee-Based Sample Selection for Probabilistic Classifiers, Journal of Artificial Intelligence Research (JAIR), 1999, Vol. 11, pp. 335-360.



Bootstrapping

o Unsupervised Word Sense Disambiguation Rivaling Supervised Methods (1995). David Yarowsky. Proceedings of ACL-95, pp. 189-196.
o Unsupervised Models for Named Entity Classification (1999). Michael Collins, Yoram Singer. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.
o Steven Abney. Bootstrapping. Proceedings of ACL 2002, 360-367.
o Cheng Niu, Wei Li, Jihong Ding, Rohini K. Srihari. A Bootstrapping Approach to Named Entity Classification Using Successive Learners. Proc. of ACL 2003.
o Yunbo Cao, Hang Li, and Li Lian. Uncertainty Reduction in Collaborative Bootstrapping: Measure and Algorithm, Proc. of ACL'03, 327-334.



Clustering:

o MS Chapter 14.
o Hang Li, Word Clustering and Disambiguation based on Co-occurrence Data, Natural Language Engineering, 8(1), 25-42, (2002).


Additional References



B. Magnini, C. Strapparava, G. Pezzulo and A. Gliozzo. The role of domain information in word sense disambiguation. Natural Language Engineering, 8(4):359-373, 2002.



Learning paraphrases: papers (and their reference lists) at the Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications (at ACL 2003).



Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, 39(2/3), pp. 103-134, 2000.