Natural Language Processing Project Topics


Oct 24, 2013 (4 years and 19 days ago)

You may choose your project from the following list, or you may propose any other project in the NLP field.

Each student will select a separate project.




You should pick your project as soon as possible. You should write a one-page document for your project proposal, and submit your project proposal before 21 March 2013 (by email and hard copy).

Your project should also include computational work.



In your project proposal, you should talk about your computational work.



You may talk with me about what my expectations are for the possible projects described in this document.




You should find at least two or three major papers on your project topic and read them.



During the last two weeks of the semester, you will give a presentation about your project in class.




At the end of the semester, you will submit your final project report (by email and hard copy).

By email, send the PDF file and the original files (doc or LaTeX files) of your final project report.



Prepare your final project report in the format of a conference article using the IEEE double-column format (6-10 pages).

Your final project paper will be an extended and updated version of your intermediate report.



In your final project report, you should use your own words. Do not cut and paste from the papers that you read.



Your final project report should contain at least the following sections in addition to a title and an abstract:

Introduction
Describe the problem that you are attacking.

Related Work
Describe the related work here, and describe its relation to your work.

Sections describing your computational work in detail
Describe the details of your computational work in these sections. If your computational work needs evaluation, do not forget to include evaluation sections.

Conclusion
Give your concluding remarks and possible future work in this section.

References
Give the references that are cited in your paper.




With your final project report, you should send an electronic copy of each of the major papers that you read in your survey.




You should send the executable and source files of your project at the end of the semester. Make sure that I can execute your project on my PC.




You should give me a demo of your project at the end of the semester, before you submit your final project report.



Possible project topics:




Anaphora Resolution for Turkish



Finding the correct third person singular (he/she/it) for a Turkish third person singular
morpheme.



It will take the lexical level representation of Turkish sentences (lexical level representations of words are produced by a Turkish morphological analyzer, PCKIMMO PC version). Then, it will try to find what the Turkish third person singular morphemes (and pronouns) in the sentences refer to (English pronouns: he/she/it, or a singular noun [a singular noun phrase]).



Examples:

Ali geldi
Ali gel+Verb+Past+3Sg

+3Sg refers to Ali in the same sentence.

Ali Ankara’ya dün geldi. Ama bugün okula gelmedi.
Ali Ankara’ya dün gel+Verb+Past+3Sg. Ama bugün okula gel+Verb+Neg+Past+3Sg.

First +3Sg refers to Ali in the same sentence.
Second +3Sg refers to Ali in the previous sentence.
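
The two examples above can be approximated with a simple recency heuristic. The sketch below is only an illustration, not the expected solution. It assumes every word arrives as an analysis string such as "gel+Verb+Past+3Sg"; the tags Noun, Nom, Dat, Adv and Conj are hypothetical and may differ from what the morphological analyzer actually produces.

```python
def resolve_3sg(sentences):
    """Link each +3Sg verb to an antecedent with a recency heuristic.

    sentences: list of sentences, each a list of analysis strings.
    Returns a list of (verb_root, antecedent, where) tuples.
    """
    results = []
    last_subject = None  # subject of the most recently resolved verb
    for sentence in sentences:
        subject = None  # nominative noun seen so far in this sentence
        for analysis in sentence:
            tags = analysis.split("+")
            root = tags[0]
            if "Noun" in tags and "Nom" in tags:
                # Assumption: a nominative noun is a candidate subject.
                subject = root
            elif "Verb" in tags and "3Sg" in tags:
                if subject is not None:
                    results.append((root, subject, "same sentence"))
                    last_subject = subject
                elif last_subject is not None:
                    results.append((root, last_subject, "earlier sentence"))
                else:
                    results.append((root, None, "unresolved"))
    return results


# Hypothetical analyses for the second example above.
example = [
    ["Ali+Noun+Nom", "Ankara+Noun+Dat", "dün+Adv", "gel+Verb+Past+3Sg"],
    ["ama+Conj", "bugün+Adv", "okul+Noun+Dat", "gel+Verb+Neg+Past+3Sg"],
]
for verb, antecedent, where in resolve_3sg(example):
    print(verb, "->", antecedent, "(" + where + ")")
```

A real system would also need to handle overt pronouns, several competing candidates, and agreement constraints, none of which this heuristic covers.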



A Morphological Disambiguator for Turkish



A word can have different part of speech tags (such as noun, verb, …), but its usage in a sentence will be only one of them. For example, the English word “fly” can be a verb (uçmak) or a noun (sinek). In the sentence “A fly can fly”, the first “fly” is a noun and the second “fly” is a verb. A part of speech tagger tries to determine the intended part of speech tag for each word in a sentence.



Your part of speech tagger should invoke the Turkish morphological analyzer (which is available) to find the possible part of speech tags of each word, and should try to find the correct part of speech tag of each word.



This can be an improvement to our rule-based morphological disambiguator or a new statistical morphological disambiguator.
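
As an illustration of the statistical option, here is a minimal sketch that picks one tag per word with a Viterbi search over tag bigram counts. It is not the existing rule-based system. The tag names, the tiny training data, and the add-one scoring are assumptions made for the example; in the project, the candidate tags would come from the morphological analyzer.

```python
import math
from collections import Counter


def train_bigrams(tagged_sentences):
    """Count tag bigrams; "<s>" marks the start of a sentence."""
    counts = Counter()
    for tags in tagged_sentences:
        prev = "<s>"
        for tag in tags:
            counts[(prev, tag)] += 1
            prev = tag
    return counts


def disambiguate(candidates, counts):
    """Choose one tag per word.

    candidates: one list of possible tags per word.
    Scores are sums of log(count + 1), an unnormalized stand-in
    for smoothed transition probabilities.
    """
    best = {"<s>": (0.0, [])}  # tag -> (score, best path ending in tag)
    for options in candidates:
        new_best = {}
        for tag in options:
            score, path = max(
                (s + math.log(counts[(prev, tag)] + 1), p)
                for prev, (s, p) in best.items()
            )
            new_best[tag] = (score, path + [tag])
        best = new_best
    return max(best.values())[1]


# Hypothetical training data and candidate tags for "A fly can fly".
counts = train_bigrams([
    ["Det", "Noun", "Modal", "Verb"],
    ["Det", "Noun", "Verb"],
    ["Noun", "Modal", "Verb"],
])
print(disambiguate([["Det"], ["Noun", "Verb"], ["Modal", "Noun"], ["Noun", "Verb"]], counts))
```

With this training data the search selects Det, Noun, Modal, Verb, which matches the reading given above for “A fly can fly”.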



A Morphological Analyzer for Turkish



It must be written in Java, and it should work similarly to our PCKIMMO system.



Text Categorization



Each written document can be categorized according to its content. For example, the category of a newspaper article can be economy, sport, etc. A text categorization system determines the category of a given document.
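
One common baseline for this task is a multinomial Naive Bayes classifier over word counts. The sketch below is an assumption about how such a baseline might look, not a required design; the class name, the whitespace tokenization, and the training sentences are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> number of documents
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category in self.doc_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[category] / total_docs)
            total = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


model = NaiveBayesCategorizer()
model.train("the central bank raised interest rates", "economy")
model.train("the team won the football match", "sport")
print(model.classify("bank interest rates fell"))
```

For Turkish, the words would probably need to be stemmed first, since inflected forms of the same root would otherwise be counted as unrelated features.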


Author Identification

Determining the author of a given text.

Parsing Internet Domain Names to Collect Information About Them



A domain name is a string of words. It can be a single word or a sequence of words. Sometimes the words appearing in domain names are misspelled or are not regular words.



In this project, you are expected to develop a domain name parser that finds the words appearing in a given domain name. For example, when the string “possibleprojecttopics” is given as a domain name, this program should be able to find the words “possible”, “project”, and “topics”.
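
The splitting step can be sketched as dictionary-based segmentation with dynamic programming. This is a minimal illustration under the assumption that a word list is available; the function name and the small vocabulary are hypothetical, and misspelled or unknown words are not handled.

```python
def segment(name, vocabulary):
    """Split name into vocabulary words, preferring the fewest words.

    Returns a list of words, or None if no segmentation exists.
    """
    n = len(name)
    best = [None] * (n + 1)  # best[i] = best segmentation of name[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = name[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


vocabulary = {"possible", "project", "topics", "topic", "pro", "s"}
print(segment("possibleprojecttopics", vocabulary))
```

Preferring the fewest words is only one possible tie-breaking rule; word frequencies from a corpus would likely give better choices when several segmentations exist.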



This system should also find the part of speech tags of the found words with the help of an available part of speech tagger. For example, “possible” is an adjective, “project” is a singular noun, and “topics” is a plural noun.



This system should also check whether the given name is a correct noun phrase or not. If it is a correct noun phrase, it should determine the type of that noun phrase.



A stemmer for Turkish




A stemmer takes a word and returns the root (stem) of the word.



Some Turkish words and their stems:

kalemin → kalem
çiçeğimin → çiçek
kitabım → kitap
alnım → alın
uygarlaştım → uygarlaş (or uygar, or both of them; I prefer the last case)
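
A lexicon-based suffix stripper is one way to approach this. The sketch below is a rough illustration only: the suffix list, the consonant map, and the root lexicon are tiny hypothetical samples chosen to cover some of the words above. It returns every root it finds, so it can return both uygarlaş and uygar, but it does not restore dropped vowels, so alnım → alın is not handled.

```python
# Hypothetical, very incomplete suffix list (no vowel harmony variants).
SUFFIXES = ["imin", "ım", "in", "tım", "laş"]
# Undo consonant softening at the end of a stripped form (b -> p, ğ -> k, ...).
HARDEN = {"b": "p", "c": "ç", "d": "t", "ğ": "k"}


def stems(word, lexicon):
    """Return all lexicon roots reachable by stripping suffixes, longest first."""
    found = set()
    frontier = [word]
    seen = set()
    while frontier:
        form = frontier.pop()
        if form in seen:
            continue
        seen.add(form)
        variants = {form}
        if form and form[-1] in HARDEN:
            variants.add(form[:-1] + HARDEN[form[-1]])
        found.update(v for v in variants if v in lexicon)
        for suffix in SUFFIXES:
            # Keep at least two characters after stripping.
            if form.endswith(suffix) and len(form) > len(suffix) + 1:
                frontier.append(form[: -len(suffix)])
    return sorted(found, key=lambda s: (-len(s), s))


lexicon = {"kalem", "çiçek", "kitap", "alın", "uygar", "uygarlaş"}
for w in ["kalemin", "çiçeğimin", "kitabım", "uygarlaştım"]:
    print(w, "->", stems(w, lexicon))
```

A stemmer built on the morphological analyzer would presumably be more reliable than a flat suffix list like this one.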



A Parser for Turkish (link parsing, dependency parsing)



It should parse a given Turkish sentence using the dependency parsing formalism (or the link grammar formalism).

You have to rewrite the linking requirements for Turkish grammar.



An NP-chunker for Turkish



It should find the NPs (noun phrases) in a given Turkish sentence.



For example, the NPs in the following sentence are marked with brackets.

[Kırmızı başlıklı kız] [Ankara’dan] [İstanbul’a] [uçakla] gitti.



The system does not need to parse the whole sentence. It should find only the noun phrases in the sentence.



Development of an efficient version of the Earley Parser for a grammar for machine translation




The grammar has the rules in a special form (I will give the grammar to you).



You have to write a new form of Earley parser so that it can work efficiently for this
grammar in the special form.







Finding semantic similarities between words for Turkish (or English)

The system should categorize the Turkish words (nouns and verbs) according to their semantic similarities using a corpus.
a) Using Latent Semantic Analysis (LSA)
b) Using methods other than LSA
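
As one possible starting point for option (b), words can be compared through the contexts they occur in. The sketch below builds co-occurrence vectors from a tokenized corpus and compares them with cosine similarity. It is not LSA (there is no dimensionality reduction), and the window size and the toy English corpus are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the car needs fuel".split(),
]
vectors = context_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["car"]))
```

In this toy corpus “cat” comes out closer to “dog” than to “car”, because the first two share the context word “drinks”. Grouping the words into categories would additionally need a clustering step on top of these similarities.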



Extraction of protein interaction from biomedical texts



Extraction of the relations between the protein names in the biomedical texts.



Keyphrase Extraction for Turkish



Generation of keyphrases of given texts.



Keyphrase Extraction for English



Generation of keyphrases of given texts.



Text Summarization for Turkish



Generation of summaries of given texts by selecting the important sentences of the given texts.
a) Using Latent Semantic Analysis (LSA)
b) Using methods other than LSA



Text Summarization for English




Generation of summaries of given texts by selecting the important sentences of the given texts.
a) Using Latent Semantic Analysis (LSA)
b) Using methods other than LSA


[1] Jen-Yuan Yeh et al., Text summarization using a trainable summarizer and latent semantic analysis, Information Processing & Management, 2005.
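
To make the idea of selecting important sentences concrete, here is a minimal sketch of a frequency-based extractive summarizer, which would fall under option (b). It is not the method of [1]. The sentence splitting rule, the scoring by average word frequency, and the stopword list are assumptions for the example.

```python
import re
from collections import Counter


def summarize(text, num_sentences=1, stopwords=frozenset()):
    """Return the highest scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenized for w in words if w not in stopwords)

    def score(words):
        # Average corpus frequency of the non-stopword tokens.
        content = [w for w in words if w not in stopwords]
        return sum(freq[w] for w in content) / len(content) if content else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]


text = ("Parsing is hard. Turkish parsing needs a morphological analyzer. "
        "The weather was nice.")
print(summarize(text, num_sentences=2, stopwords={"is", "a", "the", "was"}))
```

Averaging by sentence length keeps long sentences from winning automatically, which is a design choice rather than a requirement.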



A factoid question answering system



It will find the answer to a question on the web.



Information Extraction from Movie Subtitles



Extraction of information from subtitles to index movies



Indexing of a Video Database (or an Image Database) from the text documents associated with that database




Example: indexing MPEG7 video files from the tags in the files, or indexing videos using the closed-caption text associated with them.



Word Sense Disambiguation

The Lexical Chains method or some other method can be used.



You may use SemCor [1] in this project.



You may use WordNet


[1] http://www.cs.unt.edu/~rada/downloads.html#semcor
[2] http://wn-similarity.sourceforge.net/
[3] www.ergin.altintas.org/index.php?download=paper-nodalida2005-altintas.pdf (Nodalida 2005)
[4] P. Bhattacharyya and Narayan Unny, Word Sense Disambiguation and Text Similarity Measurement Using WordNet, chapter in Real World Semantic Web Applications, IOS Press, Amsterdam, 2002.



Semantic Relation Similarity

You may implement Turney’s algorithm or a variation of it.

[1] Turney, P.D. (2006), Similarity of semantic relations, Computational Linguistics, 32



Opinion Mining


Deciding the polarity (positive or negative) of views in customer review texts or texts in social media environments. This can be done for Turkish or English texts.
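
A simple baseline for polarity detection is a word list with basic negation handling. The sketch below is only an illustration: the word lists are tiny hypothetical samples, negation flips only the immediately following word, and the "neutral" label is an addition for texts where no opinion word is found.

```python
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}
NEGATORS = {"not", "never", "no"}


def polarity(review):
    """Classify a review as positive, negative, or neutral."""
    score = 0
    negate = False
    for token in review.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            negate = True
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1, or 0
        if value:
            score += -value if negate else value
        negate = False  # a negator affects only the next word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The battery is great."))
print(polarity("The screen is not good."))
```

For Turkish, negation is often expressed by a verbal suffix rather than a separate word, so this word-level rule would probably need to work on morphological analyses instead of surface tokens.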


Spell Checker


Implementation of spell checker algorithms.
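
One classic building block for a spell checker is edit distance against a dictionary. The sketch below computes Levenshtein distance with dynamic programming and suggests dictionary words within a threshold. The function names, the threshold, and the small dictionary are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return the word itself if known, else close dictionary words, nearest first."""
    if word in dictionary:
        return [word]
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


dictionary = {"economy", "enemy", "sport"}
print(suggest("econmy", dictionary))
```

Comparing against every dictionary word is slow for large vocabularies; generating candidate edits or indexing the dictionary are common ways to speed this up.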


Grammar Checker


Implementation of grammar checker algorithms.