Natural Language Processing

AI and Robotics

Oct 24, 2013


Components of Communication
− seven processes:
a) speaker
i) intention − speaker intends to say something to hearer −
may need reasoning about the hearer’s beliefs + goals to
ensure that the message has the desired effect
ii) generation − speaker uses knowledge about language in
selecting the words to express message
iii) synthesis − speaker utters the words
b) hearer
i) perception − hearer perceives the message from speaker
(message may be noisy)
ii) analysis − syntactic interpretation (parsing) + semantic
interpretation (understanding the meaning of words)
iii) disambiguation − hearer uses reasoning (based on
background + language knowledge) to determine the
most likely message from the speaker
iv) incorporation − hearer decides how to use knowledge
from message
Agents that Communicate
1) Using Tell and Ask
− agents share same internal representation
− agents have direct access to each other’s knowledge base (TELL and ASK)
− two types of symbols used:
a) static − meaning well established before exploration is started
b) dynamic − meaning has to be established after exploration has started
2) Using Formal Language
− requires:
1) parsing routine
2) semantic analysis routine
3) disambiguation routine
Syntactic Analysis
Basic Parsing Techniques
− to examine the syntactic structure of a sentence one needs to consider:
a) grammar − formal specification of the language
b) parsing technique − method to analyse a sentence based on the grammar
− can use context−free grammars or recursive transition networks
− most common approach to represent the structure of a sentence is to use
a treelike structure that outlines how the sentence is broken into major
subparts and how these subparts are broken up in turn.
− for example:
Steven read the textbook.
− the sentence is made up of an initial noun phrase and a verb phrase
− the noun phrase is simple (Steven) but the verb phrase is made up
of a verb (read) and noun phrase (which consists of the article the
and the noun textbook)
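The tree for the example sentence can be sketched as nested tuples. This encoding (a node is a label followed by its children; leaves are words) is an illustrative assumption, not notation from the notes:

```python
# Parse tree for "Steven read the textbook": each node is
# (label, child, child, ...) and each leaf is a plain word.
tree = (
    "S",
    ("NP", ("Name", "Steven")),
    ("VP",
        ("Verb", "read"),
        ("NP", ("Article", "the"), ("Noun", "textbook"))),
)

def leaves(node):
    """Collect the words at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree)))  # Steven read the textbook
```

Reading the leaves back in order recovers the original sentence, which is a quick sanity check that the tree covers every word exactly once.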
Context Free Grammars
− grammar − defines the rules that specify which sentences are valid (or
correct) for a given language
− also defines the tree structures that are valid for the language
− grammar rules for the tree representation of the example (Steven read the
textbook) are:
a) sentence <= noun phrase verb phrase
b) verb phrase <= verb noun phrase
c) noun phrase <= name
d) noun phrase <= article noun
− general format of a context−free grammar rule is:
<symbol> <= <symbol>1 ... <symbol>n, for n >= 1
− context free grammars − define most structures in natural languages +
allow sentence analysis
− uses two types of symbols:
a) terminal − cannot be decomposed into more primitive symbols
b) non−terminal − can be decomposed into more primitive symbols
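The four rules above can be written down directly as a data structure, and the terminal/non−terminal split then falls out of the rules themselves: non−terminals are the symbols that appear on a left-hand side, terminals are the rest. The dict encoding below is a minimal sketch, not a standard grammar format:

```python
# The four grammar rules, as a map from a non-terminal to its
# alternative right-hand sides (each alternative is a list of symbols).
GRAMMAR = {
    "sentence":    [["noun_phrase", "verb_phrase"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "noun_phrase": [["name"], ["article", "noun"]],
}

def nonterminals(grammar):
    """Symbols that appear on a left-hand side."""
    return set(grammar)

def terminals(grammar):
    """Symbols that appear only on right-hand sides."""
    on_right = {s for alts in grammar.values() for rhs in alts for s in rhs}
    return on_right - set(grammar)

print(sorted(terminals(GRAMMAR)))  # ['article', 'name', 'noun', 'verb']
```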
− what makes a good grammar?
a) generality − range of sentences analyzed correctly
b) selectivity − range of non−sentences it identifies as problematic
c) understandability − simplicity of grammar
− use tests based on constituents (subparts of sentences)
a) conjunction test
− only the same type of constituents can be used to construct new constituents:
I read |the textbook| and |the notes|.
Sam will |eat the chips| and |throw away the fish|.
b) insertion test
− proposed constituent can be inserted in other sentences that take the
same type of constituents
John’s hitting of Mary
used in
John’s hitting of Mary alarmed Sue
can be used in other sentences as an NP
Sue was alarmed by John’s hitting of Mary [from Allen 1987]
− parsing can be done in two ways:
a) top−down
− start with the most general symbol
− use rewrite rules to rewrite the symbol into more
primitive symbols
− last step − substitute the actual words for the terminal symbols
− description becomes more and more detailed with each step
b) bottom−up
− start with actual words
− replace words with their respective syntactical categories
− description becomes more and more general with each step
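The top−down strategy can be sketched as a recursive-descent parser over the example grammar: start from `sentence`, expand each non-terminal via its rules, and match terminal categories against the input words. The small lexicon mapping words to categories is an illustrative assumption:

```python
# Top-down (recursive-descent) parsing of "Steven read the textbook".
# The lexicon below is assumed for illustration, not part of the notes.
LEXICON = {"Steven": "name", "read": "verb", "the": "article", "textbook": "noun"}

RULES = {
    "sentence":    [["noun_phrase", "verb_phrase"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "noun_phrase": [["name"], ["article", "noun"]],
}

def parse(symbol, words, i):
    """Try to expand `symbol` at position i; return the new position or None."""
    if symbol not in RULES:                      # terminal category: match a word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return i + 1
        return None
    for rhs in RULES[symbol]:                    # try each alternative in turn
        j = i
        for sub in rhs:
            j = parse(sub, words, j)
            if j is None:
                break                            # this alternative failed
        else:
            return j                             # every symbol on the rhs matched
    return None

words = "Steven read the textbook".split()
print(parse("sentence", words, 0) == len(words))  # True: the whole input parses
```

A bottom−up parser would run the same rules in the other direction: first replace each word with its category, then repeatedly replace right-hand sides with their left-hand symbol until only `sentence` remains.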
X−Bar Schema
− many treelike representations can be used for sentence analysis
− most use different structures for different phrases − analysis may be
difficult to do automatically
− goal: universal rule for all phrases + all languages
− X−BAR schema − all phrases have the same structure
− representation uses binary trees
a) three types of nodes X, X−bar, XP
b) each node has at most two branches
c) leaves are words
d) in English
− specifiers enter XP from one side
− complements enter XP from opposite side
− the direction is language specific
e) the kind of phrase that can serve as specifier for a particular
kind of XP, or as complement for a particular kind of X−bar, depends
on the occupant of the head position, X [from Winston, 1992]
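The uniform skeleton described above (XP dominating an optional specifier and an X−bar node, which dominates the head X and an optional complement) can be sketched as two small classes. The class and field names here are illustrative assumptions, not standard linguistic software:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class XBar:
    """X-bar node: the head word X plus an optional complement phrase."""
    head: str                         # the word in the head position X
    category: str                     # head's category: N, V, A, P, ...
    complement: Optional["XP"] = None

@dataclass
class XP:
    """Maximal projection: an X-bar plus an optional specifier phrase."""
    xbar: XBar
    specifier: Optional["XP"] = None

# "the textbook": a determiner phrase as specifier of an NP headed by "textbook"
np = XP(specifier=XP(XBar("the", "Det")), xbar=XBar("textbook", "N"))
print(np.xbar.category + "P")  # NP
```

Because every phrase has the same shape, one traversal routine can walk any XP regardless of its category, which is the practical payoff of the schema.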
Recursive Transition Networks
− based on simple transition network formalism
− consists of nodes and labeled arcs (each arc is labeled with a word category)
− for example:
noun phrase <= article noun phrase1
noun phrase1 <= adjective noun phrase1
noun phrase1 <= noun
− simple transition networks are not sufficient to represent natural
language constructs
− recursive transition networks − allows arcs that refer to other networks
rather than word categories
− types of arcs:
a) push arcs
b) cat arcs
c) jump arcs
− parsing methods
a) top−down
b) bottom−up
− parsing NL − special case of the search problem
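The noun-phrase networks above, with the three arc types, can be sketched as a small table-driven traversal. The state numbering, the lexicon, and the encoding as dicts are illustrative assumptions; the search returns every input position at which a network can finish, which is the "special case of search" view of parsing:

```python
# Recursive transition networks for the noun-phrase example.
# Each network maps a state to its outgoing (arc_type, label, next_state) arcs:
# `cat` arcs consume one word of a category, `push` arcs call another network,
# `jump` arcs move without consuming input.
NETWORKS = {
    "NP":  {0: [("cat", "article", 1), ("jump", None, 1)],
            1: [("push", "NP1", 2)],
            2: []},
    "NP1": {0: [("cat", "adjective", 0), ("cat", "noun", 1)],
            1: []},
}
FINAL = {"NP": 2, "NP1": 1}                       # accepting state per network
LEXICON = {"the": "article", "big": "adjective", "dog": "noun"}

def traverse(net, state, words, i):
    """Input positions where `net` can reach its final state from (state, i)."""
    results = set()
    if state == FINAL[net]:
        results.add(i)
    for kind, label, nxt in NETWORKS[net][state]:
        if kind == "cat" and i < len(words) and LEXICON.get(words[i]) == label:
            results |= traverse(net, nxt, words, i + 1)
        elif kind == "jump":
            results |= traverse(net, nxt, words, i)
        elif kind == "push":                      # recurse into the sub-network
            for j in traverse(label, 0, words, i):
                results |= traverse(net, nxt, words, j)
    return results

words = "the big dog".split()
print(len(words) in traverse("NP", 0, words, 0))  # True: a complete NP
```

The `jump` arc makes the article optional, and the `adjective` arc loops on its own state, so the same network also accepts "dog" and "the big big dog".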
Semantic Analysis
Ambiguity and Disambiguation
− language can be vague
− problem − message sent can have several valid interpretations
− ambiguity − several forms:
1) lexical ambiguity
2) syntactic ambiguity
3) referential ambiguity
4) pragmatic ambiguity
− agent that receives the messages needs to select the correct interpretation
− agent can use one of four models to resolve the ambiguity problem:
1) world model
2) mental model
3) language model
4) acoustic model
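Lexical ambiguity and its resolution can be illustrated with a toy sense-selection routine: given competing senses of a word, pick the one whose typical context words best overlap the sentence. The senses and context sets below are made up for illustration; a real system would draw them from a world or language model:

```python
# Toy lexical disambiguation: choose a sense of "bank" by context overlap.
# The sense inventory and context words are illustrative assumptions.
SENSES = {
    "bank/financial": {"money", "loan", "deposit", "account"},
    "bank/river":     {"water", "shore", "fishing", "mud"},
}

def disambiguate(word_senses, context_words):
    """Return the sense whose context set overlaps the sentence the most."""
    context = set(context_words)
    return max(word_senses, key=lambda s: len(word_senses[s] & context))

sentence = "I went to the bank to deposit money".split()
print(disambiguate(SENSES, sentence))  # bank/financial
```

Here "deposit" and "money" both match the financial sense, so it wins 2 overlaps to 0; this is the simplest instance of using background knowledge to pick the most likely interpretation.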