Today's Topic: Natural Language Processing

Artificial Intelligence and Robotics

24 Oct 2013

1/51
Today's Topic:
Natural Language Processing
and Understanding
Literature:
(1) Xuedong Huang et al.: Chapter 17
(2) A. Lavie, excerpt “Robust Parsing approaches” from his thesis
(3) SOUP homepage: http://www.is.cs.cmu.edu/ISL.speech.parsing.soup.html
2/51
Content

Introduction to Natural Language Understanding
•What is it, What’s the purpose, Why is it hard?
•Integration of Knowledge
•Levels of Language Analysis
•Syntactic Processing, Semantic Processing
•Grammars
•Approaches to Robust Parsing
•Grammar-specific Repair Heuristics
•Full Minimal Distance Parsers
•Skipping and Maximal Coverage Parsers
•Connectionist Parsers
•Automatic and Interactive Repair Approaches
•The SOUP Parser
•Philosophy, Parsing, Formalism, Characteristics,
Performance, Heuristics
3/51
NL Processing and Understanding
What is NLP/NLU? -A definition
•The process of computer analysis of input provided in a human
language and ...
•The conversion of this input into a useful form of representation
(for immediate or delayed action)
What is the purpose of NLP/NLU?
•Understand the user (i.e. perform the intended action)
•Help the recognition process by constraining the search, NL as
knowledge source: syntax, semantics, discourse
Forms of Natural Language
•text/written language: newspaper articles, letters, manuals,
prose, email, etc.
•spoken language: read speech (e.g. radio/TV), spontaneous
speech, conversational, commands, …
4/51
Separating Language Tasks
•Processing written text -using lexical, syntactical, and semantic
knowledge about the language, as well as the required real world
information
•Processing spoken language -involves all of the above, plus the
challenges of speech recognition, and unique characteristics of
human speech
•Another dimension:
–Understanding = Analysis
–Generation = Synthesis
•For Machine Translation: both!
5/51
Natural Language Understanding
Why is NL Understanding so hard?
•Complexity/Abstractness of the target representation:
–In a database retrieval system -keyword for the search
–In a translation system -a symbolic representation of the
meaning of the sentence
•Ambiguity: natural language is extremely rich in form and
structure, and very ambiguous: one input can mean many things:
–lexical (word level) ambiguity: e.g. the word “bank”
–syntactical ambiguity: different ways to parse the sentence
–Interpreting partial information -pronouns
–Contextual information
•Many different inputs can mean the same thing
•Noisy input (e.g. speech)
⇒ Task Dependency: Understanding is by definition target-dependent
6/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
Example from Jason Eisner, see http://www.cs.jhu.edu/~jason/465/
7/51
What’s hard about this story?
John stopped at the *donut* store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
To get a donut (spare tire) for his car?
8/51
What’s hard about this story?
John stopped at the donut *store* on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
store where donuts shop? or is run by donuts?
or looks like a big donut? or made of donut?
or has an emptiness at its core?
9/51
What’s hard about this story?
I stopped smoking freshman year, but
John *stopped* at the donut store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
10/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
Describes where the store is? Or when he
stopped?
11/51
What’s hard about this story?
John stopped at the donut store on his way
home *from* work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
Well, actually, he stopped there from hunger
and exhaustion, not just from work.
12/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He *thought* a coffee was
good every few hours. But it turned out to
be too expensive there.
At that moment, or habitually?
(Similarly: Mozart composed music.)
13/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
That’s how often he thought it?
14/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
But actually, a coffee only stays good for
about 10 minutes before it gets cold.
15/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too expensive there.
Similarly: In America a woman has a baby
every 15 minutes. Our job is to find that
woman and stop her.
16/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He thought a coffee was
good every few hours. But *it* turned out to
be too expensive there.
the particular coffee that was good every few
hours? the donut store? the situation?
17/51
What’s hard about this story?
John stopped at the donut store on his way
home from work. He thought a coffee was
good every few hours. But it turned out to
be too *expensive* there.
too expensive for what? what are we
supposed to conclude about what John did?
how do we connect “it” to “expensive”?
18/51
Natural Language Understanding
Why is NL Understanding so hard?
•Complexity/Abstractness of the target representation:
–In a database retrieval system -keyword for the search
–In a translation system -a symbolic representation of the
meaning of the sentence
•Ambiguity: natural language is extremely rich in form and
structure, and very ambiguous: one input can mean many things:
–lexical (word level) ambiguity: e.g. the word “bank”
–syntactical ambiguity: different ways to parse the sentence
–Interpreting partial information -pronouns
–Contextual information
•Many different inputs can mean the same thing
•Noisy input (e.g. speech)
⇒ Task Dependency: Understanding is by definition target-dependent
19/51
Levels of Language
•Phonetics/phonology/morphology:what
words (or subwords) are we dealing with?
•Syntax:What phrases are we dealing with?
Which words modify one another?
•Semantics:What’s the literal meaning?
•Pragmatics:What should you conclude
from the fact that I said something? How
should you react?
20/51
Levels in Language Analysis
•Morphological Analysis -analysis of words into their linguistic
components
•Lexical Analysis -Determine the meaning of individual words, and
identifying non-word tokens (e.g. punctuation marks)
•Syntactic Analysis -Parsing: transform linear sequences of words
(sentences) into structures that show how they relate to each other
•Semantic Analysis -assign meanings to the structures created by
the syntactic analysis. Map words and structures to particular do-
main objects in a way consistent with our knowledge of the world
•Discourse Integration -capture the contextual effects that indivi-
dual sentences have on each other in determining their joint
meaning
•Pragmatic Analysis -Using more general knowledge about the
world/domain to modify the interpretation into its true meaning
21/51
Integration of Knowledge Sources
Loose vs tight coupling of SR and NL components
•Serial paradigm: SR → N-Best → NL
–independent modules, easy to modify, fast
–SR may prune away true best
–removal of prosody
•Integrated architecture:
–more complete knowledge brought to bear from the start
–computationally expensive
–word-incremental parsing
•Blackboard architecture: multiple independent processes (e.g.
acoustic, lexical, syntactical) that post and retrieve
intermediate results from a common workspace
–benefit of modularity without ordering constraints
–complexity of control module
22/51
Syntactic Processing
•Parsing -converting a flat input sentence into a hierarchical
structure that corresponds to the units of meaning in the sentence
•Large variety of parsing formalisms and algorithms
•Most formalisms have two main components:
–A grammar - a declarative representation describing the
syntactic structure of sentences in the language in a succinct way
–A parser -an algorithm that analyzes the input and outputs a
structural representation of it (= a parse) which is consistent
with the grammar specs
•Context-free grammars (CFGs) serve as the nucleus of many of the
parsing mechanisms
•In most systems these CFGs are complemented by some additional
features that make the formalism more suitable to handle natural
language
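As a toy illustration of the grammar/parser split described above, the sketch below hand-codes a tiny CFG (rules invented to cover the deck's "Auntie is coming to town" example) and a naive top-down parser:

```python
# Toy CFG: nonterminals map to lists of alternative rule bodies;
# anything not in the table is treated as a terminal word.
GRAMMAR = {
    "sentence":    [["noun-phrase", "verb-phrase"]],
    "noun-phrase": [["noun"]],
    "verb-phrase": [["aux", "verb", "prep-phrase"]],
    "prep-phrase": [["prep", "noun"]],
    "noun": [["auntie"], ["town"]],
    "aux":  [["is"]],
    "verb": [["coming"]],
    "prep": [["to"]],
}

def parse(symbol, words, pos):
    """Return (tree, next_pos) if `symbol` derives words[pos:...], else None."""
    if symbol not in GRAMMAR:                 # terminal: match literally
        if pos < len(words) and words[pos] == symbol:
            return symbol, pos + 1
        return None
    for body in GRAMMAR[symbol]:              # try each rule body in turn
        children, p = [], pos
        for elt in body:
            result = parse(elt, words, p)
            if result is None:
                break
            child, p = result
            children.append(child)
        else:                                 # whole body matched
            return (symbol, children), p
    return None

tree, end = parse("sentence", "auntie is coming to town".split(), 0)
```

Real parsers use chart algorithms instead of this naive recursion, both to avoid repeated work and to represent all analyses of an ambiguous sentence at once.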
23/51
Syntactic Grammars
•Nonterminal symbols are grammatical categories
•Independent of the domain
•Semantics obtained through non-trivial transformation of parse-tree
•e.g. Auntie is coming to town
[sentence]
├── [noun-phrase]
│   └── [noun] auntie
└── [verb-phrase]
    ├── [aux] is
    ├── [verb] coming
    └── [prep-phrase]
        ├── [prep] to
        └── [noun] town
24/51
Semantic Analysis and Grammars
•Assigning meanings to the structures created by syntactic analysis
•Standard Methodology -Symbolic Representation
•Mapping words and structures to particular domain objects in a way
consistent with our knowledge of the world
•Semantic interpretation plays an important role in selecting among
competing syntactic analyses and weeding out “illogical” analyses
•Semantic Parsing -Parse the input directly into a semantic
representation using semantic grammars
•Semantic Grammars -describe the structure of input sentences
directly in terms of semantic concepts (Example: SOUP-parser)
–The set of concepts is highly task and domain dependent
–Relatively easy to develop for specific (narrow) domains
–Difficult to expand to large or multiple domains
–Particularly effective for handling spoken language input
25/51
Semantic Grammars (1)
•Nonterminal symbols are concepts
•Dependent on the domain
•Semantics are simply read off the parse-tree
•e.g. Auntie is coming to town
[my_unavailability]
└── auntie is coming to town
26/51
Semantic Grammars (2)
•Terminals (words) and nonterminals (concepts) are freely mixed in
rules, e.g.
[starting-point] → from [temporal] on
[temporal] → *on [d-o-w]
•e.g. from Tuesday on
[starting-point]
├── from
├── [temporal]
│   └── [d-o-w] tuesday
└── on
27/51
Janus Grammars
•Semantic Grammar
•Grammar represented as recursive transition networks (RTNs)
•SOUP stochastic parser (probabilistic RTNs)
•Parse = path through RTNs that consumes input words
•Language covered:
Analysis:   English, German, Spanish, Japanese, Egyptian, Chinese
Generation: English, German, Spanish, Japanese, Italian, Korean
28/51
Grammar Formalism
•Context-free grammar rules of the form:
head → body
where
•head ::= nonterminal
•body ::= body-elt body | body-elt
•body-elt ::= token | *token | +token | *+token
•token ::= nonterminal | word
•nonterminal ::= concept | aux-nonterminal
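The operator semantics above (token = exactly once, *token = optional, +token = one or more, *+token = zero or more) can be sketched as a small generator-based matcher. This is a toy, not SOUP's actual implementation; the grammar fragment reuses the [starting-point] example:

```python
# Names in [brackets] are nonterminals; everything else is a terminal word.
RULES = {
    "[starting-point]": [["from", "[temporal]", "on"]],
    "[temporal]":       [["*on", "[d-o-w]"]],
    "[d-o-w]":          [["monday"], ["tuesday"]],
}

def match_once(sym, words, pos):
    """Yield end positions after one occurrence of `sym` starting at `pos`."""
    if sym in RULES:
        for body in RULES[sym]:
            yield from match_body(body, words, pos)
    elif pos < len(words) and words[pos] == sym:
        yield pos + 1

def match_elt(elt, words, pos):
    """Yield end positions for one body element, honoring its operators."""
    star = elt.startswith("*")                 # * allows zero occurrences
    plus = elt.lstrip("*").startswith("+")     # + allows repetition
    sym = elt.lstrip("*+")
    if star:
        yield pos                              # zero occurrences
    frontier = {pos}
    while frontier:
        nxt = set()
        for p in frontier:
            for q in match_once(sym, words, p):
                yield q
                if plus and q > p:
                    nxt.add(q)                 # + and *+ may keep repeating
        frontier = nxt

def match_body(body, words, pos):
    """Yield end positions after matching the whole body element sequence."""
    if not body:
        yield pos
        return
    for p in match_elt(body[0], words, pos):
        yield from match_body(body[1:], words, p)

def parses(top, words):
    return any(end == len(words) for end in match_once(top, words, 0))
```

With the *on operator, both "from tuesday on" and "from on monday on" parse as [starting-point], while "from tuesday" does not.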
29/51
Grammar Fragment
Grammars for {E,S,G,J}SST: about 150 concepts, 600 auxiliary
nonterminals, 800-1600 terminals (status 1998).
Example for concept [reject] in ESST
[reject]
  (*[neg_babble] +NO *I_NEG_AUX)
  ([temporal] *just BE *just NO)
  ([temporal] *just NEG_BE looking good)
  (not [temporal])
  (*I_BE *very sorry)
  (*THERE_BE no way)
  (MAYBE not)
  (*THAT_BE out of the question)

NO                MAYBE
  (nope)            (maybe)
  (no *good)        (perhaps)
  (not good)        (...)
30/51
Example
Input: On Monday I’m busy
Parse:
[give_information]
└── [my_unavailability]
    ├── [temporal]
    │   └── [point]
    │       ├── on
    │       └── [d-o-w] monday
    ├── I
    ├── am
    └── busy
Generation:
English  => I'm busy on Monday
German   => Leider kann ich Montag nicht
Spanish  => Yo estoy ocupado el lunes
Italian  => Sono occupato il lunedì
Japanese => Getsuyoobi wa tsugoo ga tsukanai N desu ga
31/51
Some Results on SST (status 1998)
Language Pair        Translation accuracy   Translation accuracy
(From → To)          on Transcripts         on Speech (WA ~ 70%)
English → German          88.3%                  60.5%
Spanish → English         81.4%                  73.3%
German → English          75.5%                  66.4%
Korean → Korean           80.6%                  50.0%
32/51
Problems in Spoken Language Analysis
Spoken language is very different from written text:
–Disfluencies
–Effects of interaction: back-channeling, cut-off sentences, cross-talks
–Differences in notions of grammaticality
–Slurred/unintelligible speech
–Lack of punctuation or clearly marked sentence boundaries
–Speech recognition errors
Disfluencies in spontaneous speech:
•False starts, filled pauses:
–um okay then yeah Monday at two is fine
•Repetition, filled pauses:
–uh I I need to meet with you next week
•Incomplete sentences:
–well I’ll be in New York next week and I was hoping <pause>
33/51
NLP and Statistical Classification
•Classify recognized text directly
•AT&T “How May I Help You?” system:
–operator scenario
–“I want to ask about about my bill?”
→BILLING
–“What’s the number of Tampa Florida”
→INFORMATION
–“Can I charge this to my home phone”
→CREDIT CARD CHARGE
–“I want to call my mother”
→COLLECT CALL or INFORMATION
34/51
Interpretation as Classification
•For each example sentence associate class
•Use machine learning technique:
–bag of words approach
–“Salience” feature
–LSA (Latent Semantic Analysis)
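A minimal bag-of-words call-routing classifier in the spirit of the above can be sketched as naive Bayes with add-one smoothing. The training sentences and class labels below are toy data invented for illustration; the real HMIHY system learned salience statistics from large corpora:

```python
from collections import Counter, defaultdict
import math

TRAIN = [
    ("i want to ask about my bill", "BILLING"),
    ("there is a charge on my bill i do not understand", "BILLING"),
    ("what is the number of tampa florida", "INFORMATION"),
    ("can i get a phone number please", "INFORMATION"),
    ("can i charge this to my home phone", "CARD_CHARGE"),
    ("bill this call to my calling card", "CARD_CHARGE"),
]

counts = defaultdict(Counter)          # class -> bag-of-words counts
priors = Counter()                     # class -> number of examples
for text, label in TRAIN:
    priors[label] += 1
    counts[label].update(text.split())

def classify(text):
    """Return the class maximizing log P(class) + sum log P(word|class)."""
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for label in counts:
        total = sum(counts[label].values())
        score = math.log(priors[label] / sum(priors.values()))
        for w in text.split():
            # add-one smoothing so unseen words do not zero the score
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best
```

For instance, classify("what is the number of tampa") routes to INFORMATION because its words are far more frequent in that class's training bag.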
35/51
Practical Issues
•They have *many* examples
•May identify two possible classes:
–specifically disambiguate
–“Do you want third party billing or make a collect call?”
•Words change over time:
–Some new billing plan might be introduced
–Need to update model to deal with such questions
•Some people don’t know what they want
•Can be used for command and control
–Bellegarda, ICSLP 2000
–Speech Commands for Apple MacOS
36/51
Robust Parsing
Despite these difficulties, the goals are still:
•Accept input from casual user
•Interpret the underlying meaning of the input utterance, i.e. understand the user
•Ignore disfluencies, portions of input which are not important
•Graceful degradation in performance
•Real-time performance
Why is that hard?
•Conventional parsers are fragile
•Correct interpretation of ill-formed input requires lots of knowledge
•If only limited knowledge is available: very large search space
Main approaches to robust parsing:
•Grammar-specific Repair Heuristics
•Full Minimal Distance Parsers
•Skipping and Maximal Coverage Parsers
•Connectionist Parsers
•Automatic and Interactive Repair Approaches
37/51
Grammar-specific Repair Heuristics
[Jensen, early ’90s], [Hobbs, 1991], [McDonald, 1993]
•First Pass -attempt to parse in full with standard grammar
•If fails -“repair” the parse using grammar-specific rules
•Depends on a bottom-up chart parser that leaves the partial
results at the end of the first pass
•Repair rules specify how to combine partial parses, or what
grammatical constraints to relax in second pass
•Repair heuristics are grammar-specific, not a general solution
•Not suited for speech -degradation usually not graceful
38/51
Full Minimal Distance Parsers
[Lehman, 1989], [Smith and Hipp, 1992], [Ramshaw, 1994]
•Find a grammatical sentence of minimum distance to the input and
analyze it instead
•Operations allowed usually include: insertions, deletions,
substitutions, and transpositions
•Various heuristics can be used for calculating the “distance”
–Penalty schemes for various operations
–Semantic importance of words can be taken into account
–Statistical models
•Meaning can be interactively confirmed with user
•Adaptive learning -grammar can be modified/augmented
incrementally
•Main problem: Full MDP infeasible with large practical NL
grammars
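The penalty-scheme idea can be made concrete with a weighted edit distance between the input and a candidate grammatical sentence: deleting a filler word is cheap, deleting a content word or substituting is expensive. The filler list and penalty values below are invented for illustration:

```python
def delete_cost(w):
    """Cheap deletions for (hypothetical) filler words, expensive otherwise."""
    return 0.2 if w in {"uh", "um", "well", "i", "mean"} else 1.0

def min_distance(inp, target):
    """Classic DP over insertions, deletions, and substitutions of words."""
    n, m = len(inp), len(target)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                  # delete all of inp
        d[i][0] = d[i - 1][0] + delete_cost(inp[i - 1])
    for j in range(1, m + 1):                  # insert all of target
        d[0][j] = d[0][j - 1] + 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if inp[i - 1] == target[j - 1] else 1.5
            d[i][j] = min(d[i - 1][j] + delete_cost(inp[i - 1]),  # deletion
                          d[i][j - 1] + 1.0,                      # insertion
                          d[i - 1][j - 1] + sub)                  # (mis)match
    return d[n][m]
```

A full MDP searches over all grammatical sentences for the one minimizing this distance, which is exactly what becomes infeasible with large grammars.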
39/51
Skipping and Maximal Coverage Parsers
Phoenix [Ward, 1989], GLR* [Lavie, 1995], SOUP [Gavalda, 1998]
•A limited version of MDP, allowing mainly deletions (possibly also
substitutions)
•Search for a parse that obtains maximal coverage of input
•Particularly well suited to parsing speech -search for coherent
concepts
•Where are deletions allowed?
–anywhere, or only between main concepts
•Additional heuristics to deal with high levels of ambiguity and
selection of the most appropriate analysis
•A more general solution to the problem, but crucial heuristics are
domain / grammar dependent
•Drawback: Are not as flexible as full MDP
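The maximal-coverage search can be sketched as a dynamic program: given a set of parsable concept phrases (here just literal word sequences, a stand-in for real grammar-driven segment parses; the phrases and labels are invented), pick non-overlapping matches that cover as many input words as possible and skip the rest:

```python
# Hypothetical concept phrases -> concept labels.
CONCEPTS = {
    ("next", "week"): "[when]",
    ("i", "need", "to", "meet", "with", "you"): "[request_meeting]",
}

def max_coverage(words):
    """best[i] = (covered word count, concept labels) for words[i:]."""
    n = len(words)
    best = [(0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = best[i + 1]                    # option 1: skip words[i]
        for phrase, label in CONCEPTS.items():
            j = i + len(phrase)
            if tuple(words[i:j]) == phrase:      # option 2: match a concept
                cov, labs = best[j]
                cand = (cov + len(phrase), [label] + labs)
                if cand[0] > best[i][0]:
                    best[i] = cand
    return best[0]

cov, labels = max_coverage("uh i i need to meet with you next week".split())
```

On the disfluent input above, the filler "uh" and the repeated "i" are skipped, and 8 of the 10 words are covered by two concepts.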
40/51
Connectionist Parsers
[Jain, 1991], [Gorin, 1993], [Buo, 1996]
•Idea: Use Neural Networks to learn mappings between strings and
output structures
•Often combined with some symbolic computation to produce the
partial parses or the training corpus
•No need for explicit grammar development -grammar is learned
from examples
•Very robust and graceful degradation
•Works well for rather shallow representations
•Difficult to develop for deeper symbolic representations
41/51
Automatic and Interactive Repair Approaches
[Rose, 1997]
•Main idea: a two stage process:
–First do partial parsing (non robust or limited)
–Then do a repair stage using domain (semantic) knowledge
–Possibly confirm repair steps or select output via user interaction
•Better division of labor between stages
•Rose uses a Genetic Programming approach to learn how to
“repair”: construct a set of good output representation hypotheses
•Interaction with the user can narrow down the correct/best
hypotheses
•As powerful as MDP
•More efficient than MDP
•Can work for large practical grammars and applications
42/51
NLP and Statistical Classification
•Classify recognized text directly
•AT&T “How May I Help You?” system:
–operator scenario:
–“I want to ask about about my bill?”
→ BILLING
–“What is the number of Tampa Florida?”
→ INFORMATION
–“Can I charge this to my home phone?”
→ CREDIT CARD CHARGE
–“I want to call my mother.”
→ COLLECT CALL or INFORMATION
43/51
Interpretation as Classification
•For each example sentence associate class
•Use machine learning technique:
–bag of words approach
–“Salience” feature
–LSA (Latent Semantic Analysis)
44/51
Practical Issues
•They have *many* examples
•May identify two possible classes:
–specifically disambiguate
–“Do you want third party billing or make a collect call?”
•Words change over time:
–Some new billing plan may be introduced
–Need to update model to deal with such questions
•Some people don’t know what they want
•Can be used for command and control:
–Bellegarda, ICSLP2000
–Speech commands for Apple MacOS
45/51
The SOUP Parser -Philosophy
[Gavalda, 1998]
•Designed for analysis of spoken language
–i.e. robust to multi-sentence utterances
–robust to ungrammaticalities, disfluencies, misrecognitions
•with very large, multi-domain, context-free semantic grammars
–support of real-world grammar development
•team effort
•dynamic domain model
•grammar modularization
•grammar sharing
•in real-time.
•Inspired by Ward’s PHOENIX parser
46/51
The SOUP Parser -Parsing
Assignment of structure to a sentence of words according to a
grammar.
E.g. parse for Are you free on Thursday morning?
[request_information]
└── [your_availability]
    ├── are you free
    └── [time]
        ├── on
        └── [point]
            ├── [day_of_week] thursday
            └── [time_of_day] morning
47/51
The SOUP Parser -Formalism
•Left-hand side (LHS, nonterminals, rule heads)
–top-level vs non-top level
–principal vs auxiliary
–look-up vs non-look-up
–character-level vs word-level
•Right-hand side (rule bodies)
–terminals and nonterminals freely mixed
–operators +, *, and *+
–wildcard _$any$_
•Parser output
–given a grammar and an input utterance (sequence of words) the parser
outputs a ranked list of interpretations
–interpretation: sequence of non-overlapping parse-trees
–parse-tree: path through the PRTNs, starting at a top-level nonterminal and
covering a contiguous segment of the input utterance
48/51
The SOUP Parser -Characteristics
•Stochastic
–Probabilistic context-free grammar (PCFG) encoded as a collection of
recursive transition networks (RTNs) with arc probabilities
–E.g. rule [farewell] → (*good +bye) represented as a network over the
words “good” and “bye” with arc probabilities 0.6, 0.4, 1.0, 0.2, 0.8
[figure: PRTN diagram for [farewell]]
–Advantage: search heuristics, generation of data for language modeling
•Chart-based
–Dynamic 3-dimensional matrix of parse DAGs indexed by sub-DAG root
nonterminal ID, sub-DAG start position, and sub-DAG end position
•Top-down
–From the starting symbol of the grammar down to matching of terminals
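A probabilistic RTN like the [farewell] network can be represented as states with outgoing arcs labeled by a word (or None for an epsilon transition) plus a probability. The arc probabilities below are the ones from the slide's figure, attached to a plausible topology for (*good +bye), which is an assumption; enumerating accepting paths also shows how such a network can generate data for language modeling:

```python
# State -> list of (word or None, next state, probability); state 3 is final.
NET = {
    0: [(None, 1, 0.6), ("good", 1, 0.4)],   # optionally say "good"
    1: [("bye", 2, 1.0)],                    # at least one "bye"
    2: [("bye", 2, 0.2), (None, 3, 0.8)],    # maybe repeat "bye", then exit
}

def enumerate_paths(state=0, words=(), p=1.0, limit=4):
    """Yield (word sequence, probability) for every accepting path."""
    if state == 3:
        yield words, p
        return
    if len(words) > limit:                   # cap the "bye" loop
        return
    for word, nxt, prob in NET[state]:
        nw = words if word is None else words + (word,)
        yield from enumerate_paths(nxt, nw, p * prob, limit)

strings = dict(enumerate_paths())
```

Under this topology, "bye" has probability 0.6 * 1.0 * 0.8 = 0.48 and "good bye" has 0.4 * 1.0 * 0.8 = 0.32, with longer repetitions taking the remaining mass.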
49/51
The SOUP Parser -Performance
Parsing of a test set of 609 utterances (average of 9.18 words/utt) with
English Scheduling (SST) and Scheduling+Travel
(SST+GTR+HLT+TPT+EVT) grammars, on a 333-MHz Pentium
II running Linux:

                     SST                     SST++
nonterminals         598 (21 top, 464 aux)   5471 (318 top, 843 aux)
terminals            829                     8242
rules                2872                    19060
nodes                6349                    39662
arcs                 10437                   72378
memory               5 MB                    60 MB
average parse time   6.58 ms/utt             53.8 ms/utt
max parse time       56 ms/utt               492 ms/utt
50/51
The SOUP Parser -Heuristics
•Maximize coverage
•Minimize number of parse trees
•Minimize number of parse tree nodes
•Minimize the number of wild-card matches
•Maximize probability of parse trees as paths along grammar arcs
•Maximize probability of sequence of top-level nonterminals (S)
given sequence of words (W):
–P(S|W) = [P(W|S) * P(S)] / P(W)
–find S as argmax(P(W|S) * P(S)) = argmax(P(W|T) * P(T) * P(S|T))
–P(W|T): distribution of word sequences given domain sequence
–P(T): distribution of domain sequences
–P(S|T): distribution of top-level NT sequences given domain sequences
•SOUP homepage: http://www.is.cs.cmu.edu/ISL.parsing.soup.html
51/51
Summary

Introduction to Natural Language Understanding
•What is it, What’s the purpose, Why is it hard?
•Integration of Knowledge
•Levels of Language Analysis
•Syntactic Processing, Semantic Processing
•Grammars
•Approaches to Robust Parsing
•Grammar-specific Repair Heuristics
•Full Minimal Distance Parsers
•Skipping and Maximal Coverage Parsers
•Connectionist Parsers
•Automatic and Interactive Repair Approaches
•The SOUP Parser
•Philosophy, Parsing, Formalism, Characteristics,
Performance, Heuristics