COSC343: Artificial Intelligence
Lecture 25: Statistical Natural Language Processing
Alistair Knott
Dept. of Computer Science, University of Otago
In today’s lecture
Ambiguity in natural language
Simple probability models for natural language
Parsing and ambiguity
Wide-coverage grammars and the problem of proliferating ambiguity
Probabilistic grammars, and how to learn them
Ambiguity in natural language
Squad Helps Dog Bite Victim
British Left Waffles On Falklands
A list of teachers broken down by age and sex is posted in the lobby
Prostitutes Appeal to Pope
New Vaccine May Contain Rabies
Prosecutor Releases Probe into Undersheriff
Caribbean Islands Drift to Left
A man is run over in New York every hour
Techniques for resolving ambiguity
1. Use semantics.
Work out the meaning of each interpretation.
Which interpretation makes most sense?
2. Use statistics. Which interpretation is most likely?
Each interpretation involves various constructions.
Gather a corpus of actual uses of constructions in language.
Use this to estimate the probability of each construction.
Using this model, which interpretation of the sentence has the highest probability?
Probabilistic models of natural language
Probabilistic models can be built for many domains. One domain with useful applications is natural language processing.
Probabilities are estimated by counting occurrences of events in a training corpus of online text.
Probabilities are used to estimate the likelihood of events in new (unseen) texts.
One example is a spam filter. (This uses Bayesian probabilities.)
In this lecture, we’ll look at some other applications of probability.
Probabilistic language models
A probabilistic language model assigns probabilities to word sequences.
Lubica already showed you a simple model which assumes that the probability of each word in a sequence depends only on the n previous words.
(Also called a Markovian or n-gram model.)
To build an n-gram model, we count frequencies of word sequences in a training corpus.
If n = 0, we count the frequency of each individual word, and estimate the prior probability distribution for words.
If n = 1, we count the frequency of each possible sequence of two words. Then if we’re given a word, we can estimate the conditional probability distribution for the next word.
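Here is a minimal sketch of the n = 1 case (not from the lecture; the toy corpus and the helper name p_next are invented for illustration):

```python
# A minimal sketch of bigram estimation (n = 1 in the lecture's convention).
from collections import Counter

corpus = "the man saw the girl and the girl saw the telescope".split()

# Count contexts (words that have a successor) and adjacent word pairs.
contexts = Counter(corpus[:-1])
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(word, prev):
    """Relative-frequency estimate of P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

print(p_next("girl", "the"))  # count('the girl') / count('the') = 2/4
```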
Some n-gram models
Here’s a sequence generated by a model trained on unigrams in AIMA:
logical are as are confusion a may right tries agent goal the was diesel more object then information-gathering search is
Here’s one generated by a model trained on bigrams:
planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic-tac-toe a predicate
Here’s one generated by a model trained on trigrams:
planning and scheduling are integrated the success of naive bayes model is just a possible prior source by that time
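Sequences like these come from sampling the conditional distribution word by word. A sketch, reusing the bigrams counter from the snippet above:

```python
# Sketch: generating a word sequence from the bigram counts above.
import random

def generate(start, length=8):
    out = [start]
    for _ in range(length - 1):
        # All continuations of the last word, weighted by their counts.
        options = [(w, c) for (p, w), c in bigrams.items() if p == out[-1]]
        if not options:
            break
        words, counts = zip(*options)
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

print(generate("the"))  # e.g. 'the girl saw the telescope ...'
```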
Problems building a linear language model
Clearly, a higher n gives a better model.
But there’s a sparse data problem: it’s hard to get good estimates for sequences of length n when n gets large.
The problem is particularly severe because words are distributed according to Zipf’s law: roughly, the frequency of the r-th most frequent word is proportional to 1/r.
Basically, there are lots of very rare words.
So to get good estimates of all word sequences, you need a huge corpus.
Uses of a linear language model
We can use a linear language model in speech interpretation:
Say the speaker’s audio signal is consistent with several alternative word sequences (e.g. recognise speech / wreck a nice beach).
The probability model can be used to choose between these alternatives. (Which sequence is most probable?)
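A sketch of that selection step, assuming the p_next estimator from the earlier snippet (the 1e-12 floor is an arbitrary stand-in for proper smoothing of unseen pairs):

```python
# Sketch: rank competing transcriptions by their bigram-model probability.
import math

def log_p_sequence(words):
    # Log probabilities avoid underflow on long sequences.
    return sum(math.log(p_next(w, prev) or 1e-12)
               for prev, w in zip(words, words[1:]))

hypotheses = ["recognise speech".split(), "wreck a nice beach".split()]
best = max(hypotheses, key=log_p_sequence)  # the most probable transcription
```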
Uses of a linear language model
We can also use a linear model for word sense disambiguation.
Many words are semantically ambiguous.
E.g. bank can mean ‘river edge’ or ‘financial institution’.
To disambiguate:
Identify the sense of each ambiguous word by hand in the training corpus.
Then build an n-gram probability model.
Now the context of an ambiguous word carries information about which sense was intended.
P([swam, to, the, RIVERBANK]) = 0.000001
P([swam, to, the, FINANCIALBANK]) = 0.000000000001
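The same counting machinery applies once the training tokens carry sense tags; a sketch with an invented toy corpus:

```python
# Sketch: bigram counts over a sense-tagged toy corpus.
from collections import Counter

tagged = ("he swam to the RIVERBANK . she walked to the FINANCIALBANK . "
          "they swam near the RIVERBANK .").split()
contexts = Counter(tagged[:-1])
bigrams = Counter(zip(tagged, tagged[1:]))

# P(RIVERBANK | the) = 2/3 vs P(FINANCIALBANK | the) = 1/3 here, so the
# model prefers the RIVERBANK reading for an occurrence of 'the bank'.
for sense in ("RIVERBANK", "FINANCIALBANK"):
    print(sense, bigrams[("the", sense)] / contexts["the"])
```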
The limits of a linear language model
We’ve seen that in order to describe the range of sentences in a human language, we need to build a grammar, which assigns each sentence a hierarchical structure.
On this model, sentences are not simple sequences: they’re trees.
Our model of semantic interpretation uses these trees to build sentence meanings.
What we need to do is to build a probability model which works with a grammar.
One useful approach is to build a model that lets us disambiguate if there are multiple possible parse trees.
Parsing and ambiguity
Consider the following context-free grammar:
S → NP, VP
NP → Det, N
NP → PN
NP → NP, PP
VP → VT, NP
VP → VP, PP
PP → P, NP
PN → John
N → man, girl, telescope
VT → saw
Det → the
P → with
Q: How will this grammar parse the following sentence?
The man saw the girl with the telescope.
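One way to see the answer concretely is to run the grammar through a chart parser. A sketch using NLTK (my choice of toolkit, not the lecture’s):

```python
# Sketch: the grammar above in NLTK; the parser finds every analysis.
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | PN | NP PP
    VP -> VT NP | VP PP
    PP -> P NP
    PN -> 'John'
    N -> 'man' | 'girl' | 'telescope'
    VT -> 'saw'
    Det -> 'the'
    P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the man saw the girl with the telescope".split()):
    tree.pretty_print()  # two trees: PP attached to the VP vs. to the NP
```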
Parsing and ambiguity
Note: a parser will find all possible syntactic analyses of a sentence.
Wide-coverage grammars
Until about 15 years ago, computational linguists focussed on building small grammars, with small lexicons.
But if we want to build useful natural language processing systems, we need grammars which deal with a huge range of sentence structures and words.
Thursday: started reading for essay. Had a wee nap to enrich concentration.
Fr. Clovis highlighted what he sees as a crisis arising from ‘a new paganism’ of excess and decadence comparable to the vomitoria of ancient Rome.
A nightclub for young people, featuring ‘mocktails’ and live music, is being considered by the Dunedin Youth Forum.
Problems in building ‘real’ grammars
We certainly need to include rules to deal with the following constructions:
Here’s the equation for a Laplace transform.
Mustang Sally walked into the bar.
Who volunteers? Me.
An example of overgeneration
How will the resulting grammar deal with the sentence John saw Mary?
Point: a wide-coverage grammar gives us spurious ambiguities.
Towards a solution: probabilistic grammars
How can we disambiguate in these cases?
A logic-based approach: find the meaning of each interpretation, and choose the one which fits best with world knowledge.
But this sounds like an ‘AI-complete’ task.
A statistical approach: look at how frequently particular rules are actually used.
Laplace transform and Mustang Sally are relatively rare constructions.
The correct syntactic structure of John saw Bill occurs very frequently.
Assembling a training corpus
The first thing to do is to gather a (largeish) set of sentences which are representative of the kind of sentences you want to parse.
The method is then as follows:
Parse each sentence by hand, to say what the correct analysis is for each.
Count the number of occurrences of each rule application.
Use these counts to build a probabilistic grammar, where every rule is associated with a probability.
When you parse a sentence with a probabilistic grammar, disambiguation is easy: you just choose the most likely parse.
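A sketch of those steps with NLTK’s treebank utilities; the two hand-parsed trees are invented stand-ins for a real corpus:

```python
# Sketch: inducing rule probabilities from hand-parsed trees with NLTK.
import nltk

treebank = [  # invented stand-ins for a real hand-parsed corpus
    nltk.Tree.fromstring("(S (NP (PN John)) (VP (VT saw) (NP (PN Mary))))"),
    nltk.Tree.fromstring("(S (NP (Det the) (N girl)) (VP (VT saw) (NP (PN John))))"),
]

productions = []
for tree in treebank:
    productions += tree.productions()  # one entry per rule application

# Relative-frequency estimation: each rule gets count(rule) / count(parent).
pcfg = nltk.induce_pcfg(nltk.Nonterminal("S"), productions)
print(pcfg)
```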
An example training corpus
[The slide shows a small hand-parsed corpus of trees; the rule counts on the next slide are taken from it.]
Counting rules in the corpus

Rule            Frequency in corpus
S → NP, VP      3
S → NP          1
NP → PN         6
NP → Det, N     1
VP → VT, NP     3
N → PN, N       1
N → mustang     1
N → saw         1
VT → saw        1
Det → a         1
PN → Mary       2
PN → John       1
PN → N, PN      1

Total: 23 rule applications
Computing the probability of a parse tree
Let’s think of a tree as a set of rule applications R_1 … R_n.
Each rule application R_i has a parent node and one or more child nodes.
First idea: ‘since we’re using a context-free grammar, the probabilities of each rule application are independent. So we can multiply together p(R_i) for each rule application in the tree to get the probability of the whole tree.’
But this isn’t quite right: the children of the top rule application tell us what the parents of lower rule applications are.
Conditional probabilities in a parse tree
We need to work with the conditional probabilities of rule applications given knowledge of the parent node.
Because we’re working with a context-free grammar, these probabilities really are independent.
So to get the probability of the whole tree, we first compute p(R | Parent) for each rule application R, and then multiply all these probabilities.
We then multiply the result by the prior probability that the root node N_1 is a root node.
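A sketch of this computation on a hand-built tree, using the conditional probabilities estimated on the next slide (the tuple encoding of trees is an illustrative choice):

```python
# Sketch: top-down tree probability, p(root) * product of p(rule | parent).
rule_probs = {  # estimates from the table on the next slide
    ("S", ("NP", "VP")): 0.75,
    ("NP", ("PN",)): 0.86,
    ("VP", ("VT", "NP")): 1.0,
    ("VT", ("saw",)): 1.0,
    ("PN", ("John",)): 0.25,
    ("PN", ("Mary",)): 0.5,
}

def tree_prob(tree):
    """tree is (label, children); a leaf child is a plain string."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(label, rhs)]        # p(this rule | parent node)
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)       # independent given the parent
    return p

tree = ("S", [("NP", [("PN", ["John"])]),
              ("VP", [("VT", ["saw"]), ("NP", [("PN", ["Mary"])])])])
print(1.0 * tree_prob(tree))  # prior p(root is S) = 1; gives 0.0693...
```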
Estimating probabilities from the corpus

Rule            Frequency in corpus   Estimated p(Rule | Parent)
S → NP, VP      3                     3/4 = .75
S → NP          1                     1/4 = .25
NP → PN         6                     6/7 = .86
NP → Det, N     1                     1/7 = .14
VP → VT, NP     3                     1
N → PN, N       1                     1/3 = .33
N → mustang     1                     1/3 = .33
N → saw         1                     1/3 = .33
VT → saw        1                     1
Det → a         1                     1
PN → Mary       2                     2/4 = .5
PN → John       1                     1/4 = .25
PN → N, PN      1                     1/4 = .25

Estimated prior probability of a node being a root node: 1 if it’s S; 0 otherwise.
Using rule probabilities to disambiguate
[The slide shows the two parse trees which the wide-coverage grammar assigns to John saw Mary.]
Left-hand interpretation:
p(tree) = 1 × .75 × .86 × .25 × 1 × 1 × .86 × .5 = .0693
Right-hand interpretation:
p(tree) = 1 × .25 × .86 × .25 × .33 × .25 × .33 × .5 = .0007
Top-down and bottom-up probability models
When we defined the probability of a tree in terms of conditional probabilities of rule applications, we used a top-down probability model, involving:
p(RuleApplication | Parent);
the prior probability of the root node in the tree being a root node.
Alternatively, we could use a bottom-up model, involving:
p(RuleApplication | Child_1 ∧ … ∧ Child_n);
the prior probability of the words in the tree appearing in a sentence.
Summary and reading
You’ve seen:
Something about how language relates to action and goals. (Linking to the ‘agent-focused’ part of the course.)
Something about natural language syntax. (Linking to the syntax of the predicate calculus.)
Something about parsing. (Linking to the ‘search’ part of the course.)
Something about probabilistic language models. (Linking to the ‘probabilistic reasoning’ part of the course.)
Reading for this lecture: AIMA Chapter 22 Section 1, Chapter 23 Section 2.1.