LIN 6932
LIN6932 Topics in
Computational Linguistics
Lecture 7
Hana Filip
Overview
Admin Stuff: All have access to their lin6932 class account?
• Historical background
• Definition of formal grammars
• Chomsky Hierarchy
• Computational complexity of natural language
Historical Background
• In 1949, Shannon and Weaver published The Mathematical Theory of Communication, showing that statistical approximations to English based on Markov processes could be used to encode English efficiently for transmission in noise. Tasks like machine translation could be solved by treating e.g. Russian as a noisy encoding of English.
• In 1957, Chomsky published Syntactic Structures, showing that natural grammars could not be exactly captured by such methods. It seemed to follow that machine translation could not be modeled as a noisy channel (although the machines were too small to actually try any of this).
• Chomsky made a point of being open to the idea that STATISTICS could guide grammatical processing.
• Nevertheless, most work in computational linguistics switched to linguistically informed high-level SYMBOLIC representations of syntax and semantics, and small knowledge domains.
Historical Background
“Given the grammar of a language, one can study the use of the language statistically in various ways; and the development of probabilistic models for the use of language (as distinct from the syntactic structure of the language) can be quite rewarding.”
(Chomsky 1957: 17, note 4)
Historical Background
• Around 1988, the machines got big enough to try both techniques.
• Surprisingly, low-level statistical approximations such as Markov processes worked better than linguistically informed representations on almost all tasks, such as speech recognition, parsing free text, and MT itself.
• Most work in computational linguistics switched to linguistically uninformed, low-level statistical approximations and machine learning.
• Around 2000, the process of putting linguistics, statistical models, and machine learning back together again began.
Historical Background
• Chomsky’s 1957 book Syntactic Structures, together with certain more technical papers from around the same time, is one of the most important documents in linguistics and cognitive science.
• The theory in detail has been completely superseded.
• Nevertheless, surprisingly many of the formal devices that it includes recur, particularly in the most modern descendants, including: kernel sentences (aka lexicalized elementary trees or categories etc.); generalized or “double-based” transformations (aka Merge, combinatory rules, tree-adjunction, etc.); affix-hopping; the role of statistics in natural language processing.
Historical Background
• Affix Hopping as PF Merger and VP Ellipsis*
Historical Background
One crucial ingredient of the theory was a hierarchy of language types, now known as CHOMSKY’S HIERARCHY, each type characterized by a class of rules that are sufficient to specify all languages of that type, an automaton which is sufficient to recognize whether a sentence is from a given language of that type, and a class of languages including all those of classes lower in the hierarchy as a proper subset.
Question: What is the expressive/generative power of natural languages? Where are human languages located on this hierarchy?
Formal Grammar (Review)
A formal grammar G is a quadruple G = <N, T, S, R>
N: a finite set of nonterminal symbols
T: a finite set of terminal symbols
S: start symbol (S ∈ N)
R: rules (‘productions’)
– Rules take the form α → β, “α rewrites as β”, where α, β are strings of symbols from the infinite set of strings (T ∪ N)* and α must contain at least one nonterminal symbol.
Other conditions on the rules are imposed for particular classes of grammars.
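The quadruple can be written down directly as a small data structure. The following is only a sketch: the names Grammar, is_well_formed, rewrite_once and ANBN are made up for illustration, and symbols are represented as Python strings inside tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """G = <N, T, S, R>; field names are illustrative assumptions."""
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: str               # S
    rules: tuple             # R: pairs (alpha, beta), each a tuple of symbols

    def is_well_formed(self) -> bool:
        """Check S in N, and that every alpha contains a nonterminal."""
        symbols = self.nonterminals | self.terminals
        if self.start not in self.nonterminals:
            return False
        for alpha, beta in self.rules:
            if not all(sym in symbols for sym in alpha + beta):
                return False
            if not any(sym in self.nonterminals for sym in alpha):
                return False
        return True


def rewrite_once(grammar, form):
    """Return every string obtained from `form` by one rule application."""
    results = set()
    for alpha, beta in grammar.rules:
        k = len(alpha)
        for i in range(len(form) - k + 1):
            if form[i:i + k] == alpha:
                results.add(form[:i] + beta + form[i + k:])
    return results


# Example grammar: S -> a S b, S -> a b
ANBN = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    start="S",
    rules=((("S",), ("a", "S", "b")), (("S",), ("a", "b"))),
)

print(ANBN.is_well_formed())
print(rewrite_once(ANBN, ("S",)))
```

Repeating rewrite_once from the start symbol until only terminals remain yields the strings of the language.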
Chomsky Hierarchy
Other conditions are imposed for particular classes of grammars.
The Chomsky hierarchy:
Types of grammars defined in terms of additional restrictions on the form of the rules:
Type 0: No restriction. Each rule is of the form α → β, and α ≠ ε.
Type 1: Each rule is of the form αAβ → αγβ, where A ∈ N and γ ≠ ε.
Type 2: Each rule is of the form A → γ. (γ may be ε.)
Type 3: Each rule is of the form A → aB or A → a.
Common names:
Type 0: Unrestricted rewriting systems (Turing equivalent)
Type 1: Context-sensitive grammars.
Type 2: Context-free grammars.
Type 3: Right-linear, or regular, or finite state grammars (equivalent to finite state automata).
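The four rule formats can be checked mechanically. The sketch below assumes a convention that is not part of the slides: each symbol is a single character, uppercase letters are nonterminals, everything else is a terminal. The function names are hypothetical.

```python
def is_nonterminal(sym):
    # Assumed convention: uppercase letter = nonterminal.
    return sym.isupper()


def rule_type(lhs, rhs):
    """Return the most restrictive type (3, 2, 1 or 0) whose format the rule fits."""
    if not any(is_nonterminal(s) for s in lhs):
        raise ValueError("left-hand side must contain a nonterminal")
    if len(lhs) == 1:
        # Type 3: A -> a or A -> aB
        if len(rhs) == 1 and not is_nonterminal(rhs):
            return 3
        if len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1]):
            return 3
        # Type 2: A -> gamma (gamma may be empty)
        return 2
    # Type 1: alpha A beta -> alpha gamma beta, gamma non-empty
    for i, sym in enumerate(lhs):
        if not is_nonterminal(sym):
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        gamma_len = len(rhs) - len(alpha) - len(beta)
        if gamma_len >= 1 and rhs.startswith(alpha) and rhs.endswith(beta):
            return 1
    return 0


def grammar_type(rules):
    """A grammar is only as restricted as its least restricted rule."""
    return min(rule_type(lhs, rhs) for lhs, rhs in rules)


print(grammar_type([("S", "aS"), ("S", "b")]))
print(grammar_type([("S", "aSb"), ("S", "ab")]))
print(grammar_type([("S", "aSB"), ("aB", "ab")]))
print(grammar_type([("S", "AB"), ("AB", "BA")]))
```

Note that this classifies grammars by the form of their rules; a language may still belong to a lower class if some other grammar generates it.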
Chomsky Hierarchy
The Chomsky hierarchy is a hierarchy of classes (or families) of languages, originally defined by the form of the rules needed to generate the languages in those classes, but which can also be characterized at least in part by "dependencies" between elements that appear in the strings that make up the languages of those classes.
The smallest infinite class of languages in the Chomsky hierarchy is the class RL of regular languages. These are the languages that can be represented by regular expressions.
The next larger class of languages in the Chomsky hierarchy is the class CFL of context-free languages. Every regular language is also a context-free language, but the converse is not true.
Chomsky Hierarchy
TYPE 0: UNRESTRICTED REWRITING SYSTEMS
no restrictions on the rules: L(G) = {w ∈ T* | S ⇒* w}
a set of strings composed of terminal symbols derived from S
“⇒*” is the reflexive and transitive closure of “⇒”
“⇒*” can be constructed as a set of operators each of which is obtained by applying ⇒ successively 0 or more times.
• Every recursively enumerable language can be described by a rewriting system: an algorithm that "enumerates" the strings of the language, i.e., its output is simply a list of the members of L: w1, w2, w3, .... If necessary, this algorithm may run forever.
• Recursively enumerable languages are languages for which there is a procedure that confirms, for any arbitrary string in the language, that it is a well-formed string of the language, but not necessarily one that confirms, for any arbitrary string not in the language, that it is not.
• Membership in a type-0 language is, in general, undecidable.
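The enumeration idea can be sketched as a breadth-first search over derivations. This is a minimal illustration, assuming single-character symbols with uppercase letters as nonterminals; the function name and the example rules (a small non-context-free rule set for a^n b^n c^n) are made up here. The generator never has to stop, which matches "may run forever".

```python
from collections import deque
from itertools import islice


def enumerate_language(rules, start="S"):
    """Yield terminal strings derivable from `start`, breadth first."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            yield form  # only terminals left: a member of L(G)
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)


# Hypothetical example rules: S -> aBSc | abc, Ba -> aB, Bb -> bb
RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]

# Take only the first few members; the full enumeration is infinite.
print(list(islice(enumerate_language(RULES), 3)))
```

A string that is in the language eventually shows up in the list, but waiting for a string that is not in the language never produces a "no".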
Chomsky Hierarchy
CONTEXT-SENSITIVE GRAMMARS
– Subclass of type-0 grammars
– Restriction: all rules take the form αAβ → αγβ, so that length(αAβ) ≤ length(αγβ)
where A ∈ N, α, β, γ ∈ (T ∪ N)*, γ ≠ ε
Membership in a context-sensitive language (CSL) is decidable.
CSLs are languages for which there is a decision procedure for determining whether an arbitrary string does or does not belong to the language.
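Why the length condition gives decidability can be sketched as follows: if no rule ever shortens a string, any derivation of a target only passes through strings no longer than the target, and there are finitely many of those. The code below is an illustrative sketch, not an efficient algorithm; it assumes single-character symbols and checks only the length condition (not the stricter αAβ → αγβ format). The names and example rules are hypothetical.

```python
def csl_member(rules, target, start="S"):
    """Decide membership by exhaustive search over bounded-length strings."""
    if any(len(lhs) > len(rhs) for lhs, rhs in rules):
        raise ValueError("every rule must satisfy length(lhs) <= length(rhs)")
    seen = {start}
    frontier = [start]
    while frontier:
        form = frontier.pop()
        if form == target:
            return True
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # Longer strings can never shrink back to the target.
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    frontier.append(new)
                pos = form.find(lhs, pos + 1)
    return False  # search space exhausted: a definite "no"


# Hypothetical example rules for a^n b^n c^n (none of them shortens a string)
RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]

print(csl_member(RULES, "aabbcc"))
print(csl_member(RULES, "aabbc"))
```

Unlike plain enumeration, this search always terminates, so it answers both "yes" and "no".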
Chomsky Hierarchy
• Not all decidable languages are context-sensitive (but most are)
Chomsky Hierarchy
CONTEXT-FREE GRAMMARS
– Subclass of context-sensitive grammars
– Restriction: rules take the form A → γ
where A ∈ N, γ ∈ (T ∪ N)+
Membership in a context-free language (CFL) is decidable.
Chomsky Hierarchy
REGULAR GRAMMARS
– Subclass of context-free grammars
– Restriction: all rules take the form A → a or A → aB
where A, B ∈ N and a ∈ T
Membership is decidable.
RGs are expressively equivalent to finite state automata, or Markov processes.
Automata viewed as either generators or acceptors.
Grammars viewed as either generators or acceptors.
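The equivalence with finite state automata can be made concrete by reading each nonterminal as a state: a rule A → aB is a transition from A to B on a, and a rule A → a is a transition from A to an accepting state. The sketch below uses the grammar as an acceptor; the function name and the example grammar (strings over {a, b} ending in ab) are illustrative assumptions.

```python
def accepts(rules, word, start="S"):
    """Run the right-linear grammar as a nondeterministic finite automaton."""
    FINAL = None  # extra accepting state, reached via rules of the form A -> a
    states = {start}
    for ch in word:
        next_states = set()
        for lhs, rhs in rules:
            if lhs in states and rhs[0] == ch:
                next_states.add(rhs[1] if len(rhs) == 2 else FINAL)
        states = next_states
    return FINAL in states


# Hypothetical example: S -> aS | bS | aA, A -> b
ENDS_IN_AB = [("S", "aS"), ("S", "bS"), ("S", "aA"), ("A", "b")]

print(accepts(ENDS_IN_AB, "aab"))
print(accepts(ENDS_IN_AB, "aba"))
```

Only the current set of states is stored, never the input read so far, which is the sense in which such a device has finite memory.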
Chomsky Hierarchy
REGULAR GRAMMARS
A → a or A → aB
where A, B ∈ N and a ∈ T
Chomsky Hierarchy
Strong and Weak Equivalence
• Two grammars are said to be “weakly equivalent” if they generate the same language or string set: cp. categorial grammars and phrase structure grammars.
• Two grammars are “strongly equivalent” if they assign the same tree(s) to each string in the same language.
• All grammars at a given level in the hierarchy have strongly equivalent grammars at higher levels, but not vice versa.
• A grammar or class of grammars is said to be “strongly adequate” to the capture of a language or class of languages if it assigns the “right” trees to strings. The “right” tree is the one we need for semantic interpretation.
• Weakly equivalent grammars which assign the “wrong” trees are said to be only “weakly adequate.”
Chomsky Hierarchy
Categorial Grammar (combinatorially transparent categories):
John loves Mary
n <n, <n,t>> n
n <n,t>
t (semantic type of a sentence)
Phrase Structure Grammar:
S
NP VP
N V NP
John loves Mary
NL and Chomsky Hierarchy
Where are natural languages located?
Expressive power of human languages
• The first three chapters of Chomsky 1957 show that human languages fall outside the lowest level of Regular/Finite State languages, and are at least at the level of Context Free Languages.
• The proof requires a distinction between ideal linguistic capacity, now known as Competence, and the Performance mechanism that actually processes sentences.
• Competence allows sentences that are so long or convoluted that none of us will live long enough or have enough memory to process them. Performance cannot cope with them, but clearly this limitation is accidental, not a fact about English.
• Syntactic Structures goes on to suggest that the level of human grammars is still higher in the hierarchy. It raises (but does not answer) the question of which level is just high enough to contain all human languages.
NL and Chomsky Hierarchy
Where are natural languages located?
Typical argument for the complexity of NL:
– Find a recursive construction C in a natural language L (English)
– Assume that the construction type in question is theoretically unbounded: i.e., in theory, speakers could go on producing ever longer instances of the construction.
– Argue that the competence of speakers admits unlimited recursion (while the performance certainly poses an upper limit; competence vs. performance distinction!)
– Reduce C to a formal expression of known complexity in language L’ via a homomorphism (a structure-preserving mapping)
– Make a case that L must be at least as complex as L’
– Extrapolate from this one instance to all human languages: if there’s this one construction C in this one language that has this complexity, then the human language faculty must allow this in general.
NL and Chomsky Hierarchy
NL is not regular: Chomsky’s 1957 original argument
Structure of his argument:
Consider 3 hypothetical languages:
1. ab, aabb, aaabbb, … (a^n b^n)
2. aa, bb, abba, baab, aaaa, bbbb, aabbaa, abbbba, … (palindromic)
3. aa, bb, abab, baba, aaaa, bbbb, aabaab, abbabb, aababaabab, … (copy language)
• It can be shown that these are not regular languages
NL and Chomsky Hierarchy
The Pumping Lemma (a technique for proving that certain languages are not regular).
If L is an infinite finite automaton language (FAL) over alphabet A, then there are strings x, y, z ∈ A* such that y ≠ ε and x y^n z ∈ L for all n ≥ 0.
Why: machines for infinite languages must have loops. The string y in the lemma corresponds to a string accepted during a traversal of a loop.
Note that the lemma does not say ‘iff’.
NL and Chomsky Hierarchy
The Pumping Lemma (a technique for proving that certain languages are not regular).
Example: L = {a^n b^n | n ≥ 0}
If L were a FAL, then, by the Pumping Lemma, there would be x, y, z ∈ A* such that y ≠ ε and x y^n z ∈ L for all n ≥ 0.
Assume that such x, y, z exist, so the string xyz is in L. But by definition of L it should be in the form a^n b^n for some n. What’s y in this case? It can’t be empty, so it would consist of (1) some number of a’s, or (2) some number of b’s, or (3) some number of a’s followed by some number of b’s.
But it is easy to see that in any of those cases the strings xyyz, xyyyz, etc. could not belong to L: in cases (1) and (2) repeating y makes the numbers of a’s and b’s unequal, and in case (3) it puts a’s after b’s.
But the pumping theorem is not always useful for showing a language to be non-regular.
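The case analysis can be checked by brute force on one concrete string. The sketch below tries every split of a word into x, y, z with non-empty y and tests whether the pumped strings stay in the language for a few values of n. It is only a finite sanity check on a single word, not a proof, and the function names and the max_power bound are assumptions made for illustration.

```python
def in_anbn(s):
    """Membership test for L = {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def pumpable_splits(word, member, max_power=3):
    """Return splits (x, y, z), y non-empty, with x y^n z in L for n = 0..max_power."""
    good = []
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            x, y, z = word[:i], word[i:j], word[j:]
            if all(member(x + y * n + z) for n in range(max_power + 1)):
                good.append((x, y, z))
    return good


# No choice of y inside aaabbb survives pumping.
print(pumpable_splits("aaabbb", in_anbn))

# Contrast: in the regular language a*, every split of "aa" can be pumped.
print(pumpable_splits("aa", lambda s: set(s) <= {"a"}))
```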
NL and Chomsky Hierarchy
NL is not regular (at least context-free power): Chomsky’s 1957 original argument:
NL and Chomsky Hierarchy
Therefore, Chomsky claims, English cannot be regular:
“It is clear, then that in English we can find a sequence a + S1 + b, where there is a dependency between a and b, and we can select as S1 another sequence c + S2 + d, where there is a dependency between c and d … etc. A set of sentences that is constructed in this way … will have all of the mirror image properties of (2) which exclude (2) from the set of finite languages.” (Chomsky 1957, p. 21)
Note: Chomsky writes “finite languages”, but he means “regular languages”.
NL and Chomsky Hierarchy
Chomsky’s argument: because English contains these
constructions, which are not regular, English is not regular.
As stated, the argument is fallacious.
NL and Chomsky Hierarchy
How to state the observation correctly
NL and Chomsky Hierarchy
Similar point about center-embedding/nested dependencies
the cats that the dog chases miau
the dependency between the dog and chases nests within the dependency between the cats and miau
Assume the following homomorphism:
a = {the cat, the dog, the rat, …}
b = {chase, miau, bite, bark, …}
Then this is an instance of a^n b^n
Chomsky’s argument:
• Any useful syntactic analysis will relate the nouns to their corresponding verbs.
• No FSA is capable of keeping track of center embeddings of arbitrary depth (which would be required since the grammatical subset of L is infinite). No FSA can provide a useful syntactic analysis for center-embedding.
Therefore, since English has such constructions, English is a non-regular language.
NL and Chomsky Hierarchy
The language a^n b^n corresponds to the context-free grammar
S → a S b
S → a b
It gives rise to the following tree for the string aaabbb:
[S a [S a [S a b] b] b]
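How the two rules build the tree can be sketched with a small recursive parser. This is an illustration only: trees are represented as nested tuples and printed in labeled-bracket notation, and the function names parse_anbn and bracket are hypothetical.

```python
def parse_anbn(s):
    """Return the derivation tree of s under S -> a S b | a b, or None."""
    if s == "ab":
        return ("S", "a", "b")              # rule S -> a b
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        inner = parse_anbn(s[1:-1])         # rule S -> a S b
        if inner is not None:
            return ("S", "a", inner, "b")
    return None


def bracket(tree):
    """Render a tree in labeled-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(bracket(child) for child in tree) + "]"


print(bracket(parse_anbn("aaabbb")))
print(parse_anbn("aabbb"))
```

Each a is paired with the b that closes the same S, so the outermost a goes with the outermost b: the dependencies are nested.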
NL and Chomsky Hierarchy
Dissenting view 1:
• all arguments to this effect use center-embedding/nested dependencies, cp. the cats that the dog chases miau
the dependency between the dog and chases nests within the dependency between the cats and miau
• humans are extremely bad at processing center-embedding
• notion of competence that ignores this is dubious
• natural languages are regular after all
NL and Chomsky Hierarchy
Dissenting view 2:
• Any *finite* language is a regular language.
• If you don't distinguish performance and competence, then English as a language certainly couldn't contain any sentence longer than the number of words a human being could utter in a lifetime. (This assumes human lifetimes are finite, but that seems uncontroversial.)
• This may be a HUGE number, but it is definitely finite, and so without the distinction English is formally a finite language and therefore regular.
NL and Chomsky Hierarchy
Are natural languages context-free?
• history of the question: Chomsky’s 1957 conjecture that natural languages are not context-free
• In the 60’s and 70’s, many attempts to prove that NL is not context-free
• Pullum and Gazdar 1982 (Generalized Phrase Structure Grammar):
– all these attempts have failed
– for all we know, natural languages (conceived as string sets) might be context-free
• Huybregts 1984, Shieber 1985:
– proof that Swiss German is not context-free, based on cross-serial dependencies
• Culy 1985: proof that Bambara (a Northwestern Mande language spoken in Mali) is not context-free
NL and Chomsky Hierarchy
• Nested and Crossing Dependencies
Context-free languages, unlike regular languages, can have unbounded dependencies.
However, these dependencies can only be nested, not crossing.
Example:
a^n b^n has unlimited nested dependencies: context-free
The copy language has unlimited crossing dependencies: not context-free
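One way to picture the difference is by the kind of memory needed to match the two halves of a string. Nested dependencies are matched last-in, first-out, which is what a stack provides; crossing dependencies are matched first-in, first-out, which needs a queue. The sketch below is an informal illustration under the assumption that the string has already been split into its two halves; it is not a recognizer for either language class, and the function names are made up.

```python
from collections import deque


def nested_ok(first_half, second_half):
    """Mirror-image matching: the last symbol opened is the first one closed."""
    stack = list(first_half)
    for sym in second_half:
        if not stack or stack.pop() != sym:
            return False
    return not stack


def crossing_ok(first_half, second_half):
    """Copy matching: the first symbol opened is the first one closed."""
    queue = deque(first_half)
    for sym in second_half:
        if not queue or queue.popleft() != sym:
            return False
    return not queue


print(nested_ok("ab", "ba"))     # abba: nested
print(crossing_ok("ab", "ab"))   # abab: crossing
print(nested_ok("ab", "ab"))     # abab is not a mirror image
```

A stack is the memory of a pushdown automaton, the machine type associated with context-free languages, which is one informal way to see why crossing dependencies fall outside that class.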
NL and Chomsky Hierarchy
• Nested and Crossing Dependencies
Bar-Hillel and Shamir (1960):
– English contains copy-language (crossing dependencies)
– Cannot be context-free
John, Mary, David, ... are a widower, a widow, a widower, ..., respectively.
Claim: the sentence is only grammatical under the condition that if the nth name is male (female) then the nth phrase after the copula is a widower (a widow)
NL and Chomsky Hierarchy
• Nested and Crossing Dependencies
suppose the claim is true, intersect English with regular language
L1 = (Paul | Paula)+ are (a widower | a widow)+ respectively
Result: Copy language L3 = {ww | w ∈ (a | b)+}
English ∩ L1 = L2, homomorphism L2 → L3:
John, David, Paul, … → a
a widower → a
Mary, Paula, Betty, … → b
a widow → b
are, respectively → ε
NL and Chomsky Hierarchy
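• Result: Copy language L3 = {ww | w ∈ (a | b)+}
Copy language is not context-free
Hence L2 is not (context-free languages are closed under homomorphism)
Hence English is not (context-free languages are closed under intersection with regular languages)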
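The homomorphism step can be sketched as a simple token-by-token mapping. The dictionary H below follows the mapping given in the argument (names and predicates to a or b, "are" and "respectively" to the empty string); the tokenizer, the function names and the test sentences are assumptions made for illustration.

```python
import re

# The homomorphism from the argument: each token is sent to a, b, or the empty string.
H = {
    "John": "a", "David": "a", "Paul": "a", "a widower": "a",
    "Mary": "b", "Paula": "b", "Betty": "b", "a widow": "b",
    "are": "", "respectively": "",
}


def image(sentence):
    """Apply the homomorphism to a sentence of the 'respectively' fragment."""
    # "a widower" must be tried before "a widow", which is a prefix of it.
    tokens = re.findall(r"a widower|a widow|\w+", sentence)
    return "".join(H[token] for token in tokens)


def is_copy(s):
    """Membership test for the copy language {ww | w in (a|b)+}."""
    half = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s[:half] == s[half:]


good = "Paul, Paula are a widower, a widow respectively."
bad = "Paul, Paula are a widow, a widower respectively."
print(image(good), is_copy(image(good)))
print(image(bad), is_copy(image(bad)))
```

Under the grammaticality claim, exactly the sentences of the fragment whose image is of the form ww count as English, which is what makes the image of L2 the copy language.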
NL and Chomsky Hierarchy
Cross-serial dependencies in Dutch
Huybregts (1976)
– Dutch has copy-language like structures
– thus Dutch is not context-free
dat Jan Marie Pieter Arabisch laat zien schrijven
that Jan Marie Pieter Arabic let see write
‘that Jan let Marie see Pieter write Arabic’
NL and Chomsky Hierarchy
Counterargument
Crossing dependencies only concern argument linking, i.e., semantics
As far as plain strings are concerned, the relevant fragment of Dutch has the structure
NP^n V^n
which is context-free