Lec 7 (ppt) - plaza

scarfpocketΤεχνίτη Νοημοσύνη και Ρομποτική

24 Οκτ 2013 (πριν από 4 χρόνια και 15 μέρες)

56 εμφανίσεις

LIN 6932

1

LIN6932 Topics in
Computational Linguistics

Lecture 7

Hana Filip

LIN 6932

2

Overview

Admin Stuff: All have access to their lin6932
class account?



Historical background


Definition of formal grammars


Chomsky Hierarchy


Computational complexity of natural
language

LIN 6932

3

Historical Background


In

1949
,

Shannon

and

Weaver

published

The

Mathematical

Theory

of


Communication
,

showing

that

statistical

approximations

to

English

based

on


Markov

processes

could

be

used

to

encode

English

efficiently

for


transmission

in

noise
.

Tasks

like

machine

translation

could

be

solved

by


treating

e
.
g
.

Russian

as

a

noisy

encoding

of

English
.


In

1957
,

Chomsky

published

Syntactic

Structures
,

showing

that

natural


grammars

could

not

be

exactly

captured

by

such

methods
.

It

seemed

to

follow


that

machine

translation

could

not

be

modeled

as

a

noisy

channel

(although


the

machines

were

too

small

to

actually

try

any

of

this)
.


Chomsky

made

a

point

of

being

open

to

the

idea

that

STATISTICS

could

guide


grammatical

processing
.



Nevertheless,

most

work

in

computational

linguistics

switched

to


linguistically

informed

high
-
level

SYMBOLIC

representations

of

syntax

and

semantics,


and

small

knowledge

domains
.


LIN 6932

4

Historical Background

Given

the

grammar

of

a

language,

one

can

study

the

use

of

the

language

statistically

in

various

ways
;

and

the

development

of

probabilistic

models

for

the

use

of

language

(as

distinct

from

the

syntactic

structure

of

the

language)

can

be

quite

rewarding
.

Chomsky,

1957
:
17
,

note

4



LIN 6932

5

Historical Background




Around 1988, the machines got big enough to try both techniques.



Surprisingly, low level statistical approximations such as Markov processes
worked better than linguistically informed representations on almost all tasks,
such as speech recognition, parsing free text, and MT itself.



Most work in computational linguistics switched to linguistically uninformed,


low
-
level statistical approximations and machine learning.



Around 2000, the process of putting linguistics, statistical models, and


machine learning, back together again.




LIN 6932

6

Historical Background



Chomsky’s

1957

book

Syntactic

Structures
,

together

with

certain

more

technical papers from around the same time, is one of the most important

documents in linguistics and cognitive science.



The theory in detail has been completely superseded.



Nevertheless, surprisingly many of the formal devices that it includes recur,

particularly in the most modern descendents, including: kernel sentences (aka

lexicalized elementary trees or categories etc.); generalized or

“double
-
based” transformations (aka Merge, combinatory rules,

tree
-
adjunction, etc.; affix
-
hopping); the role of statistics

in natural language processing.






LIN 6932

7

Historical Background







Affix Hopping

as PF Merger and VP Ellipsis*

LIN 6932

8

Historical Background

One

crucial

ingredient

of

the

theory

was

a

hierarchy

of

language

types,

now

known

as

CHOMSKY’S

HIERARCHY,

each

type

characterized

by

a

class

of

rules

that

are

sufficient

to

specify

all

languages

of

that

type,

an

automaton

which

is

sufficient

to

recognize

whether

a

sentence

is

from

a

given

language

of

that

type,

and

a

class

of

languages

including

all

those

of

classes

lower

in

the

hierarchy

as

a

proper

subset
.


Question
:

What

is

the

expressive/generative

power

of

natural

languages?

Where

are

human

languages

located

on

this

hierarchy?






LIN 6932

9

Formal Grammar (Review)

A formal grammar

G
is quadruple

G = <N, T, S, R>


N a finite set of nonterminal symbols


T a finite set terminal symbols


S start symbol (
S


N
)


R rules (‘productions’)



Rules take the form

,




rewrites as

”, where

,


are strings
of symbols from the
infinite set of strings (
T

N
)* and


must contain at least one
non
-
terminal symbol.

Other conditions on the rules are imposed for particular classes of grammars.


LIN 6932

10

Chomsky Hierarchy

Other conditions are imposed for particular classes of grammars.

The Chomsky hierarchy:

Types of grammars defined in terms of additional restrictions on the form of the
rules:

Type 0:
No restriction. Each rule is of the form





, and




.

Type 1
: Each rule is of the form

A





, where
A


N
and



.

Type 2
: Each rule is of the form
A



.
(


may be

.
)

Type 3
: Each rule is of the form
A


aB
or
A


a.

Common names:

Type 0:
Unrestricted rewriting systems (Turing Equivalent)

Type 1:
Context
-
sensitive grammars.

Type 2:
Context
-
free grammars.

Type 3:
Right
-
linear, or regular, or finite state automata / finite state grammars.

LIN 6932

11

Chomsky Hierarchy




LIN 6932

12

Chomsky Hierarchy

of classes (or families) of languages, originally defined by the
form of the rules

needed to generate the languages in those classes, but which can also be
characterized at least in part by
"dependencies" between elements that appear
in the strings

that make up the languages of those classes.


The

smallest

infinite

class

of

languages

in

the

Chomsky

hierarchy

is

the

class

RL

of

regular

languages
.

These

are

the

languages

that

can

be

represented

by

regular

expressions
.


The next larger class of languages in the Chomsky hierarchy is the class
CFL
of
context
-
free languages
. Every regular language is also a context
-
free language,
but the converse is not true.


LIN 6932

13

Chomsky Chierarchy

TYPE O: UNRESTRICTED REWRITING SYSTEMS

no restrictions on the rules: L(G) = {w


T* | S

* w}



a set of strings composed of terminal symbols derived from S





*” is the reflexive and transitive closure of “







*”
can be constructed as a set of operators each of which is




obtained by applying


successively 0 or more times.



Every

recursively

enumerable

language

can

be

described

by

a

rewriting

system
:

an

algorithm

that

"enumerates"

the

strings

of

the

language,

I
.
e
.
,

this

means

that

its

output

is

simply

a

list

of

the

members

of

L
:

w
1
,

w
2
,

w
3
,

...

.

If

necessary,

this

algorithm

may

run

forever
.


Recursively

enumerable

languages

are

languages

for

which

there

is

a

decision

procedure

for

determining

for

any

arbitrary

string

that

it

is

a

well
-
defined

string

in

the

language,

but

not

necessarily

for

determining

for

any

arbitrary

string

not

in

the

language

that

it

is

not
.



Membership

in

a

type
-
0

language

is

undecidable
.


LIN 6932

14

Chomsky Chierarchy

CONTEXT
-
SENSITIVE GRAMMARS




Subclass of type
-
0 grammars



Restriction: all rules take the form







A







length(

)


length(

)

where
A

N
,

,

,




(
T

N
)*,






Membership in a context
-
sensitive language (CSL) is decidable.

CLSs are languages for which there is a decision procedure for
determining whether an arbitrary string does or does not belong
to the language.

LIN 6932

15

Chomsky Chierarchy


Not all decidable languages are context
-
sensitive (but most are)


LIN 6932

16

Chomsky Chierarchy

CONTEXT
-
FREE GRAMMARS



Subclass of context
-
sensitive grammars



Restriction: rules take the form


A





where
A

N
,




(
T

N
)
+

-
Membership in context
-
free language (CFL) is
decidable


LIN 6932

17

Chomsky Chierarchy

REGULAR GRAMMARS



Subclass of context
-
free grammars



Restriction: all rules take the form




A


a


or




A


aB




where
A,B


N

and
a


T

Membership is decidable.

RGs are expressively equivalent to finite state automata, or
Markov process.


Automata viewed as either generators or acceptors.


Grammars viewed as either generators or acceptors.


LIN 6932

18

Chomsky Chierarchy

REGULAR GRAMMARS



A


a
or

A


aB

where
A,B


N

and
a


T

LIN 6932

19

Chomsky Chierarchy

Strong and Weak Equivalence



Two grammars are said to be “weakly equivalent” if they generate the same

language or string set: cp. categorial grammars and phrase structure grammars.



Two grammars are “strongly equivalent” if they assign the same tree(s) to

each string in the same language.



All grammars at a given level in the hierarchy have strongly equivalent

grammars at higher levels, but not vice versa.



A grammar or class of grammars is said to be “strongly adequate” to the

capture of a language or class of languages if it assigns the “right” trees to

strings. The “right” tree is the one we need for semantic interpretation.



Weakly equivalent grammars which assign the “wrong” trees are said to be

only “weakly adequate.”



LIN 6932

20

Chomsky Chierarchy

Categorial Grammar






Phrase Structure Grammar


John loves Mary




S

n <n, <n,t>> n




NP


VP

n

<n,t>





N


V NP

t (semantic type of a sentence)



John loves Mary


combinatorially transparent categories




LIN 6932

21

NL and Chomsky Hierarchy

Where are natural languages located?

Expressive power of human languages


The first three chapters of Chomsky 1957 show that human languages fall

outside the lowest level of
Regular/Finite State
languages, and are
at least
at the level

of
Context Free
Languages.



The proof requires a distinction between ideal linguistic capacity, now known

as
Competence
and the
Performance
mechanism that actually processes

sentences.



Competence allows sentences that are so long or convoluted that none of us

will live long enough or have enough memory to process them. Performance

cannot cope with them

but clearly this limitation is accidental, not a fact

about English.



Syntactic Structures
goes on to suggest that the level of human grammars is

still higher in the hierarchy. It raises (but does not answer) the question of

which level is just high enough to contain all human languages.


LIN 6932

22

NL and Chomsky Hierarchy

Where are natural languages located?

Typical argument for the complexity of NL:


Find a recursive construction
C

in a natural language
L
(English)


Assume that the construction type in question is theoretically unbounded: i.e., in
theory, speakers could go on producing ever longer instances of the construction.


Argue that the competence of speakers admits unlimited recursion (while the
performance certainly poses an upper limit; competence vs. performance
distinction!)


Reduce
C

to a formal expression of known complexity in language
L’

via a
homomorphism (a structure
-
preserving mapping)


Make a case that
L

must be at least as complex
L’


Extrapolate from this one instance to all human languages: if there’s this one
construction
C
in this one language that has this complexity, then the human
language faculty must allow this in general.

LIN 6932

23

NL and Chomsky Hierarchy

NL is not regular: Chomsky’s 1957 original argument

Structure of his argument:

Consider 3 hypothetical languages:

1.
ab, aabb, aaabbb

(a
n
b
n
)

2.
aa, bb, abba, baab, aaaa, bbbb, aabbaa, abbbba, …

(palindromic)

3.
aa, bb, abab, baba, aaaa, bbbb, aabaab, abbabb, aababaabab

(copy
language)



It can be shown that these are not regular languages


LIN 6932

24

NL and Chomsky Hierarchy

The Pumping Lemma

(a technique for proving that certain
languages are
not
regular).

If
L
is an infinite finite automaton language (FAL) over
alphabet
A
, then there are strings
x,y,z


A
* such that
y



and
xy
n
z


L
for all
n
≥ 0.

Why: machines for infinite languages must have loops. The
string
y
in the lemma corresponds to a string accepted
during a traversal of a loop.

Note that lemma does not say ‘iff’.

LIN 6932

25

NL and Chomsky Hierarchy

The Pumping Lemma

(a technique for proving that certain languages
are
not
regular).

Example
:
L
= {
a
n
b
n

|
n
≥ 0 }


If

L

were

a

FAL,

then,

by

Pumping

Lemma,

there

would

be

x,y,z



A
*

such

that


y





and

xy
n
z



L

for

all

n



0
.

Assume

that

such

x,y,z

exist

so

the

string

xyz

is

in

L
.

But

by

definition

of

L

it

should

be

in

the

form

a
n
b
n

for

some

n
.

What’s

y

in

this

case?

It

can’t

be

empty,

so

it

would

consist

of

(
1
)

some

number

of

a’
s,

or

(
2
)

some

number

of

b’
s,

or

(
3
)

some

number

of

a’
s

followed

by

some

number

of

b’
s
.

But

it

is

easy

to

see

that

in

any

of

those

cases

the

strings

xyyz,

xyyyz
,

etc
.

could

not

belong

to

L
.

But

the

pumping

theorem

is

not

always

useful

for

showing

a

language

to

be

non
-
regular
.

LIN 6932

26

NL and Chomsky Hierarchy

NL is not regular
-

at least context
-
free power: Chomsky’s 1957 original argument:






LIN 6932

27

NL and Chomsky Hierarchy

Therefore,

Chomsky

claims,

English

cannot

be

regular

“It

is

clear,

then

that

in

English

we

can

find

a

sequence

a
+
S
1
+
b
,

where

there

is

a

dependency

between

a

and

b
,

and

we

can

select

as

S
1

another

sequence

c
+
S
2
+
d
,

where

there

is

a

dependency

between

c

and

d



etc
.

A

set

of

sentences

that

is

constructed

in

this

way



will

have

all

of

the

mirror

image

properties

of

(
2
)

which

exclude

(
2
)

from

the

set

of

finite

languages
.


(Chomsky

1957
,

p
.
21
)


Note
:

Chomsky

writes

“finite

languages”,

but

he

means

“regular

languages”
.




LIN 6932

28

NL and Chomsky Hierarchy

Chomsky’s argument: because English contains these
constructions, which are not regular, English is not regular.

As stated, the argument is fallacious.



LIN 6932

29

NL and Chomsky Hierarchy

How to state the observation correctly

LIN 6932

30

NL and Chomsky Hierarchy

Similar point about
center
-
embedding/nested dependencies


the cats

that
the dog

chases miau


the dependency between
the dog

and
chases

nests within the dependency
between
the cats

and
miau



Assume the following homomorphism:



a = {the cat, the dog, the rat, …}



b = {chase, miau, bite, bark, …}



Then this is an instance of
a
n
b
n

Chomsky’s argument:


Any useful syntactic analysis will relate the nouns to their corresponding verbs.



No FSA is capable of keeping track of center embeddings of arbitrary depth
(which would be required since the grammatical subset of L is infinite). No FSA
can provide a
useful syntactic analysis

for center
-
embedding.

Therefore, since English has such constructions, English is non regular language.

LIN 6932

31

NL and Chomsky Hierarchy


LIN 6932

32

NL and Chomsky Hierarchy

The language
a
n
b
n

corresponds to the context
-
free grammar

S


a S b

S


a b

It gives rise to the following tree for the string
aaabbb

LIN 6932

33

NL and Chomsky Hierarchy

Dissenting view 1:


all arguments to this effect use
center
-
embedding/nested
dependencies
,

cp.


the cats

that
the dog chases

miau


the dependency between
the dog

and
chases

nests within the

dependency between
the cats

and
miau


humans are extremely bad at processing center
-
embedding



notion of
competence
that ignores this is dubious



natural languages are regular after all





LIN 6932

34

NL and Chomsky Hierarchy

Dissenting view 2:



Any

*finite*

language

is

a

regular

language
.




If

you

don't

distinguish

performance

and

competence,

then

English

as

a

language

certainly

couldn't

contain

any

sentence

longer

than

the

number

of

words

a

human

being

could

utter

in

a

lifetime
.

(This

assumes

human

lifetimes

are

finite,

but

that

seems

uncontroversial
.
)




This

may

be

a

HUGE

number,

but

it

is

definitely

finite,

and

so

without

the

distinction

English

is

formally

a

finite

language

and

therefore

regular
.



LIN 6932

35

NL and Chomsky Hierarchy

Are natural languages context
-
free?



history of the question: Chomsky’s 1957 conjecture that natural


languages are not context
-
free



In the 60’s and 70’s, many attempts to prove that NL is not context
-
free



Pullum and Gazdar 1982 (Generalized Phrase Structure Grammar):


-

all these attempts have failed

-

for all we know, natural languages (conceived as string sets) might be
context
-
free



Huybregts 1984, Shieber 1985:


-

proof that Swiss German is not context
-
free, cross
-
serial dependencies



Culy 1985: proof that Bambara (a Northwestern Mande language


spoken in Mali) ) is not context
-
free


LIN 6932

36

NL and Chomsky Hierarchy



Nested and Crossing Dependencies

Context
-
free languages
--

unlike regular languages
--

can have unbounded
dependencies.

However, these dependencies can only be nested, not crossing.


Example:

a
n
b
n
has unlimited nested dependencies: context
-
free

The copy language has unlimited crossing dependencies: not context
-
free

LIN 6932

37

NL and Chomsky Hierarchy



Nested and Crossing Dependencies


Bar
-
Hillel and Shamir (1960):



English contains copy
-
language (crossing dependencies)



Cannot be context
-
free

John
,
Mary
,
David
, ... are
a widower
,
a widow
,
a widower
, ..., respectively.




Claim: the sentence is only grammatical under the condition that if the
n
th name is
male (female) then the
n
th phrase after the copula is
a widower (a widow)

LIN 6932

38

NL and Chomsky Hierarchy



Nested and Crossing Dependencies


suppose the claim is true, intersect English with regular language



L
1

=(
Paul
|
Paula
)
+
are
(
a widower
|
a widow
)
+
respectively

Result: Copy language L
3
{
ww
|
w



(
a
|
b
)
+
}

English


L
1

= L
2
, homomorphism L
2


L
3


John, David, Paul, …


a


a widower


a


Mary, Paula, Betty, …


b



a widow


b


are, respectively




LIN 6932

39

NL and Chomsky Hierarchy


Result: Copy language L
3

{
ww
|
w



(
a
|
b
)
+
}

Copy language is not context
-
free

Hence L
2

is not

Hence English is not

LIN 6932

40

NL and Chomsky Hierarchy

Cross
-
serial dependencies in Dutch

Huybregt (1976)


Dutch has copy
-
language like structures


thus Dutch is not context
-
free


dat Jan
Marie
Pieter
Arabisch
laat
zien
schrijven

that jan marie pieter arabic let see write

‘that Jan let Marie see Pieter write Arabic’




LIN 6932

41

NL and Chomsky Hierarchy

Counterargument

Crossing dependencies only concern argument linking,
i.e., semantics

As far as plain strings are concerned, the relevant
fragment of Dutch has the structure

NP
n
V
n

which is context
-
free