# Language Learning Week 4

7 Nov 2013

Sophia Katrenko: katrenko@science.uva.nl

Contents Week 4

Information theory

Learning as Data Compression

Learning regular languages using DFA

Minimum Description Length Principle

So we have to turn our attention to probabilistic solutions

1969: Horning: Probabilistic context-free grammars can be learned from positive data.

Given a text T and two grammars G1 and G2 we are able to approximate
max({P(G1|T), P(G2|T)})
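
A minimal sketch of that comparison, assuming Bayes' rule P(G|T) ∝ P(T|G) P(G) and independent sentences. The two "grammars" here are hypothetical lookup tables of sentence probabilities rather than real probabilistic context-free grammars, and the equal priors are an assumption made only for illustration.

```python
import math

def log_posterior_score(text, sentence_probs, prior):
    """Unnormalised log P(G|T) = log P(G) + sum of log P(s|G) over sentences s in T."""
    score = math.log(prior)
    for sentence in text:
        p = sentence_probs.get(sentence, 0.0)
        if p == 0.0:
            return float("-inf")  # the grammar cannot generate this sentence
        score += math.log(p)
    return score

# Hypothetical sentence distributions standing in for two grammars (toy numbers).
g1 = {"a b": 0.5, "a a b b": 0.25, "a a a b b b": 0.125}
g2 = {"a b": 0.4, "a a b": 0.3, "a a b b": 0.3}

text = ["a b", "a a b b", "a b"]  # the observed text T, positive data only

s1 = log_posterior_score(text, g1, prior=0.5)
s2 = log_posterior_score(text, g2, prior=0.5)
best = "G1" if s1 >= s2 else "G2"
print(best)
```

The normalising constant P(T) is the same for both grammars, so comparing the unnormalised scores is enough to pick the larger posterior.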

How do we select the right sample?

Probability distribution over the sample space

Correct learning

Approximately Correct Learning

Probably Approximately Correct Learning

PAC Learning (Valiant 83)

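PAC Learning

[Figure, built up over several slides: the sample space $\Sigma^*$ with a probability distribution $P$ on $\Sigma^*$, a target concept $f$ and a hypothesis $g$ drawn as overlapping regions, and their symmetric difference $f \,\Delta\, g$. Caption: $P(f \,\Delta\, g) \le \varepsilon$ with probability $(1 - \delta)$.]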

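PAC Learning

For all target concepts $f \in F$ and all probability distributions $P$ on $\Sigma^*$, the algorithm $A$ outputs a concept $g \in F$ such that, with probability at least $(1 - \delta)$, $P(f \,\Delta\, g) \le \varepsilon$.

$F$ = concept class

$\delta$ = confidence parameter

$\varepsilon$ = error parameter

$f \,\Delta\, g = (f - g) \cup (g - f)$

Polynomial in $1/\varepsilon$ and $1/\delta$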
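
The error term in this definition is the probability mass of the symmetric difference of f and g. The sketch below computes it on a toy finite sample space; the uniform distribution, the bit-vector space, and the particular f and g are assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import product

def error_mass(f, g, P):
    """Return P(f Δ g): total probability of the points on which f and g disagree."""
    return sum((p for x, p in P.items() if (x in f) != (x in g)), Fraction(0))

# Toy sample space (assumption): all bit vectors of length 3, uniform distribution.
space = list(product((0, 1), repeat=3))
P = {x: Fraction(1, len(space)) for x in space}

f = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}  # target concept
g = f | {(1, 0, 1)}                               # hypothesis with one extra point

epsilon = Fraction(1, 4)
err = error_mass(f, g, P)
print(err, err <= epsilon)
```

Exact fractions are used so that the comparison with the error parameter is not affected by floating-point rounding.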

Boolean Clauses: k-CNF

U = {u1, …, un}: Boolean variables

t: U → {t, f}: truth assignment

t(u) = t: u is true

t(u) = f: u is false

u, ¬u: literals

{u1, ¬u2, u4}: clause (set of literals)

{{u1, ¬u2, u4}, {u1, ¬u3, u5}}: conjunctive normal form (CNF)

k-CNF: a CNF with clauses of length k
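
One possible encoding of these definitions, as a sketch: a literal is a (variable name, polarity) pair, a clause is a frozenset of literals, and a CNF is a list of clauses. The helper names `lit`, `neg` and `is_k_cnf` are hypothetical and chosen only for this illustration.

```python
def lit(name, positive=True):
    """A literal: a variable name with a polarity (False means negated)."""
    return (name, positive)

def neg(literal):
    """Return the complementary literal."""
    name, positive = literal
    return (name, not positive)

# The two example clauses from the slide: {u1, ¬u2, u4} and {u1, ¬u3, u5}.
clause1 = frozenset({lit("u1"), lit("u2", False), lit("u4")})
clause2 = frozenset({lit("u1"), lit("u3", False), lit("u5")})
cnf = [clause1, clause2]

def is_k_cnf(cnf, k):
    """True if every clause has exactly k literals."""
    return all(len(clause) == k for clause in cnf)

print(is_k_cnf(cnf, 3))
```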

Boolean Clauses: Satisfiability

A clause is satisfied by a truth assignment if at least one of its members is true

A collection of clauses is satisfiable if there exists a truth assignment that satisfies all clauses

SAT is NP-complete

k-SAT: restriction to clauses with k literals
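
The two definitions above can be sketched directly. The search below simply tries every truth assignment, so it takes time exponential in the number of variables; it is a brute-force illustration, not a practical SAT solver. Literals are encoded as (name, polarity) pairs, which is an assumption of this sketch.

```python
from itertools import product

def clause_satisfied(clause, assignment):
    """A clause is satisfied if at least one of its literals is true."""
    return any(assignment[name] == positive for name, positive in clause)

def find_satisfying_assignment(cnf, variables):
    """Try all 2^n truth assignments; return the first one satisfying every clause."""
    for values in product((False, True), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(clause_satisfied(clause, assignment) for clause in cnf):
            return assignment
    return None

cnf = [{("a", False), ("b", True)}, {("b", False), ("c", True)}]  # (¬a ∨ b) ∧ (¬b ∨ c)
unsat = [{("a", True)}, {("a", False)}]                           # a ∧ ¬a

print(find_satisfying_assignment(cnf, ["a", "b", "c"]))
print(find_satisfying_assignment(unsat, ["a"]))
```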

Boolean concepts (1)

Boolean Concept: A collection of vectors vi over U

Example: U = {a, b, c}

a b c
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1

Boolean concepts (2)

A collection of clauses determines a Boolean concept

C = {{¬a, b}, {¬b, c}}, i.e. a 2-CNF

Of the eight vectors over U = {a, b, c}, C is satisfied by 0 0 0, 0 0 1, 0 1 1 and 1 1 1.
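
To see how the clauses {{¬a, b}, {¬b, c}} determine a Boolean concept, the sketch below filters all eight vectors over U = {a, b, c} and keeps those that satisfy every clause. The (name, polarity) encoding of literals is an assumption of this sketch.

```python
from itertools import product

VARIABLES = ("a", "b", "c")

# C = {{¬a, b}, {¬b, c}}, with literals encoded as (name, polarity) pairs.
C = [{("a", False), ("b", True)}, {("b", False), ("c", True)}]

def satisfies(vector, cnf, variables=VARIABLES):
    """True if the 0/1 vector makes at least one literal true in every clause."""
    value = dict(zip(variables, vector))
    return all(
        any(bool(value[name]) == positive for name, positive in clause)
        for clause in cnf
    )

concept = [v for v in product((0, 1), repeat=len(VARIABLES)) if satisfies(v, C)]
print(concept)  # [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
```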

Learning Boolean concepts

- Form the set S of all 2-CNF clauses on U
- Do n times:
  - Draw a sample vector v
  - Remove all clauses in S that are not compatible with v
- Output S

C = (¬a & ¬b & ¬c) v (¬a & ¬b & c) v (¬a & b & c) v (a & b & c)

{{¬a, b}, {¬b, c}}

S = { {a, b}, {a, ¬b}, {¬a, b}, {¬a, ¬b}, {a, c}, {a, ¬c}, {¬a, c}, {¬a, ¬c}, {b, c}, {b, ¬c}, {¬b, c}, {¬b, ¬c} }

Example = 0,0,0

Example = 0,1,1

Example = 0,0,1

...
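
The elimination procedure on these slides can be sketched as follows: start from all 12 two-literal clauses over {a, b, c} and delete every clause that an example falsifies. The first three examples are the ones shown on the slide; the fourth, 1,1,1, is added here as an assumption because it is the remaining positive vector of C. Function names are hypothetical.

```python
from itertools import combinations, product

def all_k_clauses(variables, k):
    """All clauses with k literals over distinct variables, in every polarity."""
    clauses = set()
    for names in combinations(variables, k):
        for signs in product((True, False), repeat=k):
            clauses.add(frozenset(zip(names, signs)))
    return clauses

def clause_true(clause, vector, variables):
    """True if the 0/1 vector makes at least one literal of the clause true."""
    value = dict(zip(variables, vector))
    return any(bool(value[name]) == positive for name, positive in clause)

def learn_k_cnf(examples, variables, k=2):
    """Keep only the clauses that are compatible with every positive example."""
    S = all_k_clauses(variables, k)
    for v in examples:
        S = {c for c in S if clause_true(c, v, variables)}
    return S

variables = ("a", "b", "c")
examples = [(0, 0, 0), (0, 1, 1), (0, 0, 1), (1, 1, 1)]
S = learn_k_cnf(examples, variables)
for clause in sorted(sorted(c) for c in S):
    print(clause)
```

On these four examples the surviving clauses are {¬a, b}, {¬a, c} and {¬b, c}. The extra clause {¬a, c} is implied by the other two, so the output is logically equivalent to the target {{¬a, b}, {¬b, c}}.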

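k-CNF are PAC learnable

U = {u1, …, un}: a set of Boolean variables

A Boolean concept C:
- a set of vectors vi over U
- a k-CNF equivalent to C

An arbitrary probability distribution P over C such that $\sum_{v_i \in C} P(v_i) = 1$ (i.e. $P(C) = 1$)

Do n times: draw an example vi, then delete every clause ci from the hypothesis C' that vi does not satisfy

[Figure: the sample space with distribution P, with the target concept C and the hypothesis C' drawn as overlapping regions. Caption: $P(C \,\Delta\, C') \le \varepsilon$ with probability $(1 - \delta)$.]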

k-CNF are PAC learnable

$P(C) = 1$. Suppose $P(C \,\Delta\, C') = \varepsilon$. The probability that n independent examples all fall outside $C \,\Delta\, C'$ is $(1 - \varepsilon)^n$.

Fact: $e^{-\varepsilon} \ge 1 - \varepsilon$ for $0 \le \varepsilon \le 1$, hence $e^{-\varepsilon n} \ge (1 - \varepsilon)^n$.

Requiring $e^{-\varepsilon n} \le \delta$:

$$\frac{1}{\delta} \le e^{\varepsilon n} \;\Longrightarrow\; \ln\frac{1}{\delta} \le \varepsilon n \;\Longrightarrow\; n \ge \frac{1}{\varepsilon}\ln\frac{1}{\delta}$$

So with $n \ge \frac{1}{\varepsilon}\ln\frac{1}{\delta}$ examples, $P(C \,\Delta\, C') \le \varepsilon$ with probability $(1 - \delta)$.
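
As a quick numeric check of the bound n ≥ (1/ε) ln(1/δ) from this slide, the sketch below computes the smallest integer n that satisfies it and then evaluates (1 − ε)^n. The function name `sample_size` and the chosen values of ε and δ are illustrative assumptions.

```python
import math

def sample_size(epsilon, delta):
    """Smallest integer n with n >= (1/epsilon) * ln(1/delta)."""
    if not (0 < epsilon <= 1 and 0 < delta <= 1):
        raise ValueError("epsilon and delta must lie in (0, 1]")
    return math.ceil(math.log(1 / delta) / epsilon)

epsilon, delta = 0.1, 0.05
n = sample_size(epsilon, delta)
miss_probability = (1 - epsilon) ** n  # chance that n examples all miss the error region
print(n, miss_probability <= delta)
```

With ε = 0.1 and δ = 0.05 this gives n = 30, and 0.9^30 is roughly 0.042, which is below δ as the derivation requires.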

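Learning infinite classes: Characteristic sample

Let $\Sigma$ be an alphabet, $\Sigma^*$ the set of all strings over $\Sigma$

$L(G) = S \subseteq \Sigma^*$ is the language generated by a grammar $G$

$C_G \subseteq S$ is a characteristic sample for $G$

[Figure: nested regions $C_G \subseteq S \subseteq \Sigma^*$]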

Contents Week 4

PAC Learning

Learning Boolean concepts

Learning as Compression