Language Learning Week 4
Pieter Adriaans: pietera@science.uva.nl
Sophia Katrenko: katrenko@science.uva.nl
Contents Week 4
Information theory
Learning as Data Compression
Learning regular languages using DFA
Minimum Description Length Principle
So we have to turn our attention to probabilistic solutions.
1969, Horning: probabilistic context-free grammars can be learned from positive data.
Given a text T and two grammars G1 and G2 we are able to approximate
max({P(G1|T), P(G2|T)})
How do we select the right sample?
Probability distribution over the sample space
Correct Learning
Approximately Correct Learning
Probably Approximately Correct Learning
PAC Learning (Valiant 83)
PAC Learning
[Incremental diagrams: sample space Σ* with probability distribution P, target concept f ⊆ Σ*, hypothesis g ⊆ Σ*; the guarantee is P(f Δ g) ≤ ε with probability (1 − δ)]
PAC Learning
For all target concepts f ∈ F and all probability distributions P on Σ*, the algorithm A outputs a concept g ∈ F such that with probability (1 − δ), P(f Δ g) ≤ ε.
F = concept class
δ = confidence parameter
ε = error parameter
f Δ g = (f − g) ∪ (g − f)
Polynomial in 1/ε and 1/δ
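As a concrete illustration of the error measure (a hypothetical finite example, not from the slides), the symmetric difference f Δ g and its probability mass under a sampling distribution P can be computed directly:

```python
# Hypothetical finite example: the PAC error of a hypothesis g for a
# target f is P(f Δ g), the probability mass of the symmetric difference
# under the sampling distribution P.
f = {"000", "001", "011", "111"}   # target concept (assumed for illustration)
g = {"000", "001", "011"}          # hypothesis
P = {"000": 0.4, "001": 0.3, "011": 0.2, "111": 0.1}  # distribution on Σ*

sym_diff = (f - g) | (g - f)       # f Δ g = (f − g) ∪ (g − f)
error = sum(P.get(x, 0.0) for x in sym_diff)
print(error)                       # mass of {"111"} → 0.1
```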
Boolean Clauses: k-CNF
U = {u1, …, un} Boolean variables
t: U → {t, f} truth assignment
t(u) = t: u is true
t(u) = f: u is false
u, ¬u literals
{u1, ¬u2, u4} clause (set of literals)
{{u1, ¬u2, u4}, {u1, ¬u3, u5}} conjunctive normal form (CNF)
k-CNF: a CNF with clauses of length k
Boolean Clauses: Satisfiability
A clause is satisfied by a truth assignment if at least one of its members is true.
A collection of clauses is satisfiable if there exists a truth assignment that satisfies all clauses.
SAT is NP-complete.
k-SAT: restriction to clauses with k variables
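These definitions can be sketched directly in Python (the literal representation as (variable, polarity) pairs is an assumption of this sketch, not from the slides); satisfiability is checked by brute force, which is fine for small n even though SAT in general is NP-complete:

```python
from itertools import product

# A literal is (variable, polarity): ("u1", True) is u1, ("u1", False) is ¬u1.
# A clause is a set of literals; a CNF is a list of clauses.
def satisfies(assignment, clause):
    # A clause is satisfied if at least one of its literals is true.
    return any(assignment[var] == pol for var, pol in clause)

def satisfiable(cnf, variables):
    # Brute force over all 2^n truth assignments.
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(satisfies(assignment, c) for c in cnf):
            return assignment
    return None

# (u1 ∨ ¬u2) ∧ (u2 ∨ u3) is satisfiable, e.g. by u1 = u2 = u3 = true
cnf = [{("u1", True), ("u2", False)}, {("u2", True), ("u3", True)}]
print(satisfiable(cnf, ["u1", "u2", "u3"]) is not None)  # True
```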
Boolean concepts (1)
Boolean Concept: a collection of vectors vi over U
Example: U = {a, b, c}
a b c
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Boolean concepts (2)
Boolean Concept: a collection of vectors vi over U
Example: U = {a, b, c}
a b c
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
A collection of clauses determines a Boolean concept:
C = {{¬a, b}, {¬b, c}}, i.e. a 2-CNF
(its concept: the vectors satisfying both clauses, namely 000, 001, 011, 111)
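The claim that a clause set determines a Boolean concept can be checked by enumerating which vectors satisfy C = {{¬a, b}, {¬b, c}} (a sketch; the (variable, polarity) literal encoding is assumed):

```python
from itertools import product

# C = {{¬a, b}, {¬b, c}} with literals encoded as (variable, polarity)
C = [{("a", False), ("b", True)}, {("b", False), ("c", True)}]

def satisfies_cnf(bits):
    # bits = (a, b, c) as booleans; every clause needs one true literal
    val = dict(zip("abc", bits))
    return all(any(val[v] == pol for v, pol in clause) for clause in C)

# The Boolean concept determined by C: all satisfying vectors
concept = [bits for bits in product([False, True], repeat=3) if satisfies_cnf(bits)]
for a, b, c in concept:
    print(int(a), int(b), int(c))   # 0 0 0 / 0 0 1 / 0 1 1 / 1 1 1
```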
Learning Boolean concepts
- Form the set S of all 2-CNF clauses on U
- Do n times:
  - Call sample vector v
  - Remove all clauses in S that are not compatible with v
- Output S
C = (¬a & ¬b & ¬c) v (¬a & ¬b & c) v (¬a & b & c) v (a & b & c)
= {{¬a, b}, {¬b, c}}
The initial S is the set of all 2-clauses over U = {a, b, c}:
{ {a, b}, {a, ¬b}, {¬a, b}, {¬a, ¬b}, {a, c}, {a, ¬c}, {¬a, c}, {¬a, ¬c}, {b, c}, {b, ¬c}, {¬b, c}, {¬b, ¬c} }
Example = 0,0,0
Example = 0,1,1
Example = 0,0,1
...
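The elimination procedure above can be sketched in Python: start from all 2-clauses over U and delete every clause falsified by a positive example. Function names and the (variable, polarity) literal encoding are assumptions of this sketch; the example 1,1,1 (also in the concept) is added to the sample sequence from the slides.

```python
from itertools import combinations, product

def all_2_clauses(variables):
    # All clauses with 2 literals on distinct variables, e.g. {a, ¬b}
    return {frozenset([(x, px), (y, py)])
            for x, y in combinations(variables, 2)
            for px, py in product([True, False], repeat=2)}

def learn_2cnf(variables, examples):
    # Elimination: keep only clauses compatible with every positive example.
    S = all_2_clauses(variables)
    for v in examples:  # v maps variable -> bool
        S = {c for c in S if any(v[var] == pol for var, pol in c)}
    return S

examples = [{k: bool(x) for k, x in zip("abc", bits)}
            for bits in [(0, 0, 0), (0, 1, 1), (0, 0, 1), (1, 1, 1)]]
S = learn_2cnf(list("abc"), examples)
# The target clauses {¬a, b} and {¬b, c} survive all examples
print(frozenset([("a", False), ("b", True)]) in S)   # True
print(frozenset([("b", False), ("c", True)]) in S)   # True
```

Note how the example 0,0,0 alone already removes {a, b}, {a, c} and {b, c}, since none of their literals is true under a = b = c = 0.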
K-CNF are PAC learnable
U = {u1, …, un} a set of Boolean variables
A Boolean concept C:
- a set of vectors vi over U
- a k-CNF equivalent to C
An arbitrary probability distribution P over C such that Σ_{vi ∈ C} P(vi) = 1 (i.e. P(C) = 1)
Do n times: vi = Examples()
For each example vi, delete ci from the hypothesis C' if vi does not satisfy ci
[Diagram: Σ* with distribution P, target C, hypothesis C'; P(C Δ C') ≤ ε with probability (1 − δ)]
K-CNF are PAC learnable
P(C) = 1, so every sampled example satisfies the target; a clause of C' whose error mass is at least ε survives n independent examples with probability at most (1 − ε)^n
Fact: e^{−ε} ≥ 1 − ε (0 ≤ ε ≤ 1)
Require δ ≥ e^{−εn} ≥ (1 − ε)^n
1/δ ≤ e^{εn}
ln(1/δ) ≤ εn
n ≥ (1/ε) ln(1/δ)
[Diagram: Σ* with distribution P, target C, hypothesis C'; P(C Δ C') ≤ ε with probability (1 − δ)]
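The sample-size bound n ≥ (1/ε) ln(1/δ) derived above can be evaluated directly (a sketch; the function name is assumed):

```python
import math

def sample_size(epsilon, delta):
    # Smallest n with n >= (1/epsilon) * ln(1/delta),
    # obtained from requiring delta >= e^(-epsilon * n)
    return math.ceil((1.0 / epsilon) * math.log(1.0 / delta))

# Error at most 10% with confidence 95%:
print(sample_size(0.1, 0.05))   # 30
```

The bound is polynomial in 1/ε and 1/δ, as the PAC definition requires.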
Learning infinite classes: Characteristic sample
Let Σ be an alphabet, Σ* the set of all strings over Σ.
L(G) = S is the language generated by a grammar G.
C_G ⊆ S is a characteristic sample for G.
Contents Week 4
PAC Learning
Learning Boolean concepts
Learning as Compression