PAC Learning
Mary Cryan
LFCS,
School of Informatics.
Learning
We have data ... we want a hypothesis to model the data.
Application areas: bioinformatics (protein structure, evolution),
stock market behaviour, neuroinformatics (?)
We want to do this well (hypothesis is close to “the truth”) and,
in Informatics, we want to do this efficiently.
1
Complexity for Learning Theory
PAC learning:
• Concept was invented by Leslie Valiant in 1984.
• His goal was (they say) to get CS people to formally analyze
learning algorithms.
• Mixture of statistics and computational complexity.
Big question: is it relevant or not?
2
Basic Assumptions of PAC model
Instances: space is Ω = {0,1}^n.
Concepts (“Hypotheses”): subsets of {0,1}^n
(i.e., concepts are Boolean functions over {0,1}^n.)
Assume some (maybe unknown) distribution D over inputs from
{0,1}^n.
Given an (unknown) concept c and a hypothesis h (c, h : {0,1}^n → {0,1}),
define

error(h) = Σ_{x ∈ h Δ c} D(x).

Hypothesis h is a good approximation to concept c if error(h) is
small (wrt c).
3
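For tiny n the error definition above can be checked directly, by summing D over the symmetric difference h Δ c. A minimal Python sketch; the choices of n, c, h and the uniform D are illustrative assumptions, not from the slides:

```python
from itertools import product

# Illustrative choices (not from the slides): n = 3, uniform D,
# c = "parity of x is even", h = "first bit of x is 0".
n = 3
omega = list(product([0, 1], repeat=n))   # Omega = {0,1}^n
D = {x: 1 / len(omega) for x in omega}    # uniform distribution D

def c(x):  # target concept
    return sum(x) % 2 == 0

def h(x):  # candidate hypothesis
    return x[0] == 0

# error(h) = sum of D(x) over the symmetric difference h Delta c,
# i.e. over exactly the points where h and c disagree.
error = sum(D[x] for x in omega if h(x) != c(x))
print(error)  # → 0.5 (h and c disagree on 4 of the 8 points)
```

Under a non-uniform D the same disagreement set can have very different error, which is why error(h) is defined relative to D.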
Learning from Samples
In the basic PAC model, we do not know the concept (clearly),
but are given Labelled Examples (of the target concept c).
One instance (example/sample) is some vector x ∈ {0,1}^n, together
with a label + or −.
+ means x ∈ c
− means x ∉ c
Examples are generated (sampled) according to D (often uniform).
No noise.
4
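The sampling process above can be sketched in a few lines of Python; the particular target concept (majority of bits set) and the uniform D are illustrative assumptions, not from the slides:

```python
import random

n = 15

# Illustrative target concept (the slides leave c unspecified here):
# c(x) = 1 iff a majority of the bits are set.
def c(x):
    return sum(x) > n // 2

def draw_example(rng):
    # Sample x from D -- here uniform on {0,1}^n -- and attach its label.
    x = tuple(rng.randrange(2) for _ in range(n))
    return x, '+' if c(x) else '-'

rng = random.Random(0)
samples = [draw_example(rng) for _ in range(5)]
for x, lab in samples:
    print(lab, x)
```

Note the learner sees only these (x, label) pairs; it never sees c itself, and (in the passive model) it cannot choose which x to ask about.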
PAC Learning of Concepts
Now suppose we have a set of examples (taken from some unknown
c : {0,1}^n → {0,1}).
How do we obtain a good hypothesis?
Efficiently?
From a small amount of data (i.e., number of examples)?
We are (usually) in the passive setting:
we can’t ask for a particular example and its label...
5
PAC Learning
C_n = set of possible target concepts (C_n ⊆ {f : {0,1}^n → {0,1}}).
H_n = set of possible hypotheses (not always the same as C_n).
Concept class C = {C_n}_{n≥1}, hypothesis class H = {H_n}_{n≥1}.
Defn 1 A concept class C is PAC-learnable by hypothesis class H
if there is a polynomial-time algorithm A which, for any concept
c ∈ C_n, any 0 < ε, δ < 1, and any distribution D on {0,1}^n,
when given p(n, ε⁻¹, δ⁻¹) samples drawn from D, constructs a
hypothesis h ∈ H_n such that error(h) ≤ ε with probability at
least 1 − δ.
6
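The definition is abstract, so here is a classic concrete instance (not on this slide, but standard in the literature): Valiant-style elimination for monotone conjunctions over {0,1}^n. The hypothesis starts as the conjunction of all n variables, and each positive example deletes the variables it sets to 0. The specific n, target, and sample size below are illustrative assumptions:

```python
import random

n = 10
target = {1, 4, 7}             # hypothetical target: x1 AND x4 AND x7

def label(x):                  # c(x) = 1 iff all target bits are 1
    return all(x[i] == 1 for i in target)

def learn(samples):
    # Elimination: keep only variables set to 1 in EVERY positive example.
    kept = set(range(n))
    for x, y in samples:
        if y:
            kept &= {i for i in range(n) if x[i] == 1}
    return kept                # the learned conjunction, as variable indices

rng = random.Random(1)
samples = []
for _ in range(200):
    x = tuple(rng.randrange(2) for _ in range(n))
    samples.append((x, label(x)))

h = learn(samples)
print(sorted(h))
```

The learned conjunction always contains the target variables and is consistent with every sample; variables outside the target survive only if they happen to be 1 in every positive example, which becomes exponentially unlikely as the sample grows. The sample size here is polynomial, matching the p(n, ε⁻¹, δ⁻¹) requirement in the definition.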
PAC Learning
“Probably Approximately Correct”
Probably: “with probability at least 1 − δ”.
(can’t expect correctness with probability 1; why?)
Approximately: error_c(h) ≤ ε.
(there are so many functions, 2^(2^n) in total, that we can’t
distinguish between them all with polynomially many samples, so
we allow some error)
This must hold for all concepts in C = {C_n}_{n≥1}.
7
PAC Learning and Complexity
The number of samples p(n, ε⁻¹, δ⁻¹) must be polynomial in
n, ε⁻¹, δ⁻¹ (usually we have log δ⁻¹).
The algorithm A must run in time polynomial in n and p(n, ε⁻¹, δ⁻¹),
for all concepts in C = {C_n}_{n≥1}.
PAC-learnability can be thought of as a complexity class (like P,
RP, etc.)
8
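The “usually have log δ⁻¹” remark can be made concrete with the standard counting bound for finite hypothesis classes (from the general literature, e.g. Haussler’s survey, not from this slide): any learner that outputs a hypothesis consistent with m ≥ (1/ε)(ln |H_n| + ln(1/δ)) samples is PAC. A small sketch:

```python
import math

# Standard sample bound for a consistent learner over a finite class H_n:
# m >= (1/eps) * (ln|H_n| + ln(1/delta)) samples suffice.
def sample_bound(size_H, eps, delta):
    return math.ceil((math.log(size_H) + math.log(1 / delta)) / eps)

# Illustrative class: monotone conjunctions over n = 20 variables,
# so |H_n| = 2^20 (each variable is in or out of the conjunction).
m = sample_bound(2 ** 20, eps=0.1, delta=0.05)
print(m)  # → 169
```

Note that m grows like n/ε + (1/ε) log(1/δ): polynomial in n and ε⁻¹, but only logarithmic in δ⁻¹, exactly as the slide says.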
An example
Suppose we have an underlying function f : {0,1}^n → {0,1} (our
concept).
f(x) = 1 iff Σ_{i=1}^n x_i = 1 mod 5 OR Σ_{i=1}^n x_i = 1 mod 4
Suppose n = 15. Here are some examples:
((1,1,1,1,0,1,0,0,1,0,0,1,1,0,1), +)
((0,1,1,0,1,0,1,0,1,0,0,1,1,0,0), −)
((0,0,1,0,1,0,1,0,1,0,0,0,1,0,0), +)
How would an (efficient) algorithm guess f (or a good
approximation)?
For efficient learning to be possible, the concept class is usually
restricted.
9
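The slide’s f and its three examples can be checked mechanically (this reads the definition of f literally; the bit counts are 9, 7 and 5, and only 9 mod 4 = 1 and 5 mod 4 = 1 hit the condition):

```python
# f(x) = 1 iff the number of 1-bits is 1 mod 5 or 1 mod 4,
# exactly as defined on the slide.
def f(x):
    s = sum(x)
    return s % 5 == 1 or s % 4 == 1

examples = [
    ((1,1,1,1,0,1,0,0,1,0,0,1,1,0,1), '+'),   # 9 ones: 9 mod 4 = 1
    ((0,1,1,0,1,0,1,0,1,0,0,1,1,0,0), '-'),   # 7 ones: 7 mod 5 = 2, 7 mod 4 = 3
    ((0,0,1,0,1,0,1,0,1,0,0,0,1,0,0), '+'),   # 5 ones: 5 mod 4 = 1
]
for x, lab in examples:
    print(sum(x), '+' if f(x) else '-')
```

The point of the slide stands out here: even with f written down, recovering it from a handful of labelled vectors alone is far from obvious.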
Big Open Question
Biggest open question in PAC learning: DNF formulae
C_1 or C_2 or ... or C_m
Each C_i is a conjunction of literals.
Test for satisfiability? Easy!
Learn: Can Boolean functions given in DNF (Disjunctive Normal
Form) be PAC-learned? ... no one knows.
10
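The “satisfiability is easy” claim holds because a DNF is satisfied exactly when some single term can be satisfied, and a term fails only if it contains a variable together with its own negation. A sketch, with an assumed representation (a term is a set of signed integers: +i for x_i, -i for its negation):

```python
# A DNF is satisfiable iff some term is internally consistent,
# i.e. contains no variable together with its own negation.
def dnf_satisfiable(terms):
    return any(all(-lit not in term for lit in term) for term in terms)

# (x1 AND NOT x2) OR (x3 AND NOT x3): the first term is consistent.
print(dnf_satisfiable([{1, -2}, {3, -3}]))   # → True
# (x1 AND NOT x1) OR (x2 AND NOT x2): every term is contradictory.
print(dnf_satisfiable([{1, -1}, {2, -2}]))   # → False
```

Contrast with CNF, where satisfiability is NP-complete; the open question on this slide is about learning DNF, not deciding it.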
Results on PAC Learning
k-DNF (each term has at most k literals) is PAC-learnable:
Valiant 1984.
Monotone DNF (no negated variables) is learnable with
membership queries (“is x a positive example?”):
Angluin 1988.
See Haussler, “Probably Approximately Correct Learning”, for
more.
11
Criticisms of PAC model
No noise: we assume that all examples are correctly labelled.
Variants of the PAC model incorporate noise; see refs in Haussler.
Worst-case analysis: “for all concepts c ∈ C ...”
A genuine complaint: many people working in Learning don’t care
about the worst case (it never arises).
12
Learning of Distributions
This is our next talk!!
A variant on what we saw in this talk.
We have a distribution D over some state space Ω (not necessarily
{0,1}^n). Our goal: learn the distribution D.
This is learning in the unlabelled case:
instances neither belong to, nor are excluded from, the concept
(which is the distribution D);
instances just have some probability under that concept.
13