# Mary Cryan - School of Informatics

Artificial Intelligence and Robotics

7 Nov 2013

## PAC-Learning

Mary Cryan
LFCS, School of Informatics
## Learning

We have data... and we want a hypothesis to model the data.

Application areas: bioinformatics (protein structure, evolution), stock-market behaviour, neuroinformatics (?).

We want to do this well (the hypothesis is close to "the truth") and, in Informatics, we want to do this efficiently.
## Complexity for Learning Theory

PAC-learning:

- The concept was invented by Leslie Valiant in 1984.
- His goal was (they say) to get CS people to formally analyze learning algorithms.
- A mixture of statistics and computational complexity.

Big question: is it relevant or not?
## Basic Assumptions of the PAC Model

Instances: the space is Ω = {0,1}^n.

Concepts ("hypotheses"): subsets of {0,1}^n (i.e., concepts are Boolean functions over {0,1}^n).

Assume some (maybe unknown) distribution D over inputs from {0,1}^n.

Given an (unknown) concept c and a hypothesis h (with c, h : {0,1}^n → {0,1}), define

    error(h) = Σ_{x ∈ h Δ c} D(x),

where h Δ c is the symmetric difference of h and c, i.e., the set of inputs on which they disagree.

Hypothesis h is a good approximation to concept c if error(h) is small (w.r.t. c).
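The definition above can be computed directly when n is small enough to enumerate {0,1}^n. A minimal sketch, with a hypothetical target c (parity) and hypothesis h (first bit), under the uniform distribution D:

```python
from itertools import product

n = 3  # illustrative instance size (assumption, not from the slides)

# Uniform distribution D over {0,1}^n.
def D(x):
    return 1 / 2**n

# Hypothetical target concept c and hypothesis h, as Boolean functions.
def c(x):
    return sum(x) % 2 == 1   # parity of the bits

def h(x):
    return x[0] == 1         # "first bit is 1"

# error(h) = sum of D(x) over the symmetric difference h Δ c,
# i.e. over all x on which h and c disagree.
error = sum(D(x) for x in product([0, 1], repeat=n) if h(x) != c(x))
print(error)  # 0.5: a single bit is uncorrelated with parity
```

Here the error comes out to 1/2, illustrating that a single-bit rule carries no information about parity.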
## Learning from Samples

In the basic PAC model, we do not know the concept (clearly), but are given labelled examples (of the target concept c).

One instance (example/sample) is a vector x ∈ {0,1}^n, together with a label + or −:

- + means x ∈ c
- − means x ∉ c

Examples are generated (sampled) according to D (often uniform). No noise.
## PAC-Learning of Concepts

Now suppose we have a set of examples (taken from some unknown c : {0,1}^n → {0,1}).

How do we obtain a good hypothesis? Efficiently? From a small amount of data (i.e., a small number of examples)?

We are (usually) in the passive setting: we can't ask for a particular example and its label.
## PAC-Learning

C_n = set of possible target concepts (C_n ⊆ {f : {0,1}^n → {0,1}}).

H_n = set of possible hypotheses (not always the same as C_n).

Concept class C = {C_n}_{n≥1}, hypothesis class H = {H_n}_{n≥1}.

Defn 1. A concept class C is PAC-learnable by hypothesis class H if there is a polynomial-time algorithm A which, for any concept c ∈ C_n, any 0 < ε, δ < 1, and any distribution D on {0,1}^n, when given p(n, 1/ε, 1/δ) samples drawn from D, can construct a hypothesis h ∈ H_n such that error(h) ≤ ε with probability at least 1 − δ.
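The slides leave the sample-size polynomial p abstract. For a finite hypothesis class there is a standard bound (textbook learning theory, not stated on the slides): a learner that outputs any hypothesis consistent with m ≥ (1/ε)(ln|H| + ln(1/δ)) i.i.d. examples is probably approximately correct. A small sketch of that arithmetic:

```python
import math

# Standard sample-complexity bound for a finite hypothesis class H:
# with m >= (1/eps) * (ln|H| + ln(1/delta)) i.i.d. labelled examples,
# any consistent hypothesis has error <= eps with probability >= 1 - delta.
def sample_bound(size_H, eps, delta):
    return math.ceil((math.log(size_H) + math.log(1 / delta)) / eps)

# e.g. conjunctions over n = 20 Boolean variables: |H| = 3^20 + 1
# (each variable appears positively, negatively, or not at all).
print(sample_bound(3**20 + 1, 0.1, 0.05))  # 250
```

Note how m grows only logarithmically in |H| and 1/δ, which is what makes polynomial sample bounds possible even for exponentially large classes.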
## PAC-Learning

"Probably Approximately Correct":

- Probably: "with probability at least 1 − δ". (We can't expect to be correct with probability 1; why?)
- Approximately: error_c(h) ≤ ε. (There are so many functions, 2^{2^n} in total, that we can't distinguish between them all with polynomially many samples, so we allow some error.)

This must hold for all concepts in C = {C_n}_{n≥1}.
PAC-Learning and Complexity
The
number of samples
p(n,￿
−1

−1
) must be polynomial in n,
￿
−1

−1
(usually have logδ
−1
).
The
algorithmA
must run in time polynomial in n and p(n,￿
−1

−1
)
for all concepts in C ={C
n
}
n≥1
.
PAC-learnability can be thought of as a complexity class (like P,
RP etc)
8
## An Example

Suppose we have an underlying function f : {0,1}^n → {0,1} (our concept):

    f(x) = 1  iff  Σ_{i=1}^n x_i ≡ 1 (mod 5)  OR  Σ_{i=1}^n x_i ≡ 1 (mod 4).

Suppose n = 15. Here are some examples:

    ((1,1,1,1,0,1,0,0,1,0,0,1,1,0,1), +)
    ((0,1,1,0,1,0,1,0,1,0,0,1,1,0,0), −)
    ((0,0,1,0,1,0,1,0,1,0,0,0,1,0,0), +)

How would an (efficient) algorithm guess f (or a good approximation)?

For efficient learning to be possible, the concept class is usually restricted.
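The concept and the three labelled examples above can be checked directly in code:

```python
# The concept from the slides: f(x) = 1 iff the number of 1-bits in x
# is congruent to 1 mod 5, or congruent to 1 mod 4.
def f(x):
    s = sum(x)
    return int(s % 5 == 1 or s % 4 == 1)

# The three labelled examples from the slides (+ = 1, - = 0).
examples = [
    ((1,1,1,1,0,1,0,0,1,0,0,1,1,0,1), 1),  # 9 ones:  9 % 4 == 1 -> +
    ((0,1,1,0,1,0,1,0,1,0,0,1,1,0,0), 0),  # 7 ones:  neither    -> -
    ((0,0,1,0,1,0,1,0,1,0,0,0,1,0,0), 1),  # 5 ones:  5 % 4 == 1 -> +
]

for x, label in examples:
    assert f(x) == label
print("all labels match")
```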
## Big Open Question

The biggest open question in PAC-learning concerns DNF formulae:

    C_1 or C_2 or ... or C_m

where each C_i is a conjunction of literals.

Testing such a formula for satisfiability? Easy!

Learning: can Boolean functions given in DNF (Disjunctive Normal Form) be PAC-learned? No one knows.
Results on PAC-learning
k-DNF (each clause has k literals) is PAC-learnable -
Valiant 1984.
Monotone DNF (no negated variables) -
Learnable with
membership queries
- is x a positive example?
Angluin 1988.
See Haussler “Probably Approximately Correct Learning” for
more.
11
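The slides cite these learnability results without giving algorithms. As a flavour of how such learners work, here is the standard elimination algorithm for monotone conjunctions (a much simpler class than k-DNF): start from the conjunction of all variables and drop every variable that is 0 in some positive example. The target and sample below are hypothetical:

```python
def learn_monotone_conjunction(n, examples):
    """Elimination algorithm for monotone conjunctions.

    examples: list of (x, label) pairs, x a 0/1 tuple of length n.
    Returns the set of variable indices kept in the hypothesis.
    """
    kept = set(range(n))  # hypothesis: AND of these variables
    for x, label in examples:
        if label == 1:
            # A positive example rules out any variable it sets to 0.
            kept -= {i for i in kept if x[i] == 0}
    return kept

# Hypothetical target: x0 AND x2, over n = 4 variables.
sample = [
    ((1, 0, 1, 0), 1),
    ((1, 1, 1, 1), 1),
    ((0, 1, 1, 0), 0),
]
print(sorted(learn_monotone_conjunction(4, sample)))  # [0, 2]
```

The output hypothesis is always consistent with the sample, and the PAC guarantee then follows from sample-complexity bounds like those discussed earlier.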
## Criticisms of the PAC Model

No noise: we assume all examples are correctly labelled. Variants of the PAC model incorporate noise; see the references in Haussler.

Worst-case analysis: "for all concepts c ∈ C...". A genuine complaint: many people working in learning don't care about the worst case (it never arises).
## Learning of Distributions

This is our next talk! A variant on what we saw in this talk.

We have a distribution D over some state space Ω (not necessarily {0,1}^n). Our goal: learn the distribution D.

This is learning in the unlabelled case: instances neither belong to, nor are excluded from, the concept (which is the distribution D); instances just have some probability under that concept.