A Quarter Century of Efficient Learnability

Rocco Servedio
Columbia University

Valiant 60th Birthday Symposium
Bethesda, Maryland
May 30, 2009
1984

…and of course: Probably Approximately Correct learning [Valiant84]
• Concept class C of Boolean functions over domain X (typically X = {0,1}^n or R^n)
• Unknown and arbitrary distribution D over X; D models the (possibly complex) world
• Unknown target concept c ∈ C to be learned from examples

Learner has access to i.i.d. draws from D labeled according to c: each example (x, c(x)) has x ∈ X drawn i.i.d. from D.

([Valiant84] presents a range of learning models and oracles.)
PAC learning concept class C

Learner's goal: come up with a hypothesis that will have high accuracy on future examples.

• For any target function c ∈ C,
• for any distribution D over X,
• with probability ≥ 1 − δ the learner outputs a hypothesis h that is (1 − ε)-accurate w.r.t. D, i.e. Pr_{x∼D}[h(x) ≠ c(x)] ≤ ε.
Efficiently

Algorithm must be computationally efficient: it should run in time poly(n, 1/ε, 1/δ).
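To make the model concrete, here is a minimal Python sketch of a generic consistency-based PAC learner for a finite hypothesis class; draw_example, find_consistent, and hyp_class_size are hypothetical placeholders, and the sample size is the standard Occam-style bound m ≥ (1/ε)(ln |H| + ln(1/δ)).

import math

def pac_learn(draw_example, find_consistent, hyp_class_size, eps, delta):
    # draw_example() returns one labeled pair (x, c(x)) with x ~ D;
    # find_consistent(sample) returns a hypothesis from H agreeing with
    # every labeled pair in sample (or None if none exists).
    # Occam bound: this many i.i.d. examples suffice so that, with
    # probability >= 1 - delta, any consistent h in H is (1 - eps)-accurate.
    m = math.ceil((math.log(hyp_class_size) + math.log(1 / delta)) / eps)
    sample = [draw_example() for _ in range(m)]
    return find_consistent(sample)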
From [Valiant84]:

“The results of learnability theory would then indicate the maximum granularity of the single concepts that can be acquired without programming.”

“This paper attempts to explore the limits of what is learnable as allowed by algorithmic complexity…. The identification of these limits is a major goal of the line of work proposed in this paper.”

The PAC model and its variants provide a clean theoretical framework for studying the computational complexity of learning problems.
So, what can be learned efficiently?

In the rest of the 1980s, Valiant and colleagues gave remarkable results on the abilities and limitations of computationally efficient learning algorithms. This work introduced research directions and questions that continue to be intensively studied to this day.

25 years of efficient learnability

Rest of talk: survey some
• positive results (algorithms)
• negative results (two flavors of hardness results)

(Valiant didn't just ask the question “what can be learned efficiently?”; he did a great deal towards answering it. This talk highlights some of these contributions and how the field has evolved since then.)
Positive results: learning k-DNF

Theorem [Valiant84]: k-DNF is learnable in polynomial time for any k = O(1).

Idea (e.g. for k = 2): view a k-DNF as a disjunction over “metavariables”, one per conjunction of at most k literals, and learn the disjunction using elimination (sketched below).

25 years later: improving this beyond constant k is still a major open question! Much has been learned in trying for this improvement…
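A minimal Python sketch of the elimination algorithm just described, assuming examples over {0,1}^n; the name learn_kdnf is ours.

from itertools import combinations, product

def learn_kdnf(samples, n, k):
    # samples: list of (x, y), x a {0,1}-tuple of length n, y in {0,1}.
    # A term ("metavariable") is a set of literals (i, b): "bit i equals b".
    # Start with every conjunction of <= k literals; delete each term
    # satisfied by some negative example; output the OR of the survivors.
    terms = [frozenset(zip(idxs, bits))
             for j in range(1, k + 1)
             for idxs in combinations(range(n), j)
             for bits in product([0, 1], repeat=j)]

    def satisfies(x, term):
        return all(x[i] == b for i, b in term)

    for x, y in samples:
        if y == 0:
            terms = [t for t in terms if not satisfies(x, t)]

    return lambda x: int(any(satisfies(x, t) for t in terms))

There are O((2n)^k) candidate terms, so for constant k the whole procedure runs in polynomial time, matching the theorem.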
Poly-time PAC learning, general distributions

• Decision lists (greedy alg.) [Rivest87]
• Halfspaces (poly-time LP) [Littlestone87, BEHW89, …]
• Parities, integer lattices (Gaussian elim.; sketched below) [HelmboldSloanWarmuth92, FischerSimon92]
• Restricted types of branching programs (DL + parities) [ErgunKumarRubinfeld95, BshoutyTamonWilson98]
• Geometric concept classes (…random projections…) [BshoutyChenHomer94, BGMST98, Vempala99, …]

and more…
[Figure: points labeled + and −]
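As a sketch of the Gaussian-elimination entry above, assuming examples x in {0,1}^n labeled by an unknown parity y = <s, x> mod 2 (the function name is ours):

import numpy as np

def learn_parity(samples, n):
    # Solve the GF(2) linear system A s = b given by the samples;
    # any solution is a parity consistent with all examples.
    A = np.array([list(x) + [y] for x, y in samples], dtype=np.uint8)
    rows = len(A)
    r = 0
    for c in range(n):                       # forward elimination by column
        pivot = next((i for i in range(r, rows) if A[i, c]), None)
        if pivot is None:
            continue
        A[[r, pivot]] = A[[pivot, r]]        # swap pivot row into place
        for i in range(rows):
            if i != r and A[i, c]:
                A[i] ^= A[r]                 # GF(2) row operation is XOR
        r += 1
    s = np.zeros(n, dtype=np.uint8)          # one solution: free variables = 0
    for i in range(r):
        lead = next(c for c in range(n) if A[i, c])
        s[lead] = A[i, n]
    return s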
General-distribution PAC learning, cont.

Quasi-poly / sub-exponential-time learning:
• poly-size decision trees [EhrenfeuchtHaussler89, Blum92]
• poly-size DNF [Bshouty96, TaruiTsukiji99, KlivansS01]
• intersections of few poly(n)-weight halfspaces [KlivansO’DonnellS02]

“PTF method” (halfspaces + metavariables): a link with complexity theory (sketched after the figure below).
[Figure: a DNF drawn as an OR of three ANDs over literals from x1, …, x7 (bars marking negations), with +/− labeled points illustrating the polynomial threshold function representation]
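A minimal sketch of the PTF method under assumed conventions (examples in {−1,1}^n, labels in {−1,1}; function names are ours): treat every monomial of degree ≤ d as a metavariable and find, via a feasibility LP, a halfspace over those metavariables, i.e. a degree-d polynomial threshold function consistent with the data.

from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def monomials(x, d):
    # Feature map: all monomials of degree <= d over x in {-1,1}^n.
    feats = [1.0]
    for k in range(1, d + 1):
        for S in combinations(range(len(x)), k):
            feats.append(np.prod([x[i] for i in S]))
    return np.array(feats)

def learn_ptf(samples, d):
    # Ask for weights w with y * <w, monomials(x, d)> >= 1 on every
    # example: a separating halfspace over the monomial metavariables.
    Phi = np.array([y * monomials(x, d) for x, y in samples])
    res = linprog(c=np.zeros(Phi.shape[1]), A_ub=-Phi,
                  b_ub=-np.ones(len(Phi)),
                  bounds=[(None, None)] * Phi.shape[1])
    if not res.success:
        return None                          # no consistent degree-d PTF
    w = res.x
    return lambda x: 1 if np.dot(w, monomials(x, d)) >= 0 else -1

Since every k-DNF has a degree-k PTF (and, by [KlivansS01], every poly-size DNF has one of degree Õ(n^{1/3})), the running time is driven by the ~n^d metavariables, which is where the quasi-poly / sub-exponential bounds above come from.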
Distribution-specific learning

Theorem [KearnsLiValiant87]: monotone Boolean functions can be weakly learned (accuracy 1/2 + Ω(1/n)) in poly time under the uniform distribution on {0,1}^n.

Ushered in the study of algorithms for uniform-distribution and distribution-specific learning: halfspaces [Baum90], DNF [Verbeurgt90, Jackson95], decision trees [KushilevitzMansour93], AC^0 [LinialMansourNisan89, FurstJacksonSmith91], extended AC^0 [JacksonKlivansS02], juntas [MosselO’DonnellS03], general monotone functions [BshoutyTamon96, BlumBurchLangford98, O’DonnellWimmer09], monotone decision trees [O’DonnellS06], intersections of halfspaces [BlumKannan94, Vempala97, KwekPitt98, KlivansO’DonnellS08], convex sets, much more…

Key tool: Fourier analysis of Boolean functions (sketched below).

Recently come full circle on monotone functions:
– [O’DonnellWimmer09]: poly time, accuracy 1/2 + Ω(log n / √n): optimal! (by [BlumBurchLangford98])
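A minimal sketch of the Fourier tool under the standard conventions (x uniform over {−1,1}^n, f(x) in {−1,1}): estimate the low-degree Fourier coefficients f̂(S) = E[f(x)·∏_{i∈S} x_i] from random examples and predict with the sign of the truncated series; this is the “low-degree algorithm” in the spirit of [LinialMansourNisan89], with helper names of our own.

from itertools import combinations
import numpy as np

def estimate_coefficient(samples, S):
    # Empirical estimate of f_hat(S) = E[f(x) * prod_{i in S} x_i].
    return np.mean([y * np.prod([x[i] for i in S]) for x, y in samples])

def low_degree_learn(samples, n, d):
    # Estimate all Fourier coefficients of degree <= d, then output
    # the sign of the resulting low-degree polynomial approximation.
    coeffs = {S: estimate_coefficient(samples, S)
              for k in range(d + 1)
              for S in combinations(range(n), k)}

    def h(x):
        val = sum(c * np.prod([x[i] for i in S]) for S, c in coeffs.items())
        return 1 if val >= 0 else -1

    return h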
Other variants

After [Valiant84], efficient learning algorithms were studied in many settings:

• Learning in the presence of noise: malicious [Valiant85], agnostic [KearnsSchapireSellie93], random misclassification [AngluinLaird87], …
• Related models: exact learning from queries and counterexamples [Angluin87], Statistical Query learning [Kearns93], many others…
• PAC-style analyses of unsupervised learning problems: learning discrete distributions [KMRRSS94], learning mixture distributions [Dasgupta99, AroraKannan01, many others…]
• Evolvability framework [Valiant07, Feldman08, …]

Nice algorithmic results in all these settings.
Limits of efficient learnability: is proper learning feasible?

Proper learning: a learning algorithm for class C must use hypotheses from C.

There are efficient proper learning algorithms for conjunctions, disjunctions, halfspaces, decision lists, parities, k-DNF, k-CNF.

• What about k-term DNF: can we learn it using k-term DNF as hypotheses?
Proper learning is computationally hard

Theorem [PittValiant87]: If RP ≠ NP, no poly-time algorithm can learn 3-term DNF using 3-term DNF hypotheses.

Given a graph G, the reduction produces a distribution over labeled examples such that a high-accuracy 3-term DNF exists iff G is 3-colorable (sketched below).

Note: one can learn 3-term DNF in poly time using 3-CNF hypotheses! (Every 3-term DNF is expressible as a 3-CNF by distributivity, and 3-CNF is learnable via metavariables for clauses.)

“Often a change of representation can make a difficult learning task easy.”
G → reduction → distribution over labeled examples:

(011111, +)   (001111, −)
(101111, +)   (010111, −)
(110111, +)   (011101, −)
    …             …
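A minimal Python sketch of this reduction under the standard construction (the function name is ours): each vertex yields a positive example with a single 0 in its coordinate, each edge yields a negative example with 0s at both endpoints, and some 3-term DNF is consistent with the sample iff G is 3-colorable.

def coloring_to_learning(n, edges):
    # Vertices are 0..n-1; edges is a list of pairs (u, v).
    sample = []
    for v in range(n):
        x = [1] * n
        x[v] = 0
        sample.append((tuple(x), '+'))      # vertex example
    for u, v in edges:
        x = [1] * n
        x[u] = x[v] = 0
        sample.append((tuple(x), '-'))      # edge example
    return sample

# The slide's examples arise from a 6-vertex graph containing
# edges such as (0, 1), (0, 2), (0, 4):
# coloring_to_learning(6, [(0, 1), (0, 2), (0, 4)])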
From 1987…

Theorem [PittValiant87]: Learning k-term DNF using (2k − 3)-term DNF hypotheses is hard.

Opened the door to a whole range of hardness results of the form: C is hard to learn using hypotheses from H. This work showed computational barriers to learning with restricted representations in general, not just proper learning.
… to 2009

Great progress in recent years using sophisticated machinery from hardness of approximation.

• [ABFKP04]: Hard to learn n-term DNF using n^100-size OR-of-halfspace hypotheses. [Feldman06]: Holds even if the learner can make membership queries to the target function.
• [KhotSaket08]: Hard to (even weakly) learn an intersection of 2 halfspaces using 100 halfspaces as hypothesis.

If data is corrupted with 1% noise, then:
• [FeldmanGopalanKhotPonnuswami08]: Hard to (even weakly) learn an AND using an AND as hypothesis. Same for halfspaces.
• [GopalanKhotSaket07, Viola08]: Hard to (even weakly) learn a parity even using degree-100 GF(2) polynomials as hypotheses.

Active area with lots of ongoing work.
Representation-Independent Hardness

Suppose there are no hypothesis restrictions: any poly-size circuit is OK. Are there learning problems that are still hard for computational reasons?

Yes. [Valiant84]: The existence of pseudorandom functions [GoldreichGoldwasserMicali84] implies that general Boolean circuits are (representation-independently) hard to learn.
PKC and hardness of learning

Key insight of [KearnsValiant89]: public-key cryptosystems yield hard-to-learn functions.
– Using the public key, an adversary can create labeled examples (x, f(x)) of the decryption function f by herself… so f must not be learnable from labeled examples, or else the cryptosystem would be insecure!

Theorem [KearnsValiant89]: Simple classes of functions (NC^1, TC^0, poly-size DFAs) are inherently hard to learn.

Theorem [Regev05, KlivansSherstov06]: Really simple functions (poly-size ORs of halfspaces) are inherently hard to learn.

Closing the gap: Can these results be extended to show that DNF are inherently hard to learn? Or are DNF efficiently learnable?
Efficient learnability: Model and Results

Thank you, Les!

Valiant
• provided an elegant model for the computational study of learning
• followed this up with foundational results on what is (and isn't) efficiently learnable

These fundamental questions continue to be intensively studied and cross-fertilize other topics in TCS.