# PAC Learning - Zhou Ji


PAC Learning

8/5/2005

Purpose

An effort to understand the negative selection algorithm from entirely different perspectives:

Statistics

Machine learning

What is machine learning, in a very
informal way?

Looking for mathematical tools to describe, analyze, and evaluate either a learning algorithm or a learning problem.

Background

The PAC learning framework is a branch of computational learning theory.

Computational learning theory is a mathematical field related to the analysis of machine learning algorithms. It is actually considered a field of statistics.

Machine learning algorithms take a training set, form hypotheses or models, and make predictions about the future. Because the training set is finite and the future is uncertain, learning theory usually does not yield absolute guarantees on the performance of the algorithms. Instead, probabilistic bounds on the performance of machine learning algorithms are quite common.
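As a hedged illustration of what such a probabilistic bound looks like (this example is not from the slides), the sketch below uses Hoeffding's inequality: for m independent 0/1 draws with mean p, the empirical mean misses p by more than ε with probability at most 2·exp(-2mε²). The coin bias `p`, the tolerance `eps`, and the trial count are arbitrary assumptions chosen only for the demonstration.

```python
import math
import random


def hoeffding_bound(m: int, eps: float) -> float:
    """Upper bound on P(|empirical mean - p| > eps) for m i.i.d. 0/1 draws."""
    return 2 * math.exp(-2 * m * eps ** 2)


def deviation_rate(p: float, m: int, eps: float, trials: int, rng: random.Random) -> float:
    """Fraction of trials in which the mean of m Bernoulli(p) draws misses p by more than eps."""
    misses = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(m)) / m
        if abs(mean - p) > eps:
            misses += 1
    return misses / trials


m, eps = 1000, 0.05
bound = hoeffding_bound(m, eps)  # 2 * exp(-5), roughly 0.0135
observed = deviation_rate(p=0.3, m=m, eps=eps, trials=500, rng=random.Random(0))
print(f"bound={bound:.4f} observed={observed:.4f}")
```

The bound holds whatever p is, and the observed rate is typically far below it; a distribution-free guarantee of this "probably, approximately" kind is exactly what the PAC framework asks for.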

Theory

Computational learning theorists study the time complexity and feasibility of learning.

In computational learning theory, a
computation is considered feasible if it can
be done in polynomial time.


There are several different approaches to
computational learning theory, which are
often mathematically incompatible.

This incompatibility arises from:

- using different inference principles, i.e. principles that tell you how to generalize from limited data;
- differing definitions of probability (frequency probability, Bayesian probability).


The different approaches include:

- Probably approximately correct learning (PAC learning), proposed by Leslie Valiant;
- VC theory, proposed by Vladimir Vapnik;
- Bayesian inference, arising from work first done by Thomas Bayes;
- Algorithmic learning theory, from the work of E. M. Gold.

Computational learning theory has led to practical algorithms. For example, PAC theory inspired boosting.

What is this for?

The PAC framework made a precise mathematical analysis of learning possible.

Basic facts of PAC learning

Probably approximately correct learning (PAC learning) is a framework of learning that was proposed by Leslie Valiant in his paper “A theory of the learnable”.

In this framework the learner receives samples that are classified according to a function from a certain class. The aim of the learner is to find, with high probability, an approximation of the function. We require the learner to be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples.

How does negative selection fit in? We only deal with a very special distribution of the samples: samples from one class only. Is it a PAC learning algorithm?
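To make the sampling setup concrete, here is a minimal sketch of an example oracle: it draws an instance x from D and labels it with the target concept c. The uniform choice of D over {0,1}^n, the hypothetical target `c`, and the `only_label` parameter are assumptions made for illustration. `only_label` mimics the one-class samples mentioned above by rejection sampling, so the learner still sees a distribution over instances, but every label it sees is the same.

```python
import itertools
import random
from typing import Callable, Iterator, Optional, Tuple

Instance = Tuple[int, ...]


def example_oracle(c: Callable[[Instance], int], n: int, rng: random.Random,
                   only_label: Optional[int] = None) -> Iterator[Tuple[Instance, int]]:
    """Yield labeled examples (x, c(x)) with x drawn uniformly from {0,1}^n.

    If only_label is given, examples of the other class are rejected, so the
    learner only ever sees one class (like the self samples in negative selection).
    The loop never terminates if no instance carries that label.
    """
    while True:
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = c(x)
        if only_label is None or y == only_label:
            yield x, y


c = lambda x: x[0] & x[1]  # hypothetical target concept
rng = random.Random(1)
two_class = list(itertools.islice(example_oracle(c, 4, rng), 20))
one_class = list(itertools.islice(example_oracle(c, 4, rng, only_label=0), 20))
print(sorted({y for _, y in two_class}), sorted({y for _, y in one_class}))
```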

“The intent of the PAC model is that successful learning of an unknown target concept should entail obtaining, with high probability, a hypothesis that is a good approximation of it.”

We can consider this target concept as an unknown function, e.g. $f:\{0,1\}^n \to \{0,1\}$; the result to pursue is an approximation of $f$, called a hypothesis here.

The purpose of the PAC discussion is to decide whether an algorithm for finding the approximation is (1) good enough and (2) feasible.

“If we wish to define a model of learning from (random) samples, a crucial point is to formulate ‘correctly’ the *notion of success*.” (quoted, but corrected and highlighted)

To keep the discussion simple, let us use the setup $f:\{0,1\}^n \to \{0,1\}$.

Instance space: $\{0,1\}^n$.

Fix a probability distribution $D$ defined on $\{0,1\}^n$.

The error of a hypothesis $h$ with respect to a fixed target concept $c$ is defined as

$$\mathrm{error}(h) = \Pr_D(h \,\Delta\, c),$$

where $\Delta$ denotes the symmetric difference.

$\mathrm{error}(h)$ is the probability that $h$ and $c$ disagree on an instance drawn according to $D$.

The hypothesis $h$ is a good approximation of the target concept $c$ if $\mathrm{error}(h)$ is small. (Note that $\mathrm{error}(h)$ depends on $D$.)
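The definition of the error can be checked on a toy case. In this sketch the target `c`, the hypothesis `h`, and both distributions are hypothetical choices: D is stored as a dictionary from instances to probabilities, `error` sums D over the symmetric difference exactly, and `estimated_error` approximates the same quantity from random examples. Evaluating the same h and c under two different distributions shows why the error depends on D.

```python
import itertools
import random
from typing import Callable, Dict, List, Set, Tuple

Instance = Tuple[int, ...]
Concept = Callable[[Instance], int]


def cube(n: int) -> List[Instance]:
    """All instances of {0,1}^n."""
    return list(itertools.product((0, 1), repeat=n))


def symmetric_difference(h: Concept, c: Concept, n: int) -> Set[Instance]:
    """h Δ c: the instances on which h and c disagree."""
    return {x for x in cube(n) if h(x) != c(x)}


def error(h: Concept, c: Concept, D: Dict[Instance, float], n: int) -> float:
    """error(h) = Pr_D(h Δ c), computed exactly by summing D over h Δ c."""
    return sum(D.get(x, 0.0) for x in symmetric_difference(h, c, n))


def estimated_error(h: Concept, c: Concept, D: Dict[Instance, float], m: int,
                    rng: random.Random) -> float:
    """Fraction of m examples drawn from D on which h and c disagree."""
    points = list(D)
    sample = rng.choices(points, weights=[D[x] for x in points], k=m)
    return sum(h(x) != c(x) for x in sample) / m


n = 3
c = lambda x: x[0] & x[1]  # hypothetical target concept
h = lambda x: x[0]         # hypothetical hypothesis; h Δ c = {x : x0 = 1, x1 = 0}
uniform = {x: 1 / 2 ** n for x in cube(n)}
first_bit_zero = {x: 1 / 2 ** (n - 1) for x in cube(n) if x[0] == 0}

print(error(h, c, uniform, n))         # 0.25
print(error(h, c, first_bit_zero, n))  # 0.0
print(estimated_error(h, c, uniform, 10_000, random.Random(0)))
```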

Definition of “PAC learnability”

This definition is the centerpiece of the PAC learning model.

We define when the concept class $\mathcal{C}$ is:

- PAC learnable by the hypothesis space $\mathcal{H}$;
- properly PAC learnable;
- PAC learnable.

What is the concept class $\mathcal{C}$?

$\mathcal{C} = \{C_n\}_{n \ge 1}$, where $C_n$ is a set of target concepts over $\{0,1\}^n$.

What is the hypothesis space $\mathcal{H}$?

$\mathcal{H} = \{H_n\}_{n \ge 1}$, where $H_n$ is a set of hypotheses over $\{0,1\}^n$.
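As a hypothetical example of such a family (not one used in the slides), take $C_n$ to be the conjunctions of literals over n Boolean variables: each variable is either required to be 1, required to be 0, or left out, so $|C_n| = 3^n$. The sketch stores a conjunction as a tuple of (index, required value) pairs; this representation and the function names are assumptions made for illustration.

```python
import itertools
from typing import List, Tuple

Conjunction = Tuple[Tuple[int, int], ...]  # ((variable index, required value), ...)


def conjunctions(n: int) -> List[Conjunction]:
    """C_n: every conjunction of literals over x_0 .. x_{n-1}.

    Each variable is absent (None), required to be 0, or required to be 1.
    """
    concepts = []
    for choice in itertools.product((None, 0, 1), repeat=n):
        concepts.append(tuple((i, v) for i, v in enumerate(choice) if v is not None))
    return concepts


def evaluate(concept: Conjunction, x: Tuple[int, ...]) -> int:
    """1 if x satisfies every literal of the conjunction, else 0."""
    return int(all(x[i] == v for i, v in concept))


# The first few members of the family C = {C_n}_{n >= 1}.
C = {n: conjunctions(n) for n in range(1, 5)}
for n, C_n in C.items():
    print(n, len(C_n))  # sizes 3, 9, 27, 81
```

Choosing $H_n = C_n$ for this family would be the “properly PAC learnable” situation defined later in the slides.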

Definition of “PAC learnable by the hypothesis space $\mathcal{H}$”:

The concept class $\mathcal{C}$ is PAC learnable by the hypothesis space $\mathcal{H}$ if there exist a polynomial-time algorithm $A$ and a polynomial $p(\cdot,\cdot,\cdot)$ such that for all $n \ge 1$, all target concepts $c \in C_n$, all probability distributions $D$ on the instance space $\{0,1\}^n$, and all $\epsilon$ and $\delta$ with $0 < \epsilon, \delta < 1$: if the algorithm $A$ is given at least $p(n, 1/\epsilon, 1/\delta)$ independent random examples of $c$ drawn according to $D$, then with probability at least $1 - \delta$, $A$ returns a hypothesis $h \in H_n$ with $\mathrm{error}(h) \le \epsilon$.

Note: this talks about the existence of A, not what
exactly A is.

The smallest such polynomial $p$ is called the sample complexity of the learning algorithm.

This is as essential to a learning algorithm as time complexity is to a general algorithm.
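The slides do not give a formula for p. As a hedged illustration only, a standard result for a finite hypothesis space and a learner that always returns a hypothesis consistent with its sample is that m ≥ (1/ε)(ln|H_n| + ln(1/δ)) examples suffice. The sketch evaluates that bound; taking H_n to be conjunctions of literals (|H_n| roughly 3^n) is an assumption for the example, and the function name is hypothetical. Since ln 3^n = n ln 3, the bound grows linearly in n and 1/ε and only logarithmically in 1/δ, so it is polynomial in the sense of the definition.

```python
import math


def consistent_learner_sample_size(h_size: int, eps: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite hypothesis space:
    m >= (1/eps) * (ln|H| + ln(1/delta))."""
    if h_size < 1 or not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("need |H| >= 1 and 0 < eps, delta < 1")
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)


# Assumed hypothesis space: conjunctions of literals, |H_n| roughly 3**n.
for n in (10, 20, 40):
    print(n, consistent_learner_sample_size(3 ** n, eps=0.1, delta=0.05))
```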

Definition of “properly PAC learnable”:

If $\mathcal{C}$ is PAC learnable by the hypothesis space $\mathcal{H} = \mathcal{C}$.

Definition of “PAC learnable”:

$\mathcal{C}$ is a concept class and there exists some hypothesis space $\mathcal{H}$ such that hypotheses in $\mathcal{H}$ can be evaluated on given instances in polynomial time and $\mathcal{C}$ is PAC learnable by $\mathcal{H}$.

This extends the definition from “for a given $\mathcal{H}$” to “there exists an $\mathcal{H}$”.

If $\mathcal{C}$ is properly PAC learnable, it is obviously PAC learnable (assuming hypotheses in $\mathcal{C}$ can be evaluated on given instances in polynomial time).

There are many variants of the basic definition.

It can be shown that they are equivalent.

The model can be extended in various directions.

We ask for a single algorithm $A$ that works for all distributions,

not that for every distribution $D$ there exists an algorithm designed for that specific distribution $D$.

That means algorithm $A$ does not know the distribution.

A key part of PAC learning, and of the potential link to the negative selection algorithm that we are trying to make (if one exists at all), is the probability distribution $D$.

“The error probability is measured with respect to the
same distribution according to which the random
examples are chosen.” “if the learning algorithm will get
random examples from a distribution which provides only
samples with first bits 0 and the error will be measured
with respect to distribution on strings whose first bit is 1
then clearly the learning algorithm has no chance to do
much.”
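The quoted argument can be made concrete with a small sketch built on assumptions of our own: two hypothetical targets `c1` and `c2` agree on every string whose first bit is 0 and disagree on every string whose first bit is 1. A learner trained only on first-bit-0 strings receives identical data for both targets, so it returns the same h; because exactly one of the two targets is wrong at each first-bit-1 string, the two test errors sum to 1 whatever h is. The memorizing learner is a hypothetical stand-in for any learner.

```python
import itertools
from collections import Counter


def memorizer(sample):
    """Hypothetical learner: return a seen example's label, else the majority training label."""
    table = dict(sample)
    majority = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: table.get(x, majority)


def error_on(h, c, points):
    """error(h) under the uniform distribution over `points`."""
    return sum(h(x) != c(x) for x in points) / len(points)


n = 4
cube = list(itertools.product((0, 1), repeat=n))
first_bit_0 = [x for x in cube if x[0] == 0]
first_bit_1 = [x for x in cube if x[0] == 1]

c1 = lambda x: x[1]                             # hypothetical target 1
c2 = lambda x: x[1] if x[0] == 0 else 1 - x[1]  # agrees with c1 only when x[0] == 0

train1 = [(x, c1(x)) for x in first_bit_0]
train2 = [(x, c2(x)) for x in first_bit_0]
assert train1 == train2  # the learner cannot tell the two targets apart

h = memorizer(train1)
print(error_on(h, c1, first_bit_0), error_on(h, c2, first_bit_0))  # 0.0 0.0
print(error_on(h, c1, first_bit_1) + error_on(h, c2, first_bit_1))  # 1.0
```

Training error is zero for both targets, yet on the first-bit-1 strings at least one of them has error of at least 1/2.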

NSA, at least my method, seems to be doing the kind of thing described above as having “no chance to do much”, with a little help from the magic “self threshold (or self radius)”.

Is NSA’s notion of success not well defined?

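What does it mean to say “L is a PAC learning algorithm”?

For any given $\delta, \epsilon > 0$, there is a sample size $m_0$ such that for all computable target functions $t$, all probability distributions $P$, and all $m \ge m_0$,

$$P^m\big(\mathrm{error}(L(s), t) > \epsilon\big) < \delta,$$

where $s$ is a sample of $m$ examples drawn independently according to $P$.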
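The condition $P^m(\mathrm{error}(L(s), t) > \epsilon) < \delta$ can be checked empirically for one concrete learner. This sketch is an illustration under stated assumptions, not the author's method: L is taken to be the standard elimination algorithm for conjunctions of literals (start from all 2n literals and drop every literal that a positive example falsifies), t is a hypothetical conjunction, P is uniform on $\{0,1\}^n$, and m comes from the finite-hypothesis bound with $|H| \approx 3^n$. The failure probability is estimated by repeating the experiment with fresh samples.

```python
import itertools
import math
import random


def learn_conjunction(sample, n):
    """Elimination learner: start from all 2n literals and drop every literal
    that some positive example falsifies. Negative examples are ignored."""
    literals = {(i, v) for i in range(n) for v in (0, 1)}
    for x, y in sample:
        if y == 1:
            literals -= {(i, 1 - x[i]) for i in range(n)}
    return literals


def evaluate(literals, x):
    """1 if x satisfies every (index, value) literal, else 0."""
    return int(all(x[i] == v for i, v in literals))


def uniform_error(h, target, n):
    """error(h) under the uniform distribution, by exhaustive enumeration."""
    points = list(itertools.product((0, 1), repeat=n))
    return sum(evaluate(h, x) != evaluate(target, x) for x in points) / len(points)


def failure_rate(target, n, m, eps, trials, rng):
    """Monte Carlo estimate of P^m(error(L(s), t) > eps)."""
    failures = 0
    for _ in range(trials):
        sample = []
        for _ in range(m):
            x = tuple(rng.randint(0, 1) for _ in range(n))
            sample.append((x, evaluate(target, x)))
        if uniform_error(learn_conjunction(sample, n), target, n) > eps:
            failures += 1
    return failures / trials


n, eps, delta = 8, 0.1, 0.05
m = math.ceil((n * math.log(3) + math.log(1 / delta)) / eps)  # assumed |H| ~ 3**n
target = {(0, 1), (3, 0)}  # hypothetical target t: x0 AND NOT x3
rate = failure_rate(target, n, m, eps, trials=200, rng=random.Random(0))
print(m, rate)
```

The learner never removes a literal of the target, so its hypothesis can only err by being too specific; with this many examples the extra literals are almost always eliminated, and the estimated failure rate stays below δ.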

How does the negative selection algorithm fit into the PAC learning model?

Does NSA count as a learning process or
algorithm at all?

References

D. Haussler. Probably approximately correct learning. In AAAI-90: Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pages 1101-1108. American Association for Artificial Intelligence, 1990. http://citeseer.ist.psu.edu/haussler90probably.html

http://en.wikipedia.org/wiki/Probably_approximately_correct_learning

http://en.wikipedia.org/wiki/Computational_learning_theory

... …