A Sufficient Condition for Polynomial Distribution-Dependent Learnability

Martin Anthony
Department of Mathematics
London School of Economics
Houghton Street
London WC2A 2AE, UK
m.anthony@lse.ac.uk

John Shawe-Taylor
Department of Computer Science
Royal Holloway, University of London
Egham Hill
Egham
Surrey TW20 0EX, UK
john@dcs.rhbnc.ac.uk

Abstract

We investigate upper bounds on the sample size sufficient for `solid' learnability with respect to a probability distribution. We obtain a sufficient condition for feasible (polynomially bounded) sample-size bounds for distribution-specific (solid) learnability.


1 Introduction

There have been extensive studies of probabilistic models of machine learning; see the books [3, 11, 12], for example. In the standard `PAC' model of learning, the definition of successful learning is `distribution-free'. A number of researchers have examined learning where the probability distribution generating the examples is known; see [6, 5], for example. In this paper we seek conditions under which such distribution-specific learning can be achieved with a feasible (polynomial) number of training examples.

2 The PAC learning framework

In this section, we describe a probabilistic model of learning, introduced by Valiant [15] and developed by many researchers (see for example [8]). It has come to be known as the probably approximately correct learning model [1].

Throughout, we have an example space $X$, which is either countable or is the Euclidean space $\mathbb{R}^n$ for some $n$. We have a probability space $(X, \Sigma, \mu)$ defined on $X$, where we assume that when $X$ is countable, $\Sigma$ is the set of all subsets of $X$, and that when $X$ is $\mathbb{R}^n$, $\Sigma$ is the Borel $\sigma$-algebra. A hypothesis is a $\Sigma$-measurable $\{0,1\}$-valued function on $X$. The hypothesis space $H$ is a set of hypotheses, and the target, $c$, is one particular concept from $H$. A labelled example of $c$ is an ordered pair $(x, c(x))$. If $c(x) = 1$, we say $x$ is a positive example of $c$, while if $c(x) = 0$, we say $x$ is a negative example of $c$. A sample $y$ of $c$ of length (or size) $m$ is a sequence of $m$ labelled examples of $c$. When the target concept is clear, we will denote the sample simply by the vector $x \in X^m$, so that if $x = (x_1, \ldots, x_m)$ then the corresponding sample of $c$ is $((x_1, a_1), \ldots, (x_m, a_m))$, where $a_i = c(x_i)$. The learning problem is to find a good approximation to $c$ from $H$, this approximation being based solely on a sample of $c$, each example in the sample being chosen independently and at random, according to the distribution $\mu$.

Fix a particular target $c \in H$. For any hypothesis $h$ of $H$, the error of $h$ (with respect to $c$) is $\mathrm{er}_\mu(h) = \mu(h \triangle c)$, where $h \triangle c$ is the set $\{x : h(x) \neq c(x)\}$, the symmetric difference of $h$ and $c$. We say that a hypothesis $h$ is $\epsilon$-close to $c$ if $\mathrm{er}_\mu(h) \leq \epsilon$. For any set $F$ of measurable subsets of $X$, we define the haziness of $F$ (with respect to $c$) as
$$\mathrm{haz}_\mu(F) = \sup\{\mathrm{er}_\mu(h) : h \in F\}.$$


The set $H[x, c]$ of hypotheses consistent with $c$ on $x$ is
$$H[x, c] = \{h \in H : h(x_i) = c(x_i),\ 1 \leq i \leq m\},$$
which we shall usually denote by $H[x]$ when $c$ is understood. Now we can define what is meant by solid learnability. (This terminology comes from [5].)

Definition 2.1 The hypothesis space $H$ is solidly learnable if, for any $\epsilon, \delta \in (0,1)$, there is $m_0 = m_0(\epsilon, \delta)$ such that given any $c \in H$, for all probability measures $\mu$ on $X$,
$$m > m_0 \implies \mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) < \epsilon\} > 1 - \delta.$$
Here, $\mu^m$ is the product measure on $X^m$.

In words, $H$ is solidly learnable if for a given accuracy parameter $\epsilon$ and a given certainty parameter $\delta$, there is a sample size, independent of the distribution and the target concept, such that any hypothesis consistent with that many random examples will "probably" be "approximately" correct. (In this case, a learning algorithm which returns a consistent hypothesis will perform well.) From now on, `learnability' shall mean `solid learnability'.

We assume throughout that the spaces satisfy certain measurability requirements, namely that they are universally separable, so that the probabilities in the definitions and proofs are indeed defined. See [13, 8] for details.

3 Distribution-independent sample sizes

The Vapnik-Chervonenkis dimension (or VC dimension) [16] has been widely used in order to obtain some measure of the degree of expressibility of a hypothesis space, and hence to obtain learnability results [9, 8, 4]. Given a hypothesis space $H$, define, for each $x = (x_1, \ldots, x_m) \in X^m$, a function $x^* : H \to \{0,1\}^m$ by
$$x^*(h) = (h(x_1), \ldots, h(x_m)).$$
The growth function $\Pi_H$ from the set of positive integers to itself is defined by
$$\Pi_H(m) = \max\{|\{x^*(h) : h \in H\}| : x \in X^m\} \leq 2^m.$$


If $|\{x^*(h) : h \in H\}| = 2^m$ then we say that $x$ is shattered by $H$. If $\Pi_H(m) = 2^m$ for all $m$ then the Vapnik-Chervonenkis dimension of $H$ is infinite. Otherwise, the Vapnik-Chervonenkis dimension is the largest positive integer $m$ for which $\Pi_H(m) = 2^m$; that is, the largest integer $m$ such that some sample $x$ of length $m$ is shattered. We remark that any finite hypothesis space certainly has finite VC dimension.

It can be shown that if $\mathrm{VCdim}(H) = d$ and $m \geq d \geq 1$ then $\Pi_H(m) \leq (em/d)^d$ [14].
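As a concrete illustration of these definitions (not taken from the paper), the growth function and the VC dimension of a small finite hypothesis space can be computed by brute force; the threshold class below is a hypothetical toy example, and the final check is the Sauer-type bound $\Pi_H(m) \leq (em/d)^d$ quoted above. A minimal Python sketch:

```python
from itertools import combinations
from math import e

def num_dichotomies(hypotheses, sample):
    """|{x*(h) : h in H}|: number of distinct 0/1 patterns induced on the sample."""
    return len({tuple(h(x) for x in sample) for h in hypotheses})

def vc_dimension(hypotheses, domain):
    """Largest d such that some d-subset of the domain is shattered (brute force)."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(num_dichotomies(hypotheses, s) == 2 ** k
               for s in combinations(domain, k)):
            d = k
    return d

# Toy class (our example, not the paper's): thresholds h_t(x) = 1 iff x >= t.
domain = list(range(10))
H = [lambda x, t=t: int(x >= t) for t in range(11)]

d = vc_dimension(H, domain)
m = 5
growth = max(num_dichotomies(H, s) for s in combinations(domain, m))
assert growth <= (e * m / d) ** d   # Sauer-type bound: Pi_H(m) <= (em/d)^d
print(d, growth)
```

For thresholds the brute-force search confirms VC dimension 1 and only $m + 1$ dichotomies on any $m$-point sample, well inside the $(em/d)^d$ ceiling.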

This is useful in obtaining bounds on the sufficient sample size $m_0(\epsilon, \delta)$. Following [10], it can be proved [8] that if the hypothesis space $H$ has finite VC dimension $d$, then $H$ is learnable. Further, if $H$ is learnable then $H$ must have finite VC dimension [8]. Specifically, the sufficiency result of Blumer et al. follows from the following, which is a refinement of a result from [16].

Theorem 3.1 (Blumer et al. [8]) For any distribution $\mu$,
$$\mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) > \epsilon\} < 2\,\Pi_H(2m)\,2^{-\epsilon m/2}.$$

This bound has been tightened [4], resulting in the following bound on sufficient sample size.

Theorem 3.2 ([4]) The hypothesis space $H$ is learnable if $H$ has finite VC dimension. If $d = \mathrm{VCdim}(H) > 1$ is finite then a suitable $m_0$ is
$$m_0 = m_0(\epsilon, \delta) = \left\lceil \frac{1}{\epsilon(1-\sqrt{\epsilon})}\left( \ln\left(\frac{d/(d-1)}{\delta}\right) + 2d\ln\left(\frac{6}{\epsilon}\right)\right) \right\rceil,$$
where ln denotes the natural logarithm.
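Taking the formula of Theorem 3.2 as reconstructed above at face value, the bound is easy to evaluate numerically; the parameter values below are arbitrary illustrations, not figures from the paper. A sketch:

```python
from math import ceil, log, sqrt

def m0(eps, delta, d):
    """Sufficient sample size from Theorem 3.2 (requires d = VCdim(H) > 1)."""
    return ceil((1 / (eps * (1 - sqrt(eps)))) *
                (log((d / (d - 1)) / delta) + 2 * d * log(6 / eps)))

# The bound scales roughly like (d/eps) ln(1/eps), but only like ln(1/delta):
for eps, delta, d in [(0.1, 0.05, 10), (0.05, 0.05, 10), (0.1, 0.0005, 10)]:
    print(eps, delta, d, m0(eps, delta, d))
```

Note that shrinking $\delta$ by a factor of 100 adds only a small additive term, while halving $\epsilon$ roughly doubles the bound; this is the behaviour formalised as polynomial learnability in Section 5.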

4 Distribution-dependent learning

Recall the definition of learnability of a hypothesis space $H$: $H$ is learnable if for any accuracy parameter $\epsilon$, any confidence parameter $\delta$, any target concept $c \in H$ and any probability measure $\mu$ on $X$, there is a sample size $m_0$, which is a function of $\epsilon$ and $\delta$ alone, such that the following holds: with probability at least $1 - \delta$, if some hypothesis $h$ is consistent with $c$ on at least $m_0$ inputs chosen randomly according to the distribution $\mu$, then $h$ has actual error less than $\epsilon$. As emphasised earlier, the value of $m_0$ must depend on neither the target concept $c$ nor the distribution (probability measure) $\mu$. In many realistic learning problems, the distribution on the input space is fixed but unknown. This is the primary reason for proving learnability results and finding sufficient sample sizes which are independent of the distribution; results that are independent of the distribution certainly hold for any particular distribution. If something is known of the distribution, or if the distribution is of a special type, it may be possible to say more, obtaining positive results even when the hypothesis space has infinite VC dimension.

In order to introduce distribution-dependent learnability, we may define learnability of a particular concept $c$ from a hypothesis space $H$, with respect to a particular probability measure $\mu$ on the input space $X$. We say that $c$ is $\mu$-learnable in $H$ if given any $\epsilon, \delta \in (0,1)$, there is an integer $m_0 = m_0(\epsilon, \delta, c, \mu)$ such that for all $m \geq m_0$,
$$\mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x, c]) > \epsilon\} < \delta.$$
In addition, we say that $H$ itself is $\mu$-learnable if every $c \in H$ is $\mu$-learnable and if there is a sufficient sample size $m_0$ which is independent of the target concept $c$. If $H$ is $\mu$-learnable for every distribution $\mu$ on $X$, then we say that $H$ is distribution-dependent learnable, abbreviated as dd-learnable.

If one examines closely the proof in [8] of Theorem 3.1, then it is clear that the term $\Pi_H(2m)$ in the bound can be replaced by the expectation over $X^{2m}$ of the function $\Pi_H$, where $\Pi_H(x) = |\{x^*(h) : h \in H\}|$. (This will be a random variable if we assume that $H$ is universally separable; see [2].) Thus, for distribution-dependent analysis, we can use $E_{\mu^{2m}}(\Pi_H(x))$ in place of $\Pi_H(2m)$, where $E_{\mu^{2m}}(\cdot)$ denotes expected value with respect to $\mu^{2m}$ and over $X^{2m}$. This yields
$$\mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) > \epsilon\} < 2\,E_{\mu^{2m}}(\Pi_H(x))\,2^{-\epsilon m/2},$$
for $m \geq 8/\epsilon$.

A function $f$ is said to be subexponential if, for all $\alpha > 0$, $f(x)\exp(-\alpha x)$ tends to zero as $x$ tends to infinity. With this definition, we have the following.

Theorem 4.1 Let $\mu$ be any probability measure on $X$. If $E_{\mu^n}(\Pi_H(x))$, the expected value of $\Pi_H(x)$ over $X^n$ (with respect to $\mu^n$), is a subexponential function of $n$, then $H$ is $\mu$-learnable.

Proof: For $m \geq 8/\epsilon$,
$$\mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) > \epsilon\} < 2\,E_{\mu^{2m}}(\Pi_H(x))\,2^{-\epsilon m/2}.$$
If
$$E_{\mu^{2m}}(\Pi_H(x))\,2^{-\epsilon m/2} \to 0 \quad \text{as } m \to \infty$$
for all $\epsilon > 0$, which is the case if $E_{\mu^n}(\Pi_H(x))$ is a subexponential function of $n$, then the quantity on the right-hand side can be made less than any $\delta > 0$ by choosing $m \geq m_0$, where $m_0$ depends only on $\epsilon$ and $\delta$ and not on the target concept $c$. The result follows. □

It is fairly easy to see that demanding that $E_{\mu^n}(\Pi_H(x))$ be subexponential is equivalent to demanding that $n^{-1}\log E_{\mu^n}(\Pi_H(x)) \to 0$ as $n \to \infty$. In fact, results of Vapnik and Chervonenkis [16] show that the weaker condition $n^{-1}E_{\mu^n}(\log \Pi_H(x)) \to 0$ as $n \to \infty$ is sufficient.

We give two examples of this theorem, one discrete and the other continuous.

Example 1: Let $\{B_i\}_{i \geq 1}$ be any sequence of disjoint sets such that $|B_i| = i$ ($i \geq 1$), and take as example space the countably infinite set $X = \bigcup_{i=1}^{\infty} B_i$. Let the probability measure $\mu$ be defined on the $\sigma$-algebra of all subsets of $X$ by
$$\mu(\{x\}) = \frac{1}{i}\cdot\frac{1}{2^i} \quad (x \in B_i).$$
Let the hypothesis space $H$ be the set of functions $H = \bigcup_{i=1}^{\infty}\{I_C : C \subseteq B_i\}$, where $I_C : X \to \{0,1\}$ is the characteristic function of the subset $C$. Then it is easy to see that $H$ has infinite VC dimension and thus is not learnable. However, we can use Theorem 4.1 to prove that $H$ is $\mu$-learnable. For $x \in X^n$, let $I(x)$ be the set of entries of $x$; that is, $I(x) = \{x_i : 1 \leq i \leq n\}$. Then it is not difficult to see that
$$\Pi_H(x) = \sum 2^{|I(x) \cap B_i|},$$
where the sum is over all $i$ such that $I(x) \cap B_i \neq \emptyset$. Therefore,
$$I(x) \subseteq S_k = \bigcup_{i=1}^{k} B_i \implies \Pi_H(x) \leq 2 + 2^2 + \cdots + 2^k < 2^{k+1}.$$
Further, $\Pi_H(x) \leq 2^n$ for all $x \in X^n$.

Let $\sigma_k$ be the probability that $I(x) \subseteq S_k$; that is, $\sigma_k = \mu^n(S_k^n)$. Then
$$\sigma_k = (\mu(S_k))^n = \left(1 - \frac{1}{2^k}\right)^{n}.$$
For any $0 < x < 1$, $(1-x)^n \geq 1 - nx$ and so, for $k \geq 2$,
$$\sigma_k - \sigma_{k-1} \leq 1 - \left(1 - \frac{1}{2^{k-1}}\right)^{n} \leq \frac{n}{2^{k-1}}.$$
Since the sets $S_k^n$ cover $X^n$, we therefore have
$$E_{\mu^n}(\Pi_H(x)) < \sigma_1\,2^{1} + \sum_{k=2}^{n-1}(\sigma_k - \sigma_{k-1})\,2^{k+1} + 2^n\left(1 - \mu^n(S_{n-1}^n)\right) \leq 1 + \sum_{k=2}^{n-1}\frac{n}{2^{k-1}}\,2^{k+1} + 2^n\,\frac{n}{2^{n-1}} = 1 + 4n(n-2) + 2n < 4n^2.$$
It follows that the expected value of $\Pi_H(x)$ is polynomial and therefore $H$ is $\mu$-learnable.
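The bound $E_{\mu^n}(\Pi_H(x)) < 4n^2$ can be checked empirically by sampling from $\mu$ (choose block $B_i$ with probability $2^{-i}$, then a uniform element of it) and averaging $\Pi_H(x)$ over many random samples. This Monte Carlo simulation is our own sanity check, not part of the paper. A sketch:

```python
import random
random.seed(0)

def sample_point():
    """Draw from mu: block i with probability 2^{-i}, then a uniform element of B_i."""
    i = 1
    while random.random() >= 0.5:    # geometric(1/2) block index
        i += 1
    return (i, random.randrange(i))  # element j of block B_i, where |B_i| = i

def pi_H(xs):
    """Pi_H(x): sum of 2^{|I(x) intersect B_i|} over the blocks the sample meets."""
    blocks = {}
    for i, j in xs:
        blocks.setdefault(i, set()).add(j)
    return sum(2 ** len(s) for s in blocks.values())

n, trials = 30, 2000
est = sum(pi_H([sample_point() for _ in range(n)]) for _ in range(trials)) / trials
print(est)
```

For $n = 30$ the estimate sits comfortably below the ceiling $4n^2 = 3600$, reflecting the slack in the covering argument.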

Example 2: Let $X$ be the set of non-negative reals and let the distribution $\mu$ have probability density function $p(x) = e^{-x}$, so that $\mu([0, y]) = 1 - e^{-y}$. Let the hypothesis space $H$ consist of all (characteristic functions of) finite unions of closed intervals, at most $k$ of which intersect the interval $[0, k^2]$, for each positive integer $k$. Thus, for example, $[1,2] \cup [3,5]$ is in $H$, but $[0,1] \cup [2,3] \cup [3,5] \cup [7,9] \cup [17,18]$ is not, since four of the intervals in this union intersect the interval $[0, 3^2]$. Let us denote the interval $[0, k^2]$ by $S_k$. Then $\mu(S_k) = 1 - e^{-k^2}$ and (see [8]) $H|S_k$ has VC dimension $2k$. If $x \in S_k^n$ then $\Pi_H(x) \leq n^{2k+1}$, by a crude form of Sauer's result. In any case, $\Pi_H(x) \leq 2^n$, and it follows that
$$E_{\mu^n}(\Pi_H(x)) \leq \sum_{k=1}^{n} n^{2k+1}\left(\mu^n(S_k^n) - \mu^n(S_{k-1}^n)\right) + 2^n\left(1 - \mu^n(S_n^n)\right)$$
$$< \sum_{k=1}^{n} n^{2k+1}\left(1 - \left(1 - e^{-(k-1)^2}\right)^{n}\right) + 2^n\left(1 - \left(1 - e^{-n^2}\right)^{n}\right)$$
$$< \sum_{k=1}^{n} n^{2k+2}\,e^{-(k-1)^2} + 2^n\,n\,e^{-n^2}.$$
The second quantity tends to 0. Further, $n^{2x+2}e^{-(x-1)^2} \leq n^4\exp\left((\ln n)^2\right)$, as can easily be checked by calculus, so that
$$\sum_{k=1}^{n} n^{2k+2}\,e^{-(k-1)^2} \leq n^5\exp\left((\ln n)^2\right),$$
which is subexponential. It follows that $H$ is $\mu$-learnable.
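The calculus step above, namely that $n^{2x+2}e^{-(x-1)^2}$ is maximised near $x = 1 + \ln n$ with maximum value $n^4\exp((\ln n)^2)$, can be double-checked numerically in log-space; this verification is ours, not the paper's. A sketch:

```python
from math import log

def log_term(n, k):
    # log of n^{2k+2} e^{-(k-1)^2}; working in log-space avoids overflow
    return (2 * k + 2) * log(n) - (k - 1) ** 2

for n in (5, 50, 500):
    log_cap = 4 * log(n) + log(n) ** 2   # log of n^4 exp((ln n)^2)
    assert max(log_term(n, k) for k in range(1, n + 1)) <= log_cap + 1e-9
print("per-term ceiling confirmed")
```

Summing the $n$ terms then gives at most $n^5\exp((\ln n)^2)$, as in the text.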

5 Polynomial learnability

Suppose that $H$ is $\mu$-learnable. For learning to be efficient in any sense, we certainly need a sample-size bound which, as well as being independent of $c$, does not increase too dramatically as $\epsilon$ and $\delta$ decrease (and the learning task becomes, consequently, more difficult). It is appropriate to demand that, for efficiency, the sample size (and hence the running time of any efficient learning algorithm) be polynomial in $1/\epsilon$. Furthermore, since if one doubles the size of a sample then one would expect to square the probability that a bad hypothesis is consistent with the sample, we require the sample size to vary polynomially in $\ln(1/\delta)$. We therefore make the following definition:

Definition 5.1 Hypothesis space $H$ is polynomially $\mu$-learnable if for any $\epsilon, \delta$ in $(0,1)$, there is $m_0 = m_0(\epsilon, \delta)$, polynomial in $1/\epsilon$ and $\ln(1/\delta)$, such that, given any $c \in H$,
$$m \geq m_0 \implies \mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) < \epsilon\} > 1 - \delta.$$

We have observed that if the expectation of $\Pi_H(x)$ is subexponential then $H$ is $\mu$-learnable. We have the following result.

Theorem 5.2 Suppose $H$ is a hypothesis space on $X$ and $\mu$ is a distribution on $X$. If there is $0 < \alpha < 1$ such that (for large $n$) $\log E_{\mu^n}(\Pi_H(x)) < n^{1-\alpha}$, then $H$ is polynomially $\mu$-learnable.

Proof: Let $n = 2^{(1-\alpha)/\alpha}(4/\epsilon)^{1/\alpha}\log(2/\delta)$, where log denotes the binary logarithm, and suppose that $\epsilon < 1/4$. Then $n \geq (4/\epsilon)\log(2/\delta)$ and so $\epsilon n/4 \geq \log(2/\delta)$. But, also, $n \geq 2^{(1-\alpha)/\alpha}(4/\epsilon)^{1/\alpha}$ and hence $\epsilon n/4 \geq (2n)^{1-\alpha}$. It follows that
$$\frac{\epsilon n}{2} \geq \log\frac{2}{\delta} + (2n)^{1-\alpha} > \log\frac{2}{\delta} + \log E_{\mu^{2n}}(\Pi_H(x)),$$
and so
$$2\,E_{\mu^{2n}}(\Pi_H(x))\,2^{-\epsilon n/2} < \delta.$$
The value of $n$ is polynomial in $1/\epsilon$ and $\ln(1/\delta)$, so $H$ is polynomially $\mu$-learnable. □
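The two inequalities used in this proof can be sanity-checked numerically with the choice of $n$ as reconstructed above; the grid of parameter values below is an arbitrary illustration of ours. A sketch:

```python
from math import log2

def n_choice(eps, delta, alpha):
    # n = 2^{(1-alpha)/alpha} (4/eps)^{1/alpha} log2(2/delta), as in the proof
    return 2 ** ((1 - alpha) / alpha) * (4 / eps) ** (1 / alpha) * log2(2 / delta)

for eps in (0.2, 0.05):            # the proof assumes eps < 1/4
    for delta in (0.1, 0.01):
        for alpha in (0.5, 0.25):
            n = n_choice(eps, delta, alpha)
            assert eps * n / 4 >= log2(2 / delta)         # handles the log(2/delta) term
            assert eps * n / 4 >= (2 * n) ** (1 - alpha)  # handles the (2n)^{1-alpha} term
print("both inequalities hold")
```

Together the two inequalities give $\epsilon n/2 \geq \log(2/\delta) + (2n)^{1-\alpha}$, which is exactly what the displayed chain requires.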

The above result is essentially the best that can be obtained by using the bound
$$\mu^n\{x \in X^n : \mathrm{haz}_\mu(H[x]) > \epsilon\} < 2\,E_{\mu^{2n}}(\Pi_H(x))\,2^{-\epsilon n/2},$$
since if the condition of the theorem is not satisfied (for example, if the expectation is of order $2^{n/\log n}$), the resulting sample-size bound will be exponential.

Bertoni et al. [7] studied the question of polynomial sample complexity for distribution-dependent learning. For $x = (x_1, \ldots, x_m) \in X^m$, let $C_m(x)$ be the size of the largest subset of $\{x_1, \ldots, x_m\}$ shattered by $H$. Then, following on from the work of Vapnik and Chervonenkis, Bertoni et al. showed that if there is a positive constant $\alpha$ such that
$$E_{\mu^m}\left(\frac{C_m(x)}{m}\right) = O(m^{-\alpha}),$$
then $H$ is polynomially $\mu$-learnable.

We now take a different approach, extending work of Ben-David et al. [5] to determine a sufficient condition for $H$ to be polynomially $\mu$-learnable. In [5], the following definition was made.

Definition 5.3 A hypothesis space $H$ over an input space $X$ is said to have X-finite dimension if $X = \bigcup_{i=1}^{\infty} B_i$, where the restriction $H|B_i$ of $H$ to domain $B_i$ has finite VC dimension, for each $i$.

Ben-David et al. [5] proved that if a hypothesis space $H$ has X-finite dimension then $H$ is dd-learnable. The spaces in the examples of the previous section are easily seen to have X-finite dimension and hence are dd-learnable; that is, they are $\mu$-learnable for all probability distributions $\mu$ (and not just for the particular distributions discussed). (Indeed, if $X$ is countable then any hypothesis space on $X$ has X-finite dimension, and the first example is a special case of this.) It follows also that the notion of dd-learnability is not a vacuous one, since these same hypothesis spaces are dd-learnable but, being of infinite VC dimension, are not learnable.

It is straightforward to give an example of a hypothesis space $H$ over a (necessarily) uncountable input space $X$ such that $H$ does not have X-finite dimension. Take $X$ to be the closed interval $X = [0,1]$, and let $H$ be the space of all (characteristic functions of) finite unions of closed subintervals of $X$. Now, for any $Y \subseteq X$, $\mathrm{VCdim}(H|Y) \leq k$ if and only if $|Y| \leq k$. It follows that if $X$ were the countable union $X = \bigcup_{i=1}^{\infty} B_i$ of sets $B_i$ such that $H$ had finite VC dimension on each $B_i$, then, in particular, each $B_i$ would be finite and $X$, as the countable union of finite sets, would be countable. However, $X$ is uncountable, and we therefore deduce that $H$ does not have X-finite dimension.

The result of Ben-David et al. provides a positive distribution-dependent learnability result. However, it does not address the size of sample required for learnability to given degrees of accuracy and confidence. A closer analysis of the proof of this result in [5] shows that the resulting sufficient sample size will not be polynomial in $1/\epsilon$ and $\log(1/\delta)$ for many distributions. To introduce the approach taken here, we first have the following result, in which to say that a sequence $\{S_k\}_{k=1}^{\infty}$ of subsets of $X$ is increasing means that $S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots$.

Proposition 5.4 $H$ has X-finite dimension if and only if there exists an increasing sequence $\{S_k\}_{k=1}^{\infty}$ of subsets of $X$ such that $\bigcup_{k=1}^{\infty} S_k = X$ and $\mathrm{VCdim}(H|S_k) \leq k$.

Proof: Suppose that $H$ has X-finite dimension, and let the sets $B_i$ be as in the definition. Let $x_0 \in B_1$ and set $B_0 = \{x_0\}$. For $k \geq 1$ let $S_k = \bigcup_{i=0}^{m(k)} B_i$, where $m(k)$ is the maximum integer $m$ such that the restriction of $H$ to $\bigcup_{i=0}^{m} B_i$ has VC dimension at most $k$. Given any $x \in X$, there is an $m$ such that $x \in \bigcup_{i=0}^{m} B_i$. Suppose that $H$ restricted to $\bigcup_{i=0}^{m} B_i$ has VC dimension $k$. Then $m(k) \geq m$, so $x \in S_k$. Conversely, if such sets $S_k$ exist, take $B_i = S_i$. Then $\mathrm{VCdim}(H|B_i)$ is finite, and $\bigcup_{i=1}^{\infty} B_i = X$. □

If $H$ "nearly" has finite VC dimension, in some sense, we might hope to get polynomially bounded sample sizes. Motivated by the above result, we make the following definition.

Definition 5.5 Hypothesis space $H$ has polynomial X-finite dimension with respect to $\mu$ if $X = \bigcup_{k=1}^{\infty} S_k$, where $\{S_k\}_{k=1}^{\infty}$ is increasing, $\mathrm{VCdim}(H|S_k) \leq k$, and
$$1 - \mu(S_k) = O\left(\frac{1}{k^c}\right)$$
for some constant $c > 0$.

Benedek and Itai [6] have gone some way towards investigating sufficient sample sizes for distribution-dependent learnability in the case of discrete distributions (that is, distributions nonzero on only countably many elements of the example space). With the definition of polynomial X-finite dimension, we can develop a theory for both continuous and discrete distributions. We have the following result, which we prove by a method similar to that used in [5].

Theorem 5.6 Let $H$ be a hypothesis space over $X$, and $\mu$ a probability measure defined on $X$. If $H$ has polynomial X-finite dimension with respect to $\mu$, then $H$ is dd-learnable and polynomially $\mu$-learnable.


Proof: Suppose that $H$ has polynomial X-finite dimension with respect to $\mu$. Suppose that $0 < \epsilon < 1/4$ and that $S \subseteq X$ is such that $\mu(S) \geq 1 - \epsilon/2$. The probability (with respect to $\mu^m$) that a sample of length $m = 2l$, chosen according to $\mu$, has at least half of its members in $S$ is at least $1 - \sum_{k=0}^{l}\binom{2l}{k}\left(\frac{\epsilon}{2}\right)^{2l-k}\left(1-\frac{\epsilon}{2}\right)^{k}$. Now,
$$\sum_{k=0}^{l}\binom{2l}{k}\left(\frac{\epsilon}{2}\right)^{2l-k}\left(1-\frac{\epsilon}{2}\right)^{k} \leq \sum_{k=0}^{l}\binom{2l}{k}\left(\frac{\epsilon}{2}\right)^{2l-k} \leq \left(\frac{\epsilon}{2}\right)^{l}\sum_{k=0}^{l}\binom{2l}{k} = \epsilon^{l}\,2^{l-1}.$$
Therefore, this probability is at least $1 - \epsilon^{l}\,2^{l-1}$. If $l \geq l_0 = \log(1/\delta)$ (where log denotes logarithm to base 2) then
$$l(\log\epsilon + 1) \leq \log\frac{1}{\delta}\,(\log\epsilon + 1) = \log\delta\left(\log\frac{1}{\epsilon} - 1\right) < \log\delta,$$
and this implies that the above probability is greater than $1 - \delta/2$. (Note that we have used the fact that, since $\epsilon < 1/4$, $\log\epsilon + 1$ is negative.)
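The tail estimate just derived, that the probability that fewer than half of the $2l$ points fall in $S$ is at most $\epsilon^l 2^{l-1}$, can be verified exactly for small values with the binomial formula; the check below is ours, not the paper's. A sketch:

```python
from math import comb

def tail(eps, l):
    """Probability that at most l of 2l draws land in S, when mu(S) = 1 - eps/2."""
    p_out = eps / 2
    return sum(comb(2 * l, k) * p_out ** (2 * l - k) * (1 - p_out) ** k
               for k in range(l + 1))

# The proof bounds this tail by eps^l 2^{l-1}, which is < delta/2 once
# l >= log2(1/delta), using eps < 1/4.
for eps in (0.25, 0.1):
    for l in (2, 5, 10):
        assert tail(eps, l) <= eps ** l * 2 ** (l - 1)
print("tail bound holds")
```

The exact tail is typically far smaller than the bound, since the factors $(1-\epsilon/2)^k$ are discarded in the estimate.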

Let $k(\epsilon) = \min\{k : \mu(S_k) \geq 1 - \epsilon/2\}$. The above shows that, with probability at least $1 - \delta/2$, a random sample of length $m \geq 2l_0$ has at least half of its members in $S = S_{k(\epsilon)}$. Let
$$m^* = 2\left\lceil \frac{2\sqrt{2}}{\epsilon\left(\sqrt{2} - \sqrt{\epsilon}\right)}\left( \ln\left(\frac{2d/(d-1)}{\delta}\right) + 2k(\epsilon)\ln\left(\frac{12}{\epsilon}\right)\right) \right\rceil,$$
where $d = k(\epsilon)$. Suppose $c \in H$ is the target concept. Since $H|S$ has VC dimension at most $k(\epsilon)$, $m^*$ is, by Theorem 3.2, twice a sufficient sample size for the learnability of $H|S$ with accuracy $\epsilon/2$ and confidence $1 - \delta/2$. Let $m \geq m^*$, and let $l = \lfloor m/2 \rfloor \geq l_0$.

If $x \in X^m$ is such that $x$ has at least $l$ of its entries from $S = S_{k(\epsilon)}$, then we shall denote by $x_S$ the unique vector of length $l$ whose entries are precisely the first $l$ entries of $x$ from $S$, appearing in the same order as in $x$. Let $\mu_1$ be the probability measure induced on $S$ by $\mu$. Thus, for any measurable subset $A$ of $X$,
$$\mu_1(A \cap S) = \frac{\mu(A \cap S)}{\mu(S)}.$$
Observe that if $h \in H[x]$ and $\mathrm{er}_\mu(h) > \epsilon$ then, since $\mu(S) \geq 1 - \epsilon/2$, the function $h|S$ ($h$ restricted to $S$) is such that $h|S \in (H|S)[x_S]$ and
$$\mathrm{er}_{\mu_1}(h|S) = \frac{1}{\mu(S)}\,\mu\left(\{x \in X : h(x) \neq c(x)\} \cap S\right) \geq \epsilon - (1 - \mu(S)) > \frac{\epsilon}{2}.$$

Therefore, denoting the number of entries of a vector $x$ which lie in $S$ by $s(x)$, we have
$$\mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) > \epsilon\} = \mu^m\{x : \mathrm{haz}_\mu(H[x]) > \epsilon,\ s(x) \geq l\} + \mu^m\{x : \mathrm{haz}_\mu(H[x]) > \epsilon,\ s(x) < l\}.$$

The second measure here is at most $\delta/2$ since, with probability at least $1 - \delta/2$, $s(x)$ is at least $l$. Further,
$$\mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) > \epsilon \text{ and } s(x) \geq l\}$$
$$= \mu^m\{x \in X^m : \mathrm{haz}_\mu(H[x]) > \epsilon \mid s(x) \geq l\}\;\mu^m\{x \in X^m : s(x) \geq l\}$$
$$\leq \mu^m\{x \in X^m : \exists h \in H[x] \text{ with } \mathrm{er}_\mu(h) > \epsilon \mid s(x) \geq l\}$$
$$\leq \mu^m\{x \in X^m : \exists f \in (H|S)[x_S] \text{ with } \mathrm{er}_{\mu_1}(f) > \epsilon/2 \mid s(x) \geq l\},$$
where, for any events $A$ and $B$, $\mu^m(A \mid B)$ is the conditional probability (with respect to $\mu^m$) of $A$ given $B$. Now, if $s(x) \geq l$ and $x$ is $\mu$-randomly chosen, then $x_S$ is a $\mu_1$-randomly chosen sample of length $l$. Therefore this last measure is at most $\delta/2$, since $l$ is a sufficient sample size for the learnability of $H|S$ to accuracy $\epsilon/2$ with confidence $1 - \delta/2$.

Note that the preceding analysis, since it holds true for any distribution $\mu$, shows that $H$ is dd-learnable. Now, since $H$ has polynomial X-finite dimension with respect to $\mu$, there are $c, R > 0$ such that $1 - \mu(S_k) \leq R/k^c$, so that
$$k(\epsilon) \leq \left\lceil \left(\frac{2R}{\epsilon}\right)^{1/c} \right\rceil,$$
which is polynomial in $1/\epsilon$. Therefore $m^*$ is a sufficient sample size which is polynomial in $1/\epsilon$ and in $\ln(1/\delta)$, and hence $H$ is polynomially $\mu$-learnable. □
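For the distribution of Example 2, where $1 - \mu(S_k) = e^{-k^2}$, the quantities $k(\epsilon)$ and $m^*$ from this proof are easy to compute. The code below uses the expression for $m^*$ as reconstructed above, guards the $d > 1$ requirement of Theorem 3.2 by taking $d = \max(k(\epsilon), 2)$, and is our own illustration, not from the paper. A sketch:

```python
from math import ceil, exp, log, sqrt

def k_eps(eps):
    """Smallest k with 1 - mu(S_k) = e^{-k^2} <= eps/2 (Example 2)."""
    k = 1
    while exp(-k * k) > eps / 2:
        k += 1
    return k

def m_star(eps, delta):
    """Sample size m* from the proof of Theorem 5.6."""
    d = max(k_eps(eps), 2)   # Theorem 3.2 needs d > 1
    inner = (2 * sqrt(2) / (eps * (sqrt(2) - sqrt(eps)))) * (
        log((2 * d / (d - 1)) / delta) + 2 * d * log(12 / eps))
    return 2 * ceil(inner)

# k(eps) grows only like sqrt(ln(2/eps)), so m* stays polynomial in 1/eps:
for eps in (0.1, 0.01, 0.001):
    print(eps, k_eps(eps), m_star(eps, 0.05))
```

Because $k(\epsilon)$ grows so slowly for this distribution, the dominant factor in $m^*$ is the $1/\epsilon$ term, in line with polynomial $\mu$-learnability.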

To illustrate the idea of polynomial X-finite dimension, consider again the examples of the previous section. For the first example, we see that the space has polynomial X-finite dimension by taking $S_k$ to be the union of the sets $B_1$ through $B_k$. The sequence $\{S_k\}_{k=1}^{\infty}$ is increasing and $\bigcup_{k=1}^{\infty} S_k = X$. Further, if $x \in S_k^m$ is shattered, the entries of $x$ must lie entirely within one of the $B_i$ ($1 \leq i \leq k$) and hence
$$\mathrm{VCdim}(H|S_k) = \max\{\mathrm{VCdim}(H|B_j) : j \leq k\} = \mathrm{VCdim}(H|B_k) = k.$$
Now, $1 - \mu(S_k) = 1/2^k$, so $H$ has polynomial X-finite dimension with respect to $\mu$, and $H$ is polynomially $\mu$-learnable.

For the second example, let $S_k = [0, k^2]$. Then $\{S_k\}_{k=1}^{\infty}$ is an increasing sequence with union $X$ and $\mathrm{VCdim}(H|S_k) = 2k$. (Clearly, the factor 2 here is of no consequence.) Further, $1 - \mu(S_k) = e^{-k^2}$, and so $H$ has polynomial X-finite dimension with respect to $\mu$.

It remains to give an example of a hypothesis space $H$ over an input space $X$, together with a probability distribution $\mu$ on $X$, such that $H$ has X-finite dimension but does not have polynomial X-finite dimension with respect to $\mu$. To this end, let $X$ be the set of all positive integers and $H$ the set of all (characteristic functions of) subsets of $X$. The input space is countable, and therefore $H$ has X-finite dimension. Define the probability measure $\mu$ on $X$ by
$$\mu(\{x\}) = \frac{1}{\log(x+1)} - \frac{1}{\log(x+2)},$$
where log denotes logarithm to base 2. Suppose that the sequence of sets $\{S_k\}_{k=1}^{\infty}$ is such that
$$X = \bigcup_{k=1}^{\infty} S_k \quad \text{and} \quad \mathrm{VCdim}(H|S_k) \leq k.$$
Clearly, $\mathrm{VCdim}(H|S_k) = |S_k|$. But $H$ restricted to $S_k$ is supposed to have VC dimension at most $k$. Therefore, for each integer $k$, $S_k$ has cardinality at most $k$. It follows that
$$\mu(S_k) \leq \mu(\{1, 2, \ldots, k\}) = 1 - \frac{1}{\log(k+2)},$$
and $1 - \mu(S_k) \geq 1/\log(k+2)$. Thus, $H$ does not have polynomial X-finite dimension with respect to $\mu$. In fact, one can show directly that $H$ is not polynomially $\mu$-learnable. For suppose that the target is the identically-0 function and that a sample $x$ of size $m$ is given. There is a hypothesis consistent with the target on $x$ and with error at least $\epsilon$ unless $\mu(\{x_i : 1 \leq i \leq m\}) > 1 - \epsilon$. We therefore need to have
$$1 - \epsilon < \mu(\{x_i : 1 \leq i \leq m\}) \leq 1 - \frac{1}{\log(m+2)},$$
so that $m \geq 2^{1/\epsilon} - 2$, which is exponential in $1/\epsilon$.

References

[1] Dana Angluin, Queries and concept learning, Machine Learning, 2(4), 1988: 319-342.

[2] Martin Anthony, Uniform Convergence and Learnability, PhD thesis, University of London, 1991.

[3] Martin Anthony and Norman Biggs, Computational Learning Theory: An Introduction, Cambridge University Press, Cambridge, UK, 1992.

[4] Martin Anthony, Norman Biggs and John Shawe-Taylor, The learnability of formal concepts, Proceedings of the Third Workshop on Computational Learning Theory, Morgan Kaufmann, San Mateo, CA, 1990.

[5] Shai Ben-David, Gyora M. Benedek and Yishay Mansour, A parameterization scheme for classifying models of learnability, Proceedings of the Second Workshop on Computational Learning Theory, Morgan Kaufmann, San Mateo, CA, 1989.

[6] Gyora M. Benedek and Alon Itai, Learnability with respect to fixed distributions, to appear, Theoretical Computer Science.

[7] A. Bertoni, P. Campadelli, A. Morpurgo and S. Panizza, Polynomial uniform convergence and polynomial-sample learnability, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 265-271, ACM Press, New York, NY, 1992.

[8] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler and Manfred Warmuth, Learnability and the Vapnik-Chervonenkis dimension, Journal of the ACM, 36(4), 1989: 929-965.

[9] David Haussler, Quantifying inductive bias: AI learning algorithms and Valiant's learning framework, Artificial Intelligence, 36, 1988: 177-221.

[10] David Haussler and Emo Welzl, ε-nets and simplex range queries, Discrete and Computational Geometry, 2, 1987: 127-151.

[11] Michael J. Kearns and Umesh Vazirani, An Introduction to Computational Learning Theory, MIT Press, 1995.

[12] Balas K. Natarajan, Machine Learning: A Theoretical Approach, Morgan Kaufmann, San Mateo, California, 1991.

[13] David Pollard, Convergence of Stochastic Processes, Springer-Verlag, 1984.

[14] N. Sauer, On the density of families of sets, Journal of Combinatorial Theory (A), 13, 1972: 145-147.

[15] Leslie G. Valiant, A theory of the learnable, Communications of the ACM, 27(11), 1984: 1134-1142.

[16] V. N. Vapnik and A. Ya. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, 16(2), 1971: 264-280.
