MACHINE LEARNING - CS671

Probably Approximately Correct Learning

Prof. Dan A. Simovici

UMB


PAC Learning and Efficient PAC Learning

Let X_n be the example space, where X_n = {0,1}^n or R^n, let C_n be a
concept class over X_n (each concept C ∈ C_n is a subset of X_n), and let
X = ∪_{n≥1} X_n and C = ∪_{n≥1} C_n.

C is PAC learnable if there exists a learning algorithm L such that for
every C ∈ C, every probability distribution P on X, and every ε, δ ∈ (0,1),
if L has access to examples of C, then L outputs a hypothesis H ∈ C such
that with probability at least 1 − δ we have err(H) ≤ ε.

C is efficiently PAC learnable if L runs in time polynomial in n, 1/ε and
1/δ when learning a concept C ∈ C_n.


n: the size of the examples;
ε: the error parameter;
δ: the confidence parameter.


Algorithm 1.1: Learning Algorithm for a Conjunction

Data: a list of examples of the form (x^i, b_i)
Result: a conjunction of the literals in a set U
/* U is the initial set of literals */
set U = {u_1, ū_1, ..., u_n, ū_n};
for i := 1 to m do
    if b_i = 1 then
        for j := 1 to n do
            if (x^i)_j = 1 then
                delete ū_j if present in U
            else
                delete u_j if present in U
            end
        end
    end
end
return a conjunction of the literals of U;
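
A minimal Python sketch of Algorithm 1.1 (the function name and the example
data are illustrative, not from the lecture): the learner keeps a set of
surviving literals and prunes it on every positive example.

    # Literals are pairs (j, sign): sign=True stands for u_j, False for ū_j.
    def learn_conjunction(examples, n):
        """examples: list of (x, b) with x a 0/1 tuple of length n, b in {0,1}."""
        U = {(j, sign) for j in range(n) for sign in (True, False)}
        for x, b in examples:
            if b == 1:                         # only positive examples shrink U
                for j in range(n):
                    if x[j] == 1:
                        U.discard((j, False))  # ū_j is falsified by this example
                    else:
                        U.discard((j, True))   # u_j is falsified by this example
        return U                               # conjunction of the surviving literals

    # Hypothetical data: two positive examples over n = 3 variables.
    print(sorted(learn_conjunction([((1, 0, 1), 1), ((1, 1, 1), 1)], 3)))
    # -> [(0, True), (2, True)], i.e. the conjunction u_1 ∧ u_3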


A Reformulation of Monomial Learning

Hypotheses are sequences of the form (h_1,...,h_n) ∈ {0,1,∗}^n.

An example (x_1,...,x_n) is positive for a hypothesis H = (h_1,...,h_n) if
x_i = h_i when h_i ∈ {0,1} and x_i is arbitrary when h_i = ∗, for every i,
1 ≤ i ≤ n. Otherwise, (x_1,...,x_n) is negative.

Example
x = (1,0,0,1,0,1,1) is a positive example for the hypothesis
H = (1,0,0,1,∗,1,1), which represents the monomial
u_1 ∧ ū_2 ∧ ū_3 ∧ u_4 ∧ u_6 ∧ u_7;
x is positive relative to any hypothesis H′ obtained from H by replacing 1s
or 0s by ∗.
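
A short sketch of the positivity test (the function name is illustrative): an
example is positive for a hypothesis over {0,1,∗} exactly when it matches every
position that is not ∗.

    def is_positive(x, H):
        """x: 0/1 tuple; H: tuple over {0, 1, '*'} of the same length."""
        return all(h == '*' or xi == h for xi, h in zip(x, H))

    # The example from this slide:
    print(is_positive((1, 0, 0, 1, 0, 1, 1), (1, 0, 0, 1, '*', 1, 1)))   # True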


Algorithm 1.2: Reformulated Learning Algorithm for Monomials

Data: a list of examples of the form (x^i, b_i)
Result: a sequence in Seq({0,1,∗}) representing a monomial
/* the components of the hypothesis are denoted by h_1,...,h_n */
set H = x^1;
for i := 2 to m do
    if b_i = 1 then
        for j := 1 to n do
            if (h_j = 0 and (x^i)_j = 1) or (h_j = 1 and (x^i)_j = 0) then
                replace h_j by ∗
            end
        end
    end
end
return H;
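
A minimal Python sketch of Algorithm 1.2 (illustrative names; the first example
is assumed to be positive, as in the pseudocode). The call below reproduces the
run shown on the next slide.

    def learn_monomial(examples):
        """examples: list of (x, b); x a 0/1 tuple, b in {0, 1}."""
        H = list(examples[0][0])                 # H = x^1
        for x, b in examples[1:]:
            if b == 1:                           # negative examples are ignored
                for j, xj in enumerate(x):
                    if H[j] != '*' and H[j] != xj:
                        H[j] = '*'               # generalize the disagreeing bit
        return tuple(H)

    run = [((1, 0, 0, 1, 1, 1, 1), 1), ((1, 1, 0, 1, 1, 1, 1), 0),
           ((1, 0, 0, 0, 1, 0, 1), 0), ((1, 0, 0, 1, 0, 1, 1), 1)]
    print(learn_monomial(run))   # (1, 0, 0, 1, '*', 1, 1)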


A Run of the Reformulated Algorithm

(x^i, b_i)                    H
((1,0,0,1,1,1,1), 1)          (1,0,0,1,1,1,1)
((1,1,0,1,1,1,1), 0)          (1,0,0,1,1,1,1)
((1,0,0,0,1,0,1), 0)          (1,0,0,1,1,1,1)
((1,0,0,1,0,1,1), 1)          (1,0,0,1,∗,1,1)


PAC Learnability of Monomial Learning

Let C = (c_1,...,c_n) be the target concept, where c_j ∈ {0,1,∗}, and let
H = (h_1,...,h_n) be the hypothesis produced by the algorithm.

If c_j ≠ ∗, then h_j = c_j; consequently, H will never err on a negative
example of C.

Example
The hypothesis H = (1,0,∗,1,1,∗,1) is consistent with the concept C given by
C = (∗,0,∗,1,1,∗,∗). However, H′ = (1,1,∗,1,1,∗,1) is in error relative to C.


A bit j is bad in a hypothesis H = (h_1,...,h_n) relative to a concept
C = (c_1,...,c_n) if c_j = 1 and h_j = 0, or if c_j = 0 and h_j = 1.

The error rate of a hypothesis H = (h_1,...,h_n) relative to a concept C is

err(H) = (number of bad bits) / n.

Suppose that

p(j) = P((c_j = 1 ∩ h_j = 0) ∪ (c_j = 0 ∩ h_j = 1)) ≥ ε/n,

so the probability that this bit is correct is

1 − p(j) ≤ 1 − ε/n.


An Upper Bound on the Probability that a Bad Bit Will Occur in H

The j-th bit of a hypothesis is not replaced by ∗ after m steps if in each of
the m examples the bit is not in error. Therefore, the probability that the
bit is not replaced by ∗ after m steps is no more than

(1 − ε/n)^m.

The probability that some bad bit is not replaced after m steps is at most

n(1 − ε/n)^m.

Thus, we require

n(1 − ε/n)^m ≤ δ.

Since 1 − t ≤ e^(−t), it suffices to choose m such that

ne^(−εm/n) ≤ δ,

so

m ≥ (n/ε)(ln(n) + ln(1/δ)).
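
Spelling out the last rearrangement (a routine step, included here for
completeness):

    n e^{-\varepsilon m/n} \le \delta
    \iff e^{-\varepsilon m/n} \le \frac{\delta}{n}
    \iff \frac{\varepsilon m}{n} \ge \ln\frac{n}{\delta}
    \iff m \ge \frac{n}{\varepsilon}\Bigl(\ln n + \ln\frac{1}{\delta}\Bigr).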


If the algorithm takes at least m examples, where

m ≥ (n/ε)(ln(n) + ln(1/δ)),

then with probability at least 1 − δ the resulting hypothesis H will have
error at most ε.

The running time is bounded by mn, so the time is bounded by a polynomial in
n, 1/ε and 1/δ. So the algorithm is PAC.


Learning Rectangles is PAC

Recall:

Algorithm 1.3: Learning Algorithm for Rectangles

Data: a list of examples of the form ((x_i, y_i), b_i) for 1 ≤ i ≤ m
Result: a rectangle [p, u; q, v]
R = [p_1, u_1; q_1, v_1];
i = 1;
for i := 1 to m do
    if b_i = 1 then
        R = R ⊔ [p_i, u_i; q_i, v_i]
    end
end
return R;
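
A minimal Python sketch of the tightest-fit idea (illustrative names and data):
the hypothesis is the smallest axis-aligned rectangle containing all positive
examples seen so far.

    def learn_rectangle(examples):
        """examples: list of ((x, y), b), b in {0, 1}; returns (p, u, q, v)
        for the rectangle [p, u] x [q, v], or None if no positive example."""
        rect = None
        for (x, y), b in examples:
            if b == 1:
                if rect is None:
                    rect = (x, x, y, y)        # degenerate rectangle at the point
                else:
                    p, u, q, v = rect
                    rect = (min(p, x), max(u, x), min(q, y), max(v, y))
        return rect

    # Hypothetical data: three positive points and one negative point.
    pts = [((1.0, 2.0), 1), ((3.0, 0.5), 1), ((10.0, 10.0), 0), ((2.0, 4.0), 1)]
    print(learn_rectangle(pts))   # (1.0, 3.0, 0.5, 4.0)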


Remarks:

the tightest-fit rectangle R′ is always contained in the target rectangle R;

the difference R − R′ is contained in four rectangular strips which overlap in
the corners.

[Figure: the target rectangle R (CONCEPT) containing the tightest-fit rectangle R′ (HYPOTHESIS).]


When would R′ misclassify an example?

Answer: when the example falls into any of the four strips:

[Figure: the target rectangle R (CONCEPT), the hypothesis R′ (HYPOTHESIS), and the four strips of R − R′ between them.]


if the weight (probability) of each strip is less than ε/4, the total weight
of the strips is less than ε, so the error of the hypothesis is less than ε;

if a strip has weight at least ε/4, the probability that a single draw misses
it is at most 1 − ε/4, so the probability that m draws all miss it is at most
(1 − ε/4)^m;

hence the probability that some strip of R − R′ has weight greater than ε/4
and is missed by all m draws is at most 4(1 − ε/4)^m; in other words, to make
the probability that the error of R′ is greater than ε less than δ, choose m
to satisfy 4(1 − ε/4)^m ≤ δ;


with probability at least 1 − δ over the m examples, the weight of the error
region is bounded by ε;

since 1 − x ≤ e^(−x), it suffices to have 4e^(−εm/4) ≤ δ;

we have m ≥ (4/ε) ln(4/δ), so the algorithm is PAC.
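
As a quick numeric illustration (the parameter values below are arbitrary, not
from the lecture), the two sample-size bounds derived above can be evaluated
directly:

    import math

    def monomial_sample_size(n, eps, delta):
        # m >= (n / eps) * (ln n + ln(1 / delta))
        return math.ceil((n / eps) * (math.log(n) + math.log(1 / delta)))

    def rectangle_sample_size(eps, delta):
        # m >= (4 / eps) * ln(4 / delta)
        return math.ceil((4 / eps) * math.log(4 / delta))

    print(monomial_sample_size(n=20, eps=0.1, delta=0.05))   # 1199
    print(rectangle_sample_size(eps=0.1, delta=0.05))        # 176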

