# MACHINE LEARNING - CS671 Probably Approximately Correct Learning

Nov 7, 2013

Prof. Dan A. Simovici (UMB)
## PAC Learning and Efficient PAC Learning

Let $X_n$ be the example space, where $X_n = \{0,1\}^n$ or $\mathbb{R}^n$, let $C_n \subseteq X_n$ be a concept class, and let $X = \bigcup_{n \geq 1} X_n$ and $\mathcal{C} = \bigcup_{n \geq 1} C_n$.

$\mathcal{C}$ is **PAC learnable** if there exists a learning algorithm $L$ such that for every concept $C \in \mathcal{C}$, every probability distribution $P$ on $X$, and every $\epsilon, \delta \in (0,1)$: if $L$ has access to examples of $C$, then $L$ outputs a hypothesis $H \in \mathcal{C}$ such that, with probability at least $1 - \delta$, we have $\mathrm{err}(H) \leq \epsilon$.

$\mathcal{C}$ is **efficiently PAC learnable** if $L$ runs in time polynomial in $n$, $\frac{1}{\epsilon}$, and $\frac{1}{\delta}$ when learning a concept $C \in C_n$.
Here:

- $n$: the size of the examples;
- $\epsilon$: the error parameter;
- $\delta$: the confidence parameter.
### Algorithm 1.1: Learning Algorithm for a Conjunction

```
Data: a list of examples of the form (x_i, b_i)
Result: a conjunction of the literals in a set U
/* U is the initial set of literals */
U = {u_1, ū_1, ..., u_n, ū_n}
for i := 1 to m do
    if b_i = 1 then
        for j := 1 to n do
            if (x_i)_j = 1 then
                delete ū_j from U if present
            else
                delete u_j from U if present
return a conjunction of the literals in U
```
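The pseudocode above can be sketched in Python. This is a minimal illustration, not from the slides: the function name `learn_conjunction` and the encoding of the literal $u_j$ as `(j, 1)` and of $\bar{u}_j$ as `(j, 0)` are my own choices.

```python
def learn_conjunction(examples, n):
    """Learn a conjunction of literals from labeled examples.

    Each example is a pair (x, b): x is a tuple of n bits and b is the
    label (1 = positive, 0 = negative). A literal is encoded as (j, 1)
    for u_j and (j, 0) for its negation.
    """
    # U starts as the set of all 2n literals.
    U = {(j, v) for j in range(n) for v in (0, 1)}
    for x, b in examples:
        if b == 1:                     # only positive examples shrink U
            for j in range(n):
                if x[j] == 1:
                    U.discard((j, 0))  # delete the negated literal
                else:
                    U.discard((j, 1))  # delete the positive literal
    return U
```

For instance, the positive examples $(1,0,0)$ and $(1,0,1)$ of the target $u_1 \wedge \bar{u}_2$ leave exactly the literals `{(0, 1), (1, 0)}`.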
## A Reformulation of Monomial Learning

Hypotheses are sequences in $\{0,1,*\}^n$. An example $(x_1,\ldots,x_n)$ is **positive** for a hypothesis $H = (h_1,\ldots,h_n)$ if, for every $i$ with $1 \leq i \leq n$, $x_i = h_i$ when $h_i \in \{0,1\}$ and $x_i$ is arbitrary when $h_i = *$. Otherwise, $(x_1,\ldots,x_n)$ is **negative**.

### Example

$x = (1,0,0,1,0,1,1)$ is a positive example for the hypothesis $H = (1,0,0,1,*,1,1)$, which represents the monomial $u_1 \wedge \bar{u}_2 \wedge \bar{u}_3 \wedge u_4 \wedge u_6 \wedge u_7$; $x$ is positive relative to any hypothesis $H'$ obtained from $H$ by replacing 1s or 0s by $*$.
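Under this reformulation, checking whether an example is positive for a hypothesis is a componentwise test. A small Python sketch (the function name `is_positive` is mine, not from the slides):

```python
def is_positive(x, H):
    """An example x is positive for hypothesis H over {0, 1, '*'} iff
    x matches every fixed component of H; '*' matches anything."""
    return all(h == '*' or xi == h for xi, h in zip(x, H))
```

With the slide's example, `is_positive((1, 0, 0, 1, 0, 1, 1), (1, 0, 0, 1, '*', 1, 1))` is `True`, while flipping a fixed bit of the example yields `False`.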
### Algorithm 1.2: Reformulated Learning Algorithm for Monomials

```
Data: a list of examples of the form (x_i, b_i)
Result: a sequence in Seq({0,1,*}) representing a monomial
/* the components of the hypothesis are denoted by h_1, ..., h_n */
H = x_1
for i := 2 to m do
    if b_i = 1 then
        for j := 1 to n do
            if (h_j = 0 and (x_i)_j = 1) or (h_j = 1 and (x_i)_j = 0) then
                replace h_j by *
return H
```
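As a sketch, the reformulated algorithm in Python (the function name `learn_monomial` is mine; as in the pseudocode, the first example initializes $H$ and is assumed to be positive):

```python
def learn_monomial(examples):
    """Reformulated monomial learner over sequences in {0, 1, '*'}.

    H starts as the first example; every later positive example that
    disagrees with a fixed bit of H generalizes that bit to '*'.
    Negative examples are ignored.
    """
    (x1, _), *rest = examples
    H = list(x1)
    for x, b in rest:
        if b == 1:
            for j, hj in enumerate(H):
                if hj != '*' and x[j] != hj:
                    H[j] = '*'   # generalize the disagreeing bit
    return tuple(H)
```

On the four examples of the run traced below, `learn_monomial` returns `(1, 0, 0, 1, '*', 1, 1)`.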
## A Run of the Reformulated Algorithm

| $(x_i, b_i)$ | $H$ |
|---|---|
| $((1,0,0,1,1,1,1),\,1)$ | $(1,0,0,1,1,1,1)$ |
| $((1,1,0,1,1,1,1),\,0)$ | $(1,0,0,1,1,1,1)$ |
| $((1,0,0,0,1,0,1),\,0)$ | $(1,0,0,1,1,1,1)$ |
| $((1,0,0,1,0,1,1),\,1)$ | $(1,0,0,1,*,1,1)$ |
## PAC Learnability of Monomial Learning

Let $C = (c_1,\ldots,c_n)$, where $c_j \in \{0,1,*\}$. For the hypothesis $H = (h_1,\ldots,h_n)$ produced by the algorithm:

- if $c_j \neq *$, then $h_j = c_j$;
- $H$ will never err on a negative example of $C$.

### Example

The hypothesis $H = (1,0,*,1,1,*,1)$ is consistent with the concept $C$ given by $C = (*,0,*,1,1,*,*)$. However, $H' = (1,1,*,1,1,*,1)$ is in error relative to $C$.
A bit $j$ is **bad** in a hypothesis $H = (h_1,\ldots,h_n)$ relative to a concept $C = (c_1,\ldots,c_n)$ if $c_j = 1$ and $h_j = 0$, or if $c_j = 0$ and $h_j = 1$.

The **error rate** of a hypothesis $H = (h_1,\ldots,h_n)$ relative to a concept $C$ is

$$\mathrm{err}(H) = \frac{\text{the number of bad bits of } H}{n}.$$

Suppose that bit $j$ is bad and

$$p(j) = P\bigl((c_j = 1 \cap h_j = 0) \cup (c_j = 0 \cap h_j = 1)\bigr) \geq \frac{\epsilon}{n};$$

then the probability that this bit is correct on a random example is

$$1 - p(j) \leq 1 - \frac{\epsilon}{n}.$$
## An Upper Bound on the Probability that a Bad Bit Will Occur in H

The $j$th bit of a hypothesis is not replaced by $*$ after $m$ steps if in each of the $m$ examples the bit is not in error. Therefore, the probability that the bit is not replaced by $*$ after $m$ steps is no more than

$$\left(1 - \frac{\epsilon}{n}\right)^m.$$

The probability that some bad bit is not replaced after $m$ steps is at most $n\left(1 - \frac{\epsilon}{n}\right)^m$. Thus, we require

$$n\left(1 - \frac{\epsilon}{n}\right)^m \leq \delta.$$

Since $1 - t \leq e^{-t}$, it suffices to choose $m$ such that $n e^{-m\epsilon/n} \leq \delta$, so

$$m \geq \frac{n}{\epsilon}\left(\ln(n) + \ln\frac{1}{\delta}\right).$$
If the algorithm takes at least $m$ examples, where

$$m \geq \frac{n}{\epsilon}\left(\ln(n) + \ln\frac{1}{\delta}\right),$$

then with probability at least $1 - \delta$ the resulting hypothesis $H$ will have error at most $\epsilon$.

The running time is bounded by $mn$, so it is bounded by a polynomial in $n$, $\frac{1}{\epsilon}$, and $\frac{1}{\delta}$. Therefore the algorithm is PAC.
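The sample bound can be evaluated numerically; a small sketch (the function name `monomial_sample_size` is my own choice):

```python
import math

def monomial_sample_size(n, eps, delta):
    """Smallest integer m with m >= (n/eps) * (ln(n) + ln(1/delta)),
    the sample bound derived above for the monomial learner."""
    return math.ceil((n / eps) * (math.log(n) + math.log(1 / delta)))
```

For example, $n = 10$, $\epsilon = 0.1$, $\delta = 0.05$ gives $m = 530$. Note that the bound grows only logarithmically in $\frac{1}{\delta}$.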
## Learning Rectangles is PAC

Recall:

### Algorithm 1.3: Learning Algorithm for Rectangles

```
Data: a list of examples of the form ((x_i, y_i), b_i) for 1 ≤ i ≤ m
Result: a rectangle [p, u; q, v]
/* ⊔ denotes the smallest rectangle containing both operands */
R = [p_1, u_1; q_1, v_1]
for i := 1 to m do
    if b_i = 1 then
        R = R ⊔ [p_i, u_i; q_i, v_i]
return R
```
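A Python sketch of the tightest-fit learner (the function name and the tuple encoding `(p, u, q, v)` of the rectangle $[p,u;q,v]$ are my own choices):

```python
def learn_rectangle(examples):
    """Tightest-fit axis-aligned rectangle.

    Each example is ((x, y), b). Returns the smallest rectangle
    (p, u, q, v), with x ranging over [p, u] and y over [q, v], that
    contains all positive points; negative examples are ignored.
    Returns None when there are no positive examples.
    """
    positives = [point for point, b in examples if b == 1]
    if not positives:
        return None
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))
```

For instance, `learn_rectangle([((1, 1), 1), ((3, 2), 1), ((0, 5), 0)])` returns `(1, 3, 1, 2)`.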
Remarks:

- the tightest-fit rectangle $R'$ is always contained in the target rectangle $R$;
- the difference $R - R'$ is contained in four rectangular strips which overlap in the corners.

(Figure: the target rectangle $R$, labeled CONCEPT, containing the tightest-fit rectangle $R'$, labeled HYPOTHESIS.)
When would $R'$ misclassify an example?

Answer: when the example falls in any of the four strips.

(Figure: the same rectangles $R$ (CONCEPT) and $R'$ (HYPOTHESIS), with the four strips of $R - R'$ between them.)
- if the probability of each strip is less than $\frac{\epsilon}{4}$, the total weight of the strips is less than $\epsilon$, so the error of the hypothesis is less than $\epsilon$;
- if a strip has weight at least $\frac{\epsilon}{4}$, the probability that a single draw misses it is at most $1 - \frac{\epsilon}{4}$, so the probability that $m$ draws all miss it is at most $\left(1 - \frac{\epsilon}{4}\right)^m$;
- the probability that some strip of $R - R'$ has weight greater than $\frac{\epsilon}{4}$ yet is missed by all $m$ draws is at most $4\left(1 - \frac{\epsilon}{4}\right)^m$; in other words, since the probability that the error of $R'$ exceeds $\epsilon$ should be less than $\delta$, choose $m$ to satisfy $4\left(1 - \frac{\epsilon}{4}\right)^m \leq \delta$;
- with probability at least $1 - \delta$ over $m$ examples, the weight of the error region is bounded by $\epsilon$;
- since $1 - x \leq e^{-x}$, it suffices to have $4 e^{-\epsilon m/4} \leq \delta$;
- we need $m \geq \frac{4}{\epsilon} \ln\frac{4}{\delta}$, so the algorithm is **PAC**.
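As with monomials, this sample bound can be computed directly (the function name is my own choice):

```python
import math

def rectangle_sample_size(eps, delta):
    """Smallest integer m with m >= (4/eps) * ln(4/delta),
    the sample bound for the tightest-fit rectangle learner."""
    return math.ceil((4 / eps) * math.log(4 / delta))
```

For $\epsilon = 0.1$ and $\delta = 0.05$ this gives $m = 176$; unlike the monomial bound, it does not depend on $n$.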