CS-424 Gregory Dudek
Lecture 14


Learning
- Probably approximately correct learning (cont’d)
- Version spaces
- Decision trees



PAC: definition

Relax this requirement by not requiring that the learning program necessarily achieve a small error, but only that it keep the error small with high probability.

Probably approximately correct (PAC) with probability 1 - δ and error at most ε if, given any set of training examples drawn according to the fixed distribution, the program outputs a hypothesis f such that

    Pr(Error(f) > ε) < δ




PAC Training examples

Theorem:
If the number of hypotheses |H| is finite, then a program that returns a hypothesis that is consistent with

    m = ln(δ/|H|) / ln(1 - ε)

training examples (drawn according to Pr) is guaranteed to be PAC with probability 1 - δ and error bounded by ε.
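As a sanity check, the bound m = ln(δ/|H|) / ln(1 - ε) can be evaluated numerically. A minimal sketch (the function name and the example numbers are illustrative, not from the lecture):

```python
import math

def pac_sample_bound(h_size: int, eps: float, delta: float) -> int:
    """m = ln(delta/|H|) / ln(1 - eps): enough consistent training
    examples to be PAC with error <= eps and failure prob. <= delta."""
    return math.ceil(math.log(delta / h_size) / math.log(1.0 - eps))

# e.g. |H| = 3**10 conjunctions over 10 predicates, eps = delta = 0.1
print(pac_sample_bound(3**10, 0.1, 0.1))
```

Note how the bound grows only logarithmically in |H|, which is why even a very large finite hypothesis space stays learnable from modest data.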


We want….

PAC (so far) describes the accuracy of the hypothesis, and the chances of finding such a concept.

How many examples do we need to rule out the “really bad” hypotheses?

We also want the process to proceed quickly.



PAC learnable spaces

A class of concepts C is said to be PAC learnable for a hypothesis space H if there exists a polynomial-time algorithm A such that:

for any c ∈ C, distribution Pr, ε > 0, and δ > 0, if A is given a quantity of training examples polynomial in 1/ε and 1/δ, then with probability 1 - δ the algorithm will return a hypothesis f from H such that

    error(f) <= ε.



Observations on PAC

PAC learnability doesn’t tell us how to find the learning algorithm.

The number of examples needed grows slowly as the concept space increases, and with the other key parameters.




Example

Target and learned concepts are conjunctions with up to n predicates. (This is our bias.)

Each predicate might appear in either positive or negated form, or be absent: 3 options.

This gives 3^n possible conjunctions in the hypothesis space.




Result

I have such a formula in mind.
I’ll give you some examples.
You try to guess what the formula is.

A concept that matches all our examples will be PAC if m is at least

    (n/ε) ln(3/δ)


How

How can we actually find a suitable concept?

One key approach: start with the examples themselves, and try to generalize.

E.g. given f(3,5) and f(5,5):
- We might try replacing the first argument with a variable X: f(X,5).
- We might try replacing both arguments with variables: f(X,Y).

We want to get as general as possible, but not too general.

The converse of this generalization is specialization.


Version Space [RN 18.5; DAA 5.3]

Deals with conjunctive concepts.

Consider a concept C as being identified with the set of positive examples it is associated with.

C: odd-numbered hearts = {3♥, 5♥, 7♥, 9♥}.

A concept C1 is a specialization of concept C2 if the examples associated with C1 are a subset of those associated with C2.

3-of-hearts is more specialized than odd hearts.


Specialization/Generalization

[Lattice diagram: Cards at the top; below it Black and Red; under Red, Odd red and Even red; then Even-♥ (red is implied); individual cards 3♥, 5♥, 7♥, 9♥ at the bottom.]


Immediate

Immediate specialization: no intermediate concept in between.

2-of-hearts is not an immediate specialization of red (hearts lies in between).

Hearts and diamonds are immediate specializations of red.

Note: this observation depends on knowing the hypothesis space restriction.


Algorithm outline

Incrementally process training data.

Keep a list of the most and least specific concepts consistent with the observed data.

For two concepts A and B that are consistent with the data, the concept C = (A AND B) will also be consistent, yet more specific.

Tied in a subtle way to conjunctions.

Disjunctive concepts can be obtained trivially by joining examples, but they’re not interesting.



VS example

[Worked example: training sequence 4: no, 5: yes, 5: no, 7: yes, 9, 3: yes.]

[Lattice diagram: Cards at the top; below it Black and Red; under Red, Even red and Odd red; then Odd-♥ (red is implied); individual cards 3♥, 5♥, 7♥, 9♥ at the bottom.]


Algorithm specifics

Maintain two bounding concepts:
- The most specialized (specific boundary, S-set)
- The broadest (general boundary, G-set)

Each example we see is either positive (yes) or negative (no).

Positive examples (+) tend to make the concept more general (or inclusive). Negative examples (-) are used to make the concept more exclusive (to reject them).

+ -> move “up” the specific boundary
- -> move “down” the general boundary

Detailed algorithm: RN p. 549; DAA pp. 191-192.
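The positive-example step (moving the specific boundary “up”) can be sketched for conjunctive hypotheses. This is only the S-side of candidate elimination; the full algorithm (RN p. 549) also maintains and specializes the G-set. The dict representation and attribute names are my own illustration:

```python
def generalize_s(s, example):
    """Minimally generalize hypothesis s so it covers a positive example.
    s maps attribute -> required value, with '?' meaning "don't care"."""
    return {a: (v if v == '?' or example[a] == v else '?')
            for a, v in s.items()}

# S starts as the first positive example itself (the most specific concept).
s = {'colour': 'red', 'parity': 'odd', 'rank': 3}
# A second positive example forces a minimal generalization: the rank
# differs, so it becomes "don't care"; colour and parity are kept.
s = generalize_s(s, {'colour': 'red', 'parity': 'odd', 'rank': 5})
print(s)  # -> {'colour': 'red', 'parity': 'odd', 'rank': '?'}
```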



Observations

It allows you to GENERALIZE from a training set to examples never-before-seen!!!

In contrast, consider table lookup or rote learning.

Why is that good?
1. It allows you to infer things about new data (the whole point of learning).
2. It allows you to (potentially) remember old data much more efficiently.

The version space method is optimal for a conjunction of positive literals.

How does it perform with noisy data?


Restaurant Selector

Example attributes:
1. Alternate
2. Bar
3. Fri/Sat
4. Hungry
5. Patrons
6. Price
etc.

Forall restaurants r: Patrons(r, Full) AND WaitEstimate(r, under_10_min) AND Hungry(r, N) -> WillWait(r)


Example 2

Maybe we should have made a reservation? (Using a decision tree.)

Restaurant lookup: you’ve heard Joe’s is good.

Lookup Joe’s
Lookup Chez Joe
Lookup Restaurant Joe’s
Lookup Bistro Joe’s
Lookup Restaurant Chez Joe’s
Lookup Le Restaurant Chez Joe
Lookup Le Bar Restaurant Joe
Lookup Le Restaurant Casa Chez Joe


Decision trees: issues

Constructing a decision tree is easy… really easy!
Just add examples in turn.

Difficulty: how can we extract a simplified decision tree?

This implies (among other things) establishing a preference order (bias) among alternative decision trees.

Finding the smallest one proves to be VERY hard. Improving over the trivial one is okay.


Office size example

Training examples:
1. large ^ cs ^ faculty -> yes
2. large ^ ee ^ faculty -> no
3. large ^ cs ^ student -> yes
4. small ^ cs ^ faculty -> no
5. small ^ cs ^ student -> no

The questions about office size, department and status tell us something about the mystery attribute.

Let’s encode all this as a decision tree.


Decision tree #1

              size
             /    \
         large    small
          /           \
        dept        no {4,5}
       /    \
     cs      ee
     /         \
  yes {1,3}   no {2}


Decision tree #2

                 status
                /      \
         faculty        student
            |               |
          dept            dept
         /    \          /    \
       cs      ee      ee      cs
        |       |       |       |
      size     no       ?     size
      /   \                  /    \
  large   small          large   small
    |       |              |       |
   yes   no {4}           yes   no {5}


Making a tree

How can we build a decision tree (that might be good)?

Objective: an algorithm that builds a decision tree from the root down. Each node in the decision tree is associated with a set of training examples that are split among its children.

Input: a node in a decision tree with no children, associated with a set of training examples.

Output: a decision tree that classifies all of the examples, i.e., all of the training examples stored in each leaf of the decision tree are in the same class.


Procedure: Buildtree

If all of the training examples are in the same class, then quit;
else:
1. Choose an attribute to split the examples.
2. Create a new child node for each attribute value.
3. Redistribute the examples among the children according to the attribute values.
4. Apply Buildtree to each child node.

Is this a good decision tree? Maybe? How do we decide?
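The procedure above can be sketched directly on the office-size examples. This is a minimal, runnable version; step 1 here just picks the first attribute whose values still differ (it assumes noise-free, separable data), not yet the “most informative” choice that entropy will provide:

```python
def buildtree(examples, attributes):
    """examples: list of (attribute-dict, class-label) pairs."""
    classes = {label for _, label in examples}
    if len(classes) == 1:                         # all in the same class: quit
        return classes.pop()
    # 1. choose an attribute to split on (first one that varies)
    attr = next(a for a in attributes
                if len({ex[a] for ex, _ in examples}) > 1)
    # 2-4. one child per attribute value; redistribute and recurse
    children = {}
    for value in {ex[attr] for ex, _ in examples}:
        subset = [(ex, lab) for ex, lab in examples if ex[attr] == value]
        children[value] = buildtree(subset,
                                    [a for a in attributes if a != attr])
    return (attr, children)

data = [({'size': 'large', 'dept': 'cs', 'status': 'faculty'}, 'yes'),
        ({'size': 'large', 'dept': 'ee', 'status': 'faculty'}, 'no'),
        ({'size': 'large', 'dept': 'cs', 'status': 'student'}, 'yes'),
        ({'size': 'small', 'dept': 'cs', 'status': 'faculty'}, 'no'),
        ({'size': 'small', 'dept': 'cs', 'status': 'student'}, 'no')]

tree = buildtree(data, ['size', 'dept', 'status'])
print(tree)   # reproduces decision tree #1: split on size, then dept
```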


A Bad tree

To identify an animal (goat, dog, housecat, tiger):

Is it a dog?
Is it a housecat?
Is it a tiger?
Is it a goat?



A good tree?

Is it a cat (cat family)? (If yes, what kind?)

Is it a dog?

Max depth: 2 questions.


Best Property

Need to select a property / feature / attribute.

Goal: find a short tree (Occam’s razor).
Base this on MAXIMUM depth.

Select the most informative feature: the one that best splits (classifies) the examples.

Use a measure from information theory: Claude Shannon (1949).


Entropy

Entropy is often described as a measure of disorder. Maybe better to think of it as a measure of unpredictability.

Low entropy = highly ordered.
High entropy = unpredictable = surprising -> chaotic.

As defined, entropy is related to the number of bits needed. Over some set of states or outcomes with probability p:

    - Σ p log p


Entropy

Measures the (im)purity in a collection S of examples:

    Entropy(S) = - p_{+} log_2(p_{+}) - p_{-} log_2(p_{-})

p_{+} is the proportion of positive examples.
p_{-} is the proportion of negative examples.
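The two-class formula can be computed directly; a small sketch (function name is mine, with the usual convention 0 log 0 = 0):

```python
import math

def entropy(p_pos: float) -> float:
    """Entropy(S) = -p+ log2(p+) - p- log2(p-), with 0 log 0 taken as 0."""
    p_neg = 1.0 - p_pos
    return sum(-p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy(0.5))   # maximally unpredictable collection: 1.0 bit
print(entropy(1.0))   # pure collection (all positive): 0.0
```

A 50/50 split is the worst case (1 bit of surprise per example); a pure leaf needs no further questions, which is exactly why the tree-building step prefers splits that drive entropy toward zero.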
