MACHINE LEARNING
Overview

• Most of the time we can't program our agents to do everything right to begin with.
• We don't have enough information about the environment.
• So we get them to learn what to do.
• Different forms of learning:
  – Inductive learning; and
  – Reinforcement learning.
Learning agents

[Figure: architecture of a learning agent. The critic compares what the sensors report against a performance standard and passes feedback to the learning element; the learning element makes changes to the performance element's knowledge and sets learning goals for the problem generator, which proposes experiments; the performance element maps sensor input to actions carried out by the effectors in the environment.]

• Design of the learning element is dictated by:
  – what type of performance element is used;
  – which functional component is to be learned;
  – how that functional component is represented;
  – what kind of feedback is available.
• Supervised learning: correct answers for each instance.
• Reinforcement learning: occasional rewards.

• Example scenarios:

Performance element   Component           Representation             Feedback
Alpha-beta search     Eval. fn.           Weighted linear function   Win/loss
Logical agent         Transition model    Successor-state axioms     Outcome
Utility-based agent   Transition model    Dynamic Bayes net          Outcome
Simple reflex agent   Percept–action fn   Neural net                 Correct action

• Today we'll look at inductive learning of decision trees and reinforcement learning.
Inductive learning

• Simplest form: learn a function from examples (tabula rasa).
• f is the target function.
• An example is a pair (x, f(x)), e.g. a tic-tac-toe position labelled with its value:

  [Figure: a tic-tac-toe board position], +1

• Problem: find a hypothesis h such that h ≈ f, given a training set of examples.
Inductive learning method

• Construct/adjust h to agree with f on the training set.
  (h is consistent if it agrees with f on all examples.)
• E.g., curve fitting:

  [Figure: the same set of (x, f(x)) data points fitted by a sequence of hypotheses, from a straight line through higher-degree polynomials to a curve that passes exactly through every point.]

• Ockham's razor: maximize a combination of consistency and simplicity.
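To make the consistency-versus-simplicity trade-off concrete, here is a minimal sketch (not from the slides) using NumPy: it fits polynomials of increasing degree to six noisy sample points. The degree-5 polynomial is perfectly consistent with the six points but complex; the lower-degree fits are simpler but only approximately consistent. The data and variable names are illustrative.

import numpy as np

# Hypothetical training set: 6 noisy samples of an underlying function.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 6)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(6)

for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    h = np.polyval(coeffs, x)           # hypothesis evaluated on the training data
    print(f"degree {degree}: max training error = {np.max(np.abs(h - y)):.4f}")
# Degree 5 interpolates all 6 points (error ~ 0): consistent but wiggly.
# Ockham's razor prefers a simpler h that is nearly consistent.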
Attribute-based representations

Example  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       T    F    F    T    Some  $$$    F     T    French   0–10   T
X2       T    F    F    T    Full  $      F     F    Thai     30–60  F
X3       F    T    F    F    Some  $      F     F    Burger   0–10   T
X4       T    F    T    T    Full  $      F     F    Thai     10–30  T
X5       T    F    T    F    Full  $$$    F     T    French   >60    F
X6       F    T    F    T    Some  $$     T     T    Italian  0–10   T
X7       F    T    F    F    None  $      T     F    Burger   0–10   F
X8       F    F    F    T    Some  $$     T     T    Thai     0–10   T
X9       F    T    T    F    Full  $      T     F    Burger   >60    F
X10      T    T    T    T    Full  $$$    F     T    Italian  10–30  F
X11      F    F    F    F    None  $      F     F    Thai     0–10   F
X12      T    T    T    T    Full  $      F     F    Burger   30–60  T

• Examples are described by attribute values (Boolean, discrete, continuous, etc.).
• Classification of examples is positive (T) or negative (F).
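As a concrete encoding (a sketch; the dictionary layout is my own, not from the slides), each example can be stored as an attribute-value mapping plus its classification. The first two rows of the table become:

# Each example: (attribute-value dict, classification); names as in the table.
X1 = ({"Alt": True, "Bar": False, "Fri": False, "Hun": True, "Pat": "Some",
       "Price": "$$$", "Rain": False, "Res": True, "Type": "French",
       "Est": "0-10"}, True)
X2 = ({"Alt": True, "Bar": False, "Fri": False, "Hun": True, "Pat": "Full",
       "Price": "$", "Rain": False, "Res": False, "Type": "Thai",
       "Est": "30-60"}, False)
training_set = [X1, X2]  # ...the remaining ten examples follow the same pattern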
Decision trees

• Here is the "true" tree for deciding whether to wait:

  [Figure: decision tree rooted at Patrons? (None → No; Some → Yes; Full → WaitEstimate?), with WaitEstimate? branching on >60, 30–60, 10–30, and 0–10 into further tests on Alternate?, Hungry?, Reservation?, Bar?, Fri/Sat?, and Raining?, each path ending in a Yes or No leaf.]

• Decision trees can express any function of the input attributes.
• For Boolean functions, truth table row → path to leaf. E.g., for A xor B:

  A  B  A xor B
  F  F  F
  F  T  T
  T  F  T
  T  T  F

  [Figure: the corresponding tree tests A at the root and B on each branch; the leaves carry the xor values.]

• Trivially, ∃ a consistent decision tree for any training set, with one path to a leaf for each example
  – unless f is nondeterministic in x.
• This trivial tree probably won't generalize to new examples.
• Prefer to find more compact decision trees.
Hypothesis spaces

• How many distinct decision trees with n Boolean attributes?
  = number of Boolean functions
  = number of distinct truth tables with 2^n rows
  = 2^{2^n}
  6 Boolean attributes means 18,446,744,073,709,551,616 trees.
• How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)?
• Each attribute can be in (positive), in (negative), or out
  ⇒ 3^n distinct conjunctive hypotheses.
• A more expressive hypothesis space:
  – increases the chance that the target function can be expressed;
  – increases the number of hypotheses consistent with the training set
  ⇒ may get worse predictions.
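A quick Python check of these counts for n = 6 (arithmetic only, nothing assumed):

n = 6
print(2 ** (2 ** n))  # Boolean functions / distinct trees: 18446744073709551616
print(3 ** n)         # purely conjunctive hypotheses: 729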
Decision tree learning

• Aim: find a small tree consistent with the training examples.
• Idea: (recursively) choose the "most significant" attribute as the root of each (sub)tree.
function DTL(examples, attributes, default) returns a decision tree
    if examples is empty then return default
    else if all examples have the same classification then return the classification
    else if attributes is empty then return MODE(examples)
    else
        best ← CHOOSE-ATTRIBUTE(attributes, examples)
        tree ← a new decision tree with root test best
        for each value v_i of best do
            examples_i ← {elements of examples with best = v_i}
            subtree ← DTL(examples_i, attributes − best, MODE(examples))
            add a branch to tree with label v_i and subtree subtree
        return tree
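Here is the same algorithm as runnable Python (my own translation, a sketch rather than the slides' code), using the (attribute-dict, label) example encoding shown earlier; choose_attribute is passed in, and the information-gain code further below is one way to supply it.

from collections import Counter

def mode(examples):
    # Most common classification among the examples.
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, default, choose_attribute):
    if not examples:
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # all same classification
        return labels.pop()
    if not attributes:
        return mode(examples)
    best = choose_attribute(attributes, examples)
    branches = {}
    for v in {attrs[best] for attrs, _ in examples}:
        subset = [(a, c) for a, c in examples if a[best] == v]
        branches[v] = dtl(subset, [a for a in attributes if a != best],
                          mode(examples), choose_attribute)
    return (best, branches)                   # root test plus labelled subtrees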
Choosing an attribute

• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".

  [Figure: the 12 examples split on Patrons? (None/Some/Full) versus on Type? (French/Italian/Thai/Burger).]

• Patrons? is a better choice: it gives information about the classification.
Information

• Information answers questions.
• The more clueless I am about the answer initially, the more information is contained in the answer.
• Scale: 1 bit = the answer to a Boolean question with prior ⟨0.5, 0.5⟩.
• Information in an answer when the prior is ⟨P_1, ..., P_n⟩ is

  H(\langle P_1, \ldots, P_n \rangle) = \sum_{i=1}^{n} -P_i \log_2 P_i

  (also called the entropy of the prior).
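A direct transcription of H into Python (a sketch; the function name is mine):

import math

def entropy(probs):
    # H(<P1,...,Pn>) = sum_i -Pi log2 Pi, with 0 log 0 taken as 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair Boolean question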

• Suppose we have p positive and n negative examples at the root. Then

  H(⟨p/(p+n), n/(p+n)⟩)

  bits are needed to classify a new example.
• For the 12 restaurant examples, p = n = 6, so we need 1 bit.
• An attribute splits the examples E into subsets E_i, each of which (we hope) needs less information to complete the classification.

• Let E_i have p_i positive and n_i negative examples. Then

  H(⟨p_i/(p_i+n_i), n_i/(p_i+n_i)⟩)

  bits are needed to classify a new example in E_i.
• The expected number of bits per example over all branches is

  \sum_i \frac{p_i + n_i}{p + n} H(\langle p_i/(p_i+n_i), n_i/(p_i+n_i) \rangle)

• For Patrons?, this is 0.459 bits.
• For Type?, this is (still) 1 bit.
• Choose the attribute that minimizes the remaining information needed.
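Checking those two figures on the 12 restaurant examples (a sketch; the per-branch (p_i, n_i) counts are read off the table above):

import math

def entropy2(p, n):
    # H(<p/(p+n), n/(p+n)>) in bits.
    return sum(-q * math.log2(q) for q in (p / (p + n), n / (p + n)) if q > 0)

def remaining(branches, p, n):
    # Expected bits still needed after the split; branches = [(p_i, n_i), ...].
    return sum((pi + ni) / (p + n) * entropy2(pi, ni) for pi, ni in branches)

# Patrons?: None -> 0+/2-, Some -> 4+/0-, Full -> 2+/4-.
print(remaining([(0, 2), (4, 0), (2, 4)], 6, 6))          # ~0.459 bits
# Type?: French 1+/1-, Italian 1+/1-, Thai 2+/2-, Burger 2+/2-.
print(remaining([(1, 1), (1, 1), (2, 2), (2, 2)], 6, 6))  # 1.0 bit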
Back to the example

• Decision tree learned from the 12 examples:

  [Figure: the learned tree. Patrons?: None → No; Some → Yes; Full → test Hungry?. Hungry?: No → No; Yes → test Type?. Type?: French → Yes; Italian → No; Burger → Yes; Thai → test Fri/Sat?. Fri/Sat?: No → No; Yes → Yes.]

• Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by this small amount of data.
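Written out as code (a sketch; the translation into nested tests is mine), the learned hypothesis uses only 4 of the 10 attributes:

def will_wait(pat, hun, type_, fri):
    # Learned tree from the 12 examples, as nested conditionals.
    if pat == "None":
        return False
    if pat == "Some":
        return True
    # pat == "Full":
    if not hun:
        return False
    if type_ in ("French", "Burger"):
        return True
    if type_ == "Italian":
        return False
    return fri  # type_ == "Thai": wait only on Fri/Sat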
Performance measurement

• How do we know that h ≈ f?
  1. Use theorems of computational/statistical learning theory.
  2. Try h on a new test set of examples
     (use the same distribution over example space as the training set).
• Learning curve = % correct on the test set as a function of training set size.

  [Figure: learning curve for the restaurant data; % correct on the test set (0.4 up to 1) plotted against training set size (0 to 100).]
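A minimal learning-curve sketch (assumptions: scikit-learn's DecisionTreeClassifier as the learner and synthetic Boolean data standing in for the restaurant examples):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 6))   # 6 Boolean attributes
y = X[:, 0] ^ (X[:, 1] & X[:, 2])       # hidden target function f
X_tr, y_tr, X_te, y_te = X[:500], y[:500], X[500:], y[500:]

for m in (10, 25, 50, 100, 500):        # growing training sets
    h = DecisionTreeClassifier().fit(X_tr[:m], y_tr[:m])
    print(m, h.score(X_te, y_te))       # fraction correct on the test set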

• The learning curve depends on whether the hypothesis space is:
  – realizable (can express the target function) vs. non-realizable;
    non-realizability can be due to missing attributes or a restricted hypothesis class;
  – redundant in expressiveness (e.g., loads of irrelevant attributes).

  [Figure: % correct vs. number of examples for realizable, redundant, and nonrealizable hypothesis spaces; the realizable curve approaches 1 fastest, while the nonrealizable curve levels off below 1.]
Summary

• Learning is needed for unknown environments, and for lazy designers.
• Learning agent = performance element + learning element.
• The learning method depends on the type of performance element, the available feedback, the type of component to be improved, and its representation.
• For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples.
• Decision tree learning uses information gain.
• Learning performance = prediction accuracy measured on a test set.