Machine Learning

Cao Hoang Tru
CSE Faculty - HCMUT
05 January 2010

Machine Learning

• What is learning?

• "That is what learning is. You suddenly understand something you've understood all your life, but in a new way." (Doris Lessing, 2007 Nobel Prize in Literature)

Machine Learning

• How to construct programs that automatically improve with experience.

• Learning problem:
  – Task T
  – Performance measure P
  – Training experience E

Machine Learning

• Checkers game:
  – Task T: playing checkers games
  – Performance measure P: percent of games won against opponents
  – Training experience E: playing practice games against itself

Machine Learning

• Handwriting recognition:
  – Task T: recognizing and classifying handwritten words
  – Performance measure P: percent of words correctly classified
  – Training experience E: handwritten words with given classifications

Designing a Learning System

• Choosing the training experience:
  – Direct or indirect feedback
  – Degree of learner's control
  – Representative distribution of examples

Designing a Learning System

• Choosing the target function:
  – Type of knowledge to be learned
  – Function approximation

Designing a Learning System

• Choosing a representation for the target function:
  – Expressive representation for a close function approximation
  – Simple representation for simple training data and learning algorithms

Designing a Learning System

• Choosing a function approximation algorithm (learning algorithm)

Designing a Learning System

• Checkers game:
  – Task T: playing checkers games
  – Performance measure P: percent of games won against opponents
  – Training experience E: playing practice games against itself
  – Target function: V: Board → ℝ

Designing a Learning System

• Checkers game:
  – Target function representation:
    V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
    x1: the number of black pieces on the board
    x2: the number of red pieces on the board
    x3: the number of black kings on the board
    x4: the number of red kings on the board
    x5: the number of black pieces threatened by red
    x6: the number of red pieces threatened by black

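As a concrete illustration, here is a minimal Python sketch of this linear evaluation function; the weight values below are made-up placeholders, not from the slides:

```python
# Evaluate V^(b) = w0 + w1*x1 + ... + w6*x6 for a board
# described by its six features x1..x6.

def v_hat(features, weights):
    w0 = weights[0]
    return w0 + sum(w * x for w, x in zip(weights[1:], features))

# Example: 3 black pieces and 1 black king, nothing else on the board.
features = [3, 0, 1, 0, 0, 0]                       # x1..x6
weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]    # w0..w6, placeholder values
print(v_hat(features, weights))
```
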
Designing a Learning System

• Checkers game:
  – Function approximation algorithm:
    Training example: (<x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0>, 100)
    x1: the number of black pieces on the board
    x2: the number of red pieces on the board
    x3: the number of black kings on the board
    x4: the number of red kings on the board
    x5: the number of black pieces threatened by red
    x6: the number of red pieces threatened by black

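The slides give one training example but do not name the approximation algorithm; Mitchell's textbook trains this representation with the LMS weight-update rule. A hedged sketch of that rule, with an assumed learning rate and initial weights:

```python
# One LMS step: w_i <- w_i + eta * (V_train(b) - V^(b)) * x_i.
# eta and the zero initial weights are assumptions, not from the slides.

def lms_update(weights, features, v_train, eta=0.1):
    v_hat = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    error = v_train - v_hat
    weights[0] += eta * error                 # bias term has implicit feature 1
    for i, x in enumerate(features, start=1):
        weights[i] += eta * error * x
    return weights

# The training example from the slide: (<x1=3, x2=0, x3=1, x4=0, x5=0, x6=0>, 100)
weights = lms_update([0.0] * 7, [3, 0, 1, 0, 0, 0], v_train=100)
```
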
Designing a Learning System

• What is learning?
• Learning is an (endless) generalization or induction process.

Designing a Learning System

[Diagram: the four modules of the final design. The Experiment Generator proposes a new problem (an initial board) to the Performance System; the Performance System produces a solution trace (the game history) for the Critic; the Critic extracts training examples {(b1, V1), (b2, V2), ...} for the Generalizer; the Generalizer outputs a hypothesis (V̂) back to the Experiment Generator.]

Issues in Machine Learning

• Which learning algorithms should be used?
• How much training data is sufficient?
• When and how can prior knowledge guide the learning process?
• What is the best strategy for choosing the next training experience?
• What is the best way to reduce the learning task to one or more function approximation problems?
• How can the learner automatically alter its representation to improve its learning ability?

Example

Experience:
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

Prediction:
5        Rainy  Cold     High      Strong  Warm   Change    ?
6        Sunny  Warm     Normal    Strong  Warm   Same      ?
7        Sunny  Warm     Low       Strong  Cool   Same      ?

Example

• Learning problem:
  – Task T: classifying days on which my friend enjoys water sport
  – Performance measure P: percent of days correctly classified
  – Training experience E: days with given attributes and classifications

Concept Learning

• Inferring a boolean-valued function from training examples of its input (instances) and output (classifications).

Concept Learning

• Learning problem:
  – Target concept: a subset of the set of instances X
    c: X → {0, 1}
  – Target function:
    Sky × AirTemp × Humidity × Wind × Water × Forecast → {Yes, No}
  – Hypothesis:
    Characteristics of all instances of the concept to be learned
    ≡ constraints on instance attributes
    h: X → {0, 1}

Concept Learning

• Satisfaction:
  h(x) = 1 iff x satisfies all the constraints of h
  h(x) = 0 otherwise
• Consistency:
  h(x) = c(x) for every instance x of the training examples
• Correctness:
  h(x) = c(x) for every instance x of X

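A minimal Python sketch of these two notions for the attribute-vector hypotheses used in this chapter ('?' = any value is acceptable, EMPTY = no value is acceptable):

```python
EMPTY = None   # stands for the 'no value is acceptable' constraint

def satisfies(h, x):
    """h(x) = 1 iff instance x satisfies all the constraints of h."""
    return all(c == '?' or (c is not EMPTY and c == v) for c, v in zip(h, x))

def consistent(h, examples):
    """h is consistent iff h(x) = c(x) for every training example (x, c(x))."""
    return all(satisfies(h, x) == label for x, label in examples)
```
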
Concept Learning

• How to represent a hypothesis function?

Concept Learning

• Hypothesis representation (constraints on instance attributes):
  <Sky, AirTemp, Humidity, Wind, Water, Forecast>
  – ?: any value is acceptable
  – a single required value
  – ∅: no value is acceptable

Concept Learning

• General-to-specific ordering of hypotheses:
  hj ≥g hk iff ∀x ∈ X: hk(x) = 1 ⇒ hj(x) = 1
  h1 = <Sunny, ?, ?, Strong, ?, ?>
  h2 = <Sunny, ?, ?, ?, ?, ?>
  h3 = <Sunny, ?, ?, ?, Cool, ?>

[Diagram: the hypotheses of H form a lattice (a partial order) from specific to general; h2 is more general than both h1 and h3.]

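For these conjunctive hypotheses the ordering can be checked attribute by attribute, without quantifying over all of X. A sketch, reusing the EMPTY convention from above:

```python
EMPTY = None

def more_general_or_equal(hj, hk):
    """hj >=g hk: every instance that satisfies hk also satisfies hj."""
    if EMPTY in hk:    # hk matches no instance at all, so the claim holds vacuously
        return True
    # hj must be at least as permissive as hk on every attribute
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
```
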
FIND-S

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

h = <∅, ∅, ∅, ∅, ∅, ∅>
h = <Sunny, Warm, Normal, Strong, Warm, Same>
h = <Sunny, Warm, ?, Strong, Warm, Same>
h = <Sunny, Warm, ?, Strong, ?, ?>

FIND-S

• Initialize h to the most specific hypothesis in H
• For each positive training instance x:
  For each attribute constraint ai in h:
    If the constraint is not satisfied by x
    Then replace ai by the next more general constraint that is satisfied by x
• Output hypothesis h

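A direct Python transcription of FIND-S for the EnjoySport representation, as a sketch:

```python
EMPTY = None   # the 'no value is acceptable' constraint

def find_s(examples, n_attrs=6):
    h = [EMPTY] * n_attrs                  # most specific hypothesis in H
    for x, label in examples:
        if not label:                      # FIND-S ignores negative examples
            continue
        for i, v in enumerate(x):
            if h[i] == EMPTY:
                h[i] = v                   # first generalization: the value itself
            elif h[i] != v:
                h[i] = '?'                 # next more general constraint
    return h

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```
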
FIND-S

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

h = <Sunny, Warm, ?, Strong, ?, ?>

Prediction:
5        Rainy  Cold     High      Strong  Warm   Change    No
6        Sunny  Warm     Normal    Strong  Warm   Same      Yes
7        Sunny  Warm     Low       Strong  Cool   Same      Yes

FIND-S

• The output hypothesis is the most specific one that satisfies all positive training examples.
• The result is consistent with the positive training examples.
• Is the result consistent with the negative training examples?

FIND-S

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
5        Sunny  Warm     Normal    Strong  Cool   Change    No

h = <Sunny, Warm, ?, Strong, ?, ?>
(h classifies the negative example 5 as positive, so it is not consistent with it.)

FIND-S

• The result is consistent with the negative training examples if the target concept is contained in H (and the training examples are correct).

FIND-S

• Sizes of the spaces:
  – Size of the instance space: |X| = 3·2·2·2·2·2 = 96
  – Size of the concept space: |C| = 2^|X| = 2^96
  – Size of the hypothesis space: |H| = (4·3·3·3·3·3) + 1 = 973 << 2^96
  ⇒ The target concept (in C) may not be contained in H.

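A quick arithmetic check of these counts (the +1 is the single all-∅ hypothesis: every hypothesis containing ∅ classifies all instances negative, so one representative suffices):

```python
size_X = 3 * 2 * 2 * 2 * 2 * 2        # 96 distinct instances
size_H = 4 * 3 * 3 * 3 * 3 * 3 + 1    # 3 Sky values + '?', 2 values + '?' elsewhere
print(size_X, size_H)                  # 96 973
print(2 ** size_X)                     # 2^96 possible concepts
```
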
FIND-S

• Questions:
  – Has the learner converged to the target concept, given that there can be several hypotheses consistent with both the positive and negative training examples?
  – Why is the most specific hypothesis preferred?
  – What if there are several maximally specific consistent hypotheses?
  – What if the training examples are not correct?

List-then-Eliminate Algorithm

• Version space: the set of all hypotheses that are consistent with the training examples.
• Algorithm:
  – Initialize the version space to the set containing every hypothesis in H
  – For each training example <x, c(x)>, remove from the version space any hypothesis h for which h(x) ≠ c(x)
  – Output the hypotheses in the version space

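A sketch of the algorithm for the EnjoySport space, which is small enough (973 hypotheses) to enumerate literally. The third Sky value, Cloudy, is taken from Mitchell's textbook; the slides only show Sunny and Rainy:

```python
from itertools import product

EMPTY = None
VALUES = [('Sunny', 'Rainy', 'Cloudy'), ('Warm', 'Cold'), ('Normal', 'High'),
          ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def satisfies(h, x):
    return all(c == '?' or (c is not EMPTY and c == v) for c, v in zip(h, x))

def all_hypotheses():
    hs = list(product(*[vals + ('?',) for vals in VALUES]))   # 4*3^5 = 972
    hs.append((EMPTY,) * 6)        # one representative all-empty hypothesis
    return hs                      # 973 in total

def list_then_eliminate(examples):
    version_space = all_hypotheses()          # initially all of H
    for x, label in examples:
        version_space = [h for h in version_space if satisfies(h, x) == label]
    return version_space
```
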
List-then-Eliminate Algorithm

• Requires an exhaustive enumeration of all hypotheses in H

Compact Representation of Version Space

• G (the general boundary): the set of the most general hypotheses of H consistent with the training data D:
  G = {g ∈ H | consistent(g, D) ∧ ¬∃g' ∈ H: g' >g g ∧ consistent(g', D)}
• S (the specific boundary): the set of the most specific hypotheses of H consistent with the training data D:
  S = {s ∈ H | consistent(s, D) ∧ ¬∃s' ∈ H: s >g s' ∧ consistent(s', D)}

Compact Representation of Version Space

• Version space = <G, S> = {h ∈ H | ∃g ∈ G, ∃s ∈ S: g ≥g h ≥g s}

[Diagram: S and G bound the version space from below and above.]

Candidate-Elimination Algorithm

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}
G0 = {<?, ?, ?, ?, ?, ?>}
S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
G1 = {<?, ?, ?, ?, ?, ?>}
S2 = {<Sunny, Warm, ?, Strong, Warm, Same>}
G2 = {<?, ?, ?, ?, ?, ?>}
S3 = {<Sunny, Warm, ?, Strong, Warm, Same>}
G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Candidate-Elimination Algorithm

S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
     <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

(These six hypotheses form the final version space, ordered from specific to general.)

Candidate-Elimination Algorithm

• Initialize G to the set of maximally general hypotheses in H
• Initialize S to the set of maximally specific hypotheses in H

Candidate-Elimination Algorithm

• For each positive example d:
  – Remove from G any hypothesis inconsistent with d
  – For each s in S that is inconsistent with d:
    Remove s from S
    Add to S all least generalizations h of s such that h is consistent with d and some hypothesis in G is more general than h
    Remove from S any hypothesis that is more general than another hypothesis in S

Candidate-Elimination Algorithm

• For each negative example d:
  – Remove from S any hypothesis inconsistent with d
  – For each g in G that is inconsistent with d:
    Remove g from G
    Add to G all least specializations h of g such that h is consistent with d and some hypothesis in S is more specific than h
    Remove from G any hypothesis that is more specific than another hypothesis in G

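A compact Python sketch of the two update steps for conjunctive hypotheses. For this representation S stays a singleton, so the duplicate-removal step on S is omitted:

```python
EMPTY = None

def satisfies(h, x):
    return all(c == '?' or (c is not EMPTY and c == v) for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    if EMPTY in hk:                       # hk matches nothing: vacuously true
        return True
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

def min_generalize(s, x):
    """Least generalization of s that covers positive instance x."""
    return tuple(v if c == EMPTY else (c if c == v else '?')
                 for c, v in zip(s, x))

def min_specializations(g, x, values):
    """Least specializations of g that exclude negative instance x."""
    out = []
    for i, c in enumerate(g):
        if c == '?':
            out += [g[:i] + (v,) + g[i+1:] for v in values[i] if v != x[i]]
    return out

def candidate_elimination(examples, values):
    S = [(EMPTY,) * len(values)]
    G = [('?',) * len(values)]
    for x, label in examples:
        if label:                                  # positive example d
            G = [g for g in G if satisfies(g, x)]
            S = [min_generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:                                      # negative example d
            S = [s for s in S if not satisfies(s, x)]
            new_G = []
            for g in G:
                if not satisfies(g, x):
                    new_G.append(g)
                    continue
                new_G += [h for h in min_specializations(g, x, values)
                          if any(more_general_or_equal(h, s) for s in S)]
            new_G = list(dict.fromkeys(new_G))     # drop duplicates
            G = [g for g in new_G                  # keep only maximal members
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

VALUES = [('Sunny', 'Rainy', 'Cloudy'), ('Warm', 'Cold'), ('Normal', 'High'),
          ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(examples, VALUES)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```
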
Candidate-Elimination Algorithm

• The version space will converge toward the correct target concept if:
  – H contains the correct target concept
  – There are no errors in the training examples
• A training instance to be requested next should discriminate among the alternative hypotheses in the current version space.

Candidate-Elimination Algorithm

• A partially learned concept can be used to classify new instances using the majority rule.

S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
     <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
5        Rainy  Warm     High     Strong  Cool   Same      ?

(Of the six hypotheses, <?, Warm, ?, Strong, ?, ?> and <?, Warm, ?, ?, ?, ?> classify instance 5 as positive; the other four classify it as negative, so the majority vote is No.)

Inductive Bias

• Size of the instance space: |X| = 3·2·2·2·2·2 = 96
• Number of possible concepts: 2^|X| = 2^96
• Size of H: (4·3·3·3·3·3) + 1 = 973 << 2^96
⇒ a biased hypothesis space

Inductive Bias

• An unbiased hypothesis space H' that can represent every subset of the instance space X: propositional logic sentences
• Positive examples: x1, x2, x3
  Negative examples: x4, x5
  h(x) ≡ (x = x1) ∨ (x = x2) ∨ (x = x3) ≡ x1 ∨ x2 ∨ x3

Inductive Bias

• The version space contains x1 ∨ x2 ∨ x3, x1 ∨ x2 ∨ x3 ∨ x6, ∨ ... : every disjunction that includes x1, x2, x3 and does not include x4 and x5.
• Any new instance x is classified positive by half of the version space, and negative by the other half ⇒ not classifiable. (For every consistent hypothesis h that excludes x, the hypothesis h ∨ x is also consistent and includes x.)

Inductive Bias

Example  Day  Actor     Price      EasyTicket
1        Mon  Famous    Expensive  Yes
2        Sat  Famous    Moderate   No
3        Sun  Infamous  Cheap      No
4        Wed  Infamous  Moderate   Yes
5        Sun  Famous    Expensive  No
6        Thu  Infamous  Cheap      Yes
7        Tue  Famous    Expensive  Yes
8        Sat  Famous    Cheap      No
9        Wed  Famous    Cheap      ?
10       Sat  Infamous  Expensive  ?

Inductive Bias

Example  Quality  Price  Buy
1        Good     Low    Yes
2        Bad      High   No
3        Good     High   ?
4        Bad      Low    ?

Inductive Bias

• A learner that makes no prior assumptions regarding the identity of the target concept cannot classify any unseen instances.

Homework

• Exercises 2.1 → 2.5 (Chapter 2, ML textbook)