Machine Learning 2
Machine Learning
Basic definitions:
• concept: often described implicitly ("good politician") using examples, i.e. training data
• hypothesis: an attempt to describe the concept in an explicit way
  – concept and hypothesis are expressed in a corresponding language
  – a hypothesis is verified using testing data
• background knowledge provides information about the context (properties of the environment)
• the learning algorithm searches the space of hypotheses to find a consistent and complete hypothesis h; the space is restricted by introducing a bias
Goal of inductive ML

Suggest a hypothesis characterizing a concept in a given domain (= the set of objects in this domain) described implicitly through a limited set of classified examples E+ and E−.

The hypothesis:
• has to cover E+ while avoiding E−
• has to be applicable to objects which belong to neither E+ nor E−.
Basic notions
• Ω ... the domain of the concept K, i.e. K ⊆ Ω.
• E ... a set of training examples, complemented by a classification, i.e. a function cl : E → {yes, no}.
• E+ denotes all elements of E classified as yes, E− those classified as no.
• E+ and E− are a disjoint cover of the set E.
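The partition of E by the classification function can be sketched in Python; the example identifiers and labels below are made up for illustration:

```python
# Sketch: splitting a training set E into E+ and E- using the
# classification function cl : E -> {yes, no}.

def partition(E, cl):
    """Return (E_plus, E_minus) according to cl."""
    E_plus = [e for e in E if cl(e) == "yes"]
    E_minus = [e for e in E if cl(e) == "no"]
    return E_plus, E_minus

# hypothetical examples and classification
E = ["r1", "r2", "r3"]
cl = {"r1": "yes", "r2": "no", "r3": "yes"}.get
E_plus, E_minus = partition(E, cl)

# E+ and E- are disjoint and together cover E
assert set(E_plus) | set(E_minus) == set(E)
assert not set(E_plus) & set(E_minus)
```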
Example 1 "computer game":
Is there a way to quickly distinguish a friendly robot from the others?
[Figure: examples of friendly and unfriendly robots]
Concept Language and Background Knowledge
• Examples of concept languages:
  – a set of real or idealised examples expressed in the object language that represent each of the concepts learned (Nearest Neighbour)
  – attribute–value pairs (propositional logic)
  – relational concepts (first-order logic)
• One can extend the concept language with user-defined concepts or background knowledge (BK).
  – BK plays an important role in Inductive Logic Programming (ILP).
  – The use of certain BK predicates may be a necessary condition for learning the right hypothesis.
  – Redundant or irrelevant BK slows down the learning.
Example 1: hypothesis and its testing
Head shape | Smiling face | Neck    | Body shape | Holding | Friendly
circle     | nothing      | tie     | circle     | sword   | yes
triangle   | yes          | nothing | square     | nothing | yes
H1 in the form of a decision tree:

if neck(r) = bow then "friendly"
if neck(r) = nothing then
    if head_shape(r) = triangle then "friendly" else "unfriendly"
if neck(r) = tie then
    if body_shape(r) = square then "unfriendly"
    else if head_shape(r) = circle then "friendly" else "unfriendly"
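The tree H1 can be written directly as a Python function; the dict keys and string values are an assumed encoding of the robots, not part of the original slides:

```python
# Hypothesis H1 from the slide as a Python function; a robot r is a
# dict with (assumed) keys "neck", "head_shape" and "body_shape".

def h1(r):
    if r["neck"] == "bow":
        return "friendly"
    if r["neck"] == "nothing":
        return "friendly" if r["head_shape"] == "triangle" else "unfriendly"
    if r["neck"] == "tie":
        if r["body_shape"] == "square":
            return "unfriendly"
        return "friendly" if r["head_shape"] == "circle" else "unfriendly"
```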
Hypothesis – an attempt at a formal description

Both examples and hypothesis have to be specified in a language. A hypothesis has the form of a formula φ(X) with a single free variable X.

Let us define the extension Ext(φ) of a hypothesis φ(X) wrt. the domain Ω as the set of all elements of Ω which meet the condition φ, i.e.

    Ext(φ) = {o ∈ Ω | φ(o) holds}
Properties of a hypothesis
• a hypothesis φ is complete iff E+ ⊆ Ext(φ)
• φ is consistent iff it covers no negative examples, i.e. Ext(φ) ∩ E− = ∅
• φ is correct iff it is complete and consistent
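For finite example sets these definitions translate directly into set operations; the sketch below (with made-up example identifiers) assumes Ext(φ), E+ and E− are given as Python sets:

```python
# Completeness, consistency and correctness of a hypothesis, checked
# on finite sets of example identifiers.

def is_complete(ext, E_plus):
    return E_plus <= ext            # E+ is a subset of Ext(phi)

def is_consistent(ext, E_minus):
    return not (ext & E_minus)      # Ext(phi) and E- are disjoint

def is_correct(ext, E_plus, E_minus):
    return is_complete(ext, E_plus) and is_consistent(ext, E_minus)

# made-up extension and example sets
ext = {"r1", "r2", "r3"}
E_plus, E_minus = {"r1", "r2"}, {"r4"}
```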
How many correct hypotheses can be designed for a fixed training set E?
• Fact: the number of possible concepts is much larger than the number of possible hypotheses (formulas).
• Consequence: most concepts cannot be characterized by a corresponding hypothesis – we have to accept hypotheses which are only "approximately correct".
• Uniqueness of an "approximately correct" hypothesis cannot be ensured.
Choice of a hypothesis and Ockham's razor

William of Ockham recommends a way to compare hypotheses:
"Entia non sunt multiplicanda praeter necessitatem" ("Entities must not be multiplied beyond necessity").
• Einstein: "… the language should not be simpler than necessary."
Machine Learning Biases
• The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned.
• The preference bias allows us to decide between two hypotheses (even if they both classify the training data equally).
• The search bias defines the order in which hypotheses will be considered.
  – Important if one does not search the whole hypothesis space.
Preference Bias, Search Bias & Version Space

Hypotheses are partially ordered.
Version space: the search keeps the subset of hypotheses that have zero training error.
[Figure: positive (+) and negative (−) examples between the most specific and the most general concept]
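For small discrete domains the version space can be computed by brute force: enumerate every conjunctive hypothesis and keep exactly those with zero training error. The attributes, values and examples below are illustrative, not taken from the slides:

```python
from itertools import product

# Brute-force version space: a conjunctive hypothesis fixes each
# attribute to one value or to the wildcard "?". We keep exactly the
# hypotheses that classify every training example correctly.

DOMAINS = {"neck": ["bow", "tie", "nothing"], "smiling": ["yes", "no"]}

def covers(h, example):
    return all(v == "?" or example[a] == v for a, v in h.items())

def version_space(examples):
    attrs = list(DOMAINS)
    space = [dict(zip(attrs, vals))
             for vals in product(*[DOMAINS[a] + ["?"] for a in attrs])]
    return [h for h in space
            if all(covers(h, e) == (label == "yes") for e, label in examples)]

examples = [({"neck": "bow", "smiling": "yes"}, "yes"),
            ({"neck": "tie", "smiling": "no"}, "no")]
vs = version_space(examples)
# vs spans from the most specific consistent hypothesis
# {"neck": "bow", "smiling": "yes"} up to the most general ones.
```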
Types of learning
• skill refinement (swimming, biking, ...)
• knowledge acquisition:
  – Rote Learning (chess, checkers): the aim is to find an appropriate heuristic function evaluating the current state of the game, e.g. the MIN-MAX approach
  – Case-Based Reasoning: past experience is stored in a database. To solve a new problem, the system searches the DB to find "the closest (the most similar) case" – its solution is then modified for the current problem
  – Advice Taking: learning to "interpret" or "operationalize" an abstract advice – search for "applicability conditions"
• Induction. Difference Analysis: candidate-elimination or version space approach, decision tree induction etc.
Decision tree induction

Given: training examples uniformly described by a single set of the same attributes and classified into a small set of classes (most often into 2 classes: positive vs. negative examples)
Find: a decision tree allowing to classify new, unseen examples

Simple example: robots described by 5 discrete attributes and classified into 2 classes (friendly, unfriendly):
• Is_smiling ∈ {no, yes}, Holding ∈ {sword, balloon, flag}, Has_tie ∈ {no, yes}, Head_shape ∈ {round, square, octagon}, Body_shape ∈ {round, square, octagon}.
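One possible encoding of such examples: each robot as a dict over the five attributes, with the domains from the slide. The concrete robot below is made up:

```python
# Attribute domains from the slide, plus one (made-up) robot described
# by the five attributes.

DOMAINS = {
    "is_smiling": ["no", "yes"],
    "holding": ["sword", "balloon", "flag"],
    "has_tie": ["no", "yes"],
    "head_shape": ["round", "square", "octagon"],
    "body_shape": ["round", "square", "octagon"],
}

robot = {"is_smiling": "yes", "holding": "flag", "has_tie": "yes",
         "head_shape": "round", "body_shape": "square"}

# sanity check: every value lies in its declared domain
assert all(robot[a] in dom for a, dom in DOMAINS.items())
```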
TDIDT: Top-Down Induction of Decision Trees

given: S ... the set of classified examples
goal: design a decision tree DT ensuring the same classification as S

1. The root is denoted by S.
2. Find the "best" attribute at to be used for splitting the current set S.
3. Split the set S into the subsets S_1, S_2, ..., S_n wrt. the value of at (all examples in the subset S_i have the same value at = v_i). Each subset denotes a node of the DT.
4. For each S_i do: if all examples in S_i belong to the same class, then create a leaf with that class label, else repeat from step 2 with S = S_i.
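The four steps can be sketched as a recursive Python function; the choice of the "best" attribute (step 2) is passed in as a parameter, and the example data are invented:

```python
# Recursive TDIDT sketch. best_attr(examples, attributes) stands in
# for step 2 ("find the best attribute"); any selection function
# works, e.g. an entropy-based one.

def tdidt(examples, attributes, best_attr):
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # step 4: homogeneous -> leaf
        return labels.pop()
    at = best_attr(examples, attributes)      # step 2: choose attribute
    tree = {"attribute": at, "branches": {}}
    for v in {e[at] for e, _ in examples}:    # step 3: split on at = v_i
        subset = [(e, c) for e, c in examples if e[at] == v]
        rest = [a for a in attributes if a != at]
        tree["branches"][v] = tdidt(subset, rest, best_attr)
    return tree

# invented two-example training set
examples = [({"neck": "bow"}, "friendly"),
            ({"neck": "tie"}, "unfriendly")]
tree = tdidt(examples, ["neck"], lambda ex, ats: ats[0])
```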
TDIDT: How to choose the "best" attribute?

Minimize the entropy (Shannon):

    H(S_i) = − p_i⁺ log p_i⁺ − p_i⁻ log p_i⁻

where p_i⁺ is the probability that a random example in S_i is positive, estimated by its frequency (analogously for p_i⁻).

Let the attribute at split S into the subsets S_1, S_2, ..., S_n. The entropy of this system is defined as

    H(S, at) = Σ_{i=1}^{n} P(S_i) · H(S_i)

where P(S_i) is the probability of the event S_i, approximated by the relative size |S_i| / |S|.

Choose the at with the minimal H(S, at).
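The entropy criterion can be implemented directly; this sketch assumes binary classes labelled "yes"/"no" and uses log base 2, so H is measured in bits:

```python
from math import log2

# Entropy of a set of classified examples (classes "yes"/"no") and the
# weighted entropy H(S, at) of the split induced by attribute at.

def entropy(examples):
    n = len(examples)
    p_pos = sum(1 for _, c in examples if c == "yes") / n
    p_neg = 1 - p_pos
    # 0 * log(0) is taken as 0, hence the p > 0 filter
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

def split_entropy(examples, at):
    """H(S, at) = sum_i P(S_i) * H(S_i) with P(S_i) ~ |S_i| / |S|."""
    total = len(examples)
    return sum(len(sub) / total * entropy(sub)
               for sub in [[(e, c) for e, c in examples if e[at] == v]
                           for v in {e[at] for e, _ in examples}])

def best_attribute(examples, attributes):
    return min(attributes, key=lambda at: split_entropy(examples, at))
```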
Learning to fly simulator F16 [Samuel, 95]

Design an automatic controller for F16 for the following complex task:
1. Start up and rise up to the height of 2000 feet.
2. Fly 32000 feet north.
3. Turn right 330°.
4. When 42000 feet from the starting point (direction N–S), turn left and head towards the starting point; the rotation is finished when the course is between 140° and 180°.
5. Adjust the flight direction so that it is parallel to the landing course; tolerance 5° for flight direction and 10° for wing twist wrt. the horizon.
6. Decrease the height and move towards the start of the landing path.
7. Land.

Training data: 3 skilled pilots performed the assigned mission, each 30 times.
Each flight is described by 1000 vectors (a total of 90000 training examples) characterizing:
• position and state of the plane
• pilot's control action
Learning to fly simulator F16 [Samuel, 95]
Position and state:
• on_ground – boolean: is the plane on the ground?
• g_limit – boolean: acceleration limit exceeded?
• wing_stall (is the plane stable?), twist (int: 0°–360°, wings wrt. the horizon)
• elevation (angle "body wrt. horizon"), azimuth, roll_speed (wings deflection), elevation_speed, azimuth_speed, airspeed, climbspeed, E/W distance, N/S distance, fuel (weight of current supply)

Control:
• rollers and elevator: position of horizontal/vertical deflection
• thrust – integer: 0–100%, force
• flaps – integer: 0°, 10° or 20°, wing twist

Each of the 7 phases calls for a specific type of control. The training data are divided into 7 disjoint sets which are used to design specific decision trees (independently for each task phase and each control action). Control is ensured by 7 × 4 decision trees.
Tasks addressed by ML applications
• Classification/prediction
  – diagnosis (troubleshooting motor pumps, medicine, ..., SKICAT – astronomical cataloguing)
  – execution/control (GASOIL – separation of hydrocarbons)
  – configuration/design (Siemens: equipment configuration, Boeing)
  – language understanding
  – vision and speech
  – planning and scheduling
• Why? Important speed-up of development and maintenance:
  – 180 man-years to develop the ES XCON with 8000 rules, 30 man-years needed for maintenance
  – 1 man-year to develop BP GASOIL (ML-based) with 2800 rules, 0.1 man-years needed for maintenance