# Lecture Slides for Introduction to Machine Learning 2e

Artificial Intelligence and Robotics

15 Oct 2013


ETHEM ALPAYDIN

alpaydin@boun.edu.tr

http://www.cmpe.boun.edu.tr/~ethem/i2ml2e


Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)


Decision Trees

A decision tree

An efficient nonparametric method

A hierarchical model

Divide-and-conquer strategy

Supervised learning


Tree Uses Nodes, and Leaves


Divide and Conquer

Internal decision nodes

Univariate: uses a single attribute, x_i

Numeric x_i: binary split, x_i > w_m

Discrete x_i: n-way split for n possible values

Multivariate: uses all attributes, x

Leaves

Classification: class labels, or proportions

Regression: numeric; the average of r, or a local fit

Learning is greedy; find the best split recursively (Breiman et al., 1984; Quinlan, 1986, 1993)
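A sketch of the two univariate split types (the function names and the tuple data layout are my own illustration, not from the slides):

```python
def binary_split(data, i, w_m):
    """Numeric attribute x_i: binary split on the test x[i] > w_m."""
    left = [x for x in data if x[i] <= w_m]   # test is false
    right = [x for x in data if x[i] > w_m]   # test is true
    return left, right

def n_way_split(data, i):
    """Discrete attribute x_i: one branch per observed value (n-way split)."""
    branches = {}
    for x in data:
        branches.setdefault(x[i], []).append(x)
    return branches

data = [(2.5, "red"), (7.1, "blue"), (4.0, "red")]
print(binary_split(data, 0, 4.0))
# ([(2.5, 'red'), (4.0, 'red')], [(7.1, 'blue')])
print(sorted(n_way_split(data, 1)))  # ['blue', 'red']
```

A full tree applies such tests recursively, one at each internal node.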


Classification Trees

For node m, N_m instances reach m, and N_m^i of them belong to class C_i:

p_m^i = N_m^i / N_m  (the estimate of P(C_i | x, m))

Node m is pure if p_m^i is 0 or 1 for all i
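These quantities can be sketched directly; the function names are mine, but the proportions p_m^i = N_m^i / N_m and the purity test follow the definitions above:

```python
from collections import Counter

def class_proportions(labels):
    """p_m^i = N_m^i / N_m for the N_m instances reaching node m."""
    n_m = len(labels)
    return {c: count / n_m for c, count in Counter(labels).items()}

def is_pure(labels):
    """Node m is pure if every p_m^i is 0 or 1, i.e. one class remains."""
    return len(set(labels)) == 1

labels_at_m = ["C1", "C1", "C2", "C1"]
print(class_proportions(labels_at_m))  # {'C1': 0.75, 'C2': 0.25}
print(is_pure(labels_at_m))            # False
```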


Entropy

The measure of impurity is entropy:

I_m = − Σ_i p_m^i log2 p_m^i

Entropy in information theory specifies the minimum number of bits needed to encode the class code of an instance.


Example of Entropy

In a two-class problem:

If p_1 = 1 and p_2 = 0: all examples are of C_1; we do not need to send anything; the entropy is 0.

If p_1 = p_2 = 0.5: we need to send a bit to signal one of the two cases; the entropy is 1.

In between these two extremes, we can devise codes and use less than a bit per message by having shorter codes for the more likely class and longer codes for the less likely.

Entropy function for a two-class problem
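The two extremes can be checked numerically; a minimal sketch (the function name is mine), using the convention 0 log 0 = 0:

```python
import math

def two_class_entropy(p1):
    """Entropy of a two-class distribution in bits, with 0 log 0 taken as 0."""
    h = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0:
            h -= p * math.log2(p)
    return h

print(two_class_entropy(1.0))  # 0.0 bits: all examples are C1
print(two_class_entropy(0.5))  # 1.0 bit: one full bit per message
print(two_class_entropy(0.9))  # about 0.47 bits: shorter codes possible
```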


The Properties of Measure Functions

The properties of a function φ(p, 1 − p) measuring the impurity of a split:

φ(1/2, 1/2) ≥ φ(p, 1 − p) for any p in [0, 1] (maximum impurity at equal proportions)

φ(0, 1) = φ(1, 0) = 0 (zero impurity at a pure node)

φ(p, 1 − p) is increasing in p on [0, 1/2] and decreasing in p on [1/2, 1]


Examples

Examples of two-class measure functions are:

Entropy: φ(p, 1 − p) = −p log2 p − (1 − p) log2 (1 − p)

Gini index: φ(p, 1 − p) = 2p(1 − p)

Misclassification error: φ(p, 1 − p) = 1 − max(p, 1 − p)
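The three measures can be compared directly; a small sketch assuming the formulas above (function names are mine):

```python
import math

def entropy(p):
    """phi(p, 1-p) = -p log2 p - (1-p) log2 (1-p), with 0 log 0 = 0."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

def gini(p):
    """phi(p, 1-p) = 2p(1-p)."""
    return 2 * p * (1 - p)

def misclassification_error(p):
    """phi(p, 1-p) = 1 - max(p, 1-p)."""
    return 1 - max(p, 1 - p)

# all three are zero at a pure node and largest at p = 1/2
for p in (0.0, 0.25, 0.5):
    print(p, entropy(p), gini(p), misclassification_error(p))
```

In practice the three usually pick similar splits; entropy and Gini are differentiable in p, which misclassification error is not.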


Best Split

If node m is pure, generate a leaf and stop; otherwise split and continue recursively.

Impurity after split: N_mj of the N_m instances take branch j, and N_mj^i of them belong to C_i:

p_mj^i = N_mj^i / N_mj

I'_m = − Σ_j (N_mj / N_m) Σ_i p_mj^i log2 p_mj^i

Find the variable and split that minimize impurity (among all variables, and among all split positions for numeric variables).
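A sketch of the search over split positions for one numeric variable, minimizing the entropy-based impurity I'_m after the split (the data layout and names are my own; candidate thresholds are midpoints between consecutive sorted values):

```python
import math

def split_impurity(branches):
    """I'_m = sum_j (N_mj / N_m) * (- sum_i p_mj^i log2 p_mj^i).
    Each branch is a list of (x, label) pairs."""
    n_m = sum(len(b) for b in branches)
    total = 0.0
    for b in branches:
        if not b:
            continue
        counts = {}
        for _, label in b:
            counts[label] = counts.get(label, 0) + 1
        total += (len(b) / n_m) * sum(
            -(c / len(b)) * math.log2(c / len(b)) for c in counts.values())
    return total

def best_numeric_split(data, i):
    """Try midpoints between consecutive sorted values of attribute x_i
    and return (impurity, threshold) of the purest binary split."""
    values = sorted({x[i] for x, _ in data})
    best = (float("inf"), None)
    for lo, hi in zip(values, values[1:]):
        w_m = (lo + hi) / 2
        left = [d for d in data if d[0][i] <= w_m]
        right = [d for d in data if d[0][i] > w_m]
        best = min(best, (split_impurity([left, right]), w_m))
    return best

data = [((1.0,), "A"), ((2.0,), "A"), ((8.0,), "B"), ((9.0,), "B")]
print(best_numeric_split(data, 0))  # (0.0, 5.0)
```

A real learner repeats this over every variable and recurses on the two branches.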


Regression Trees

Error at node m (with b_m(x) = 1 if x reaches node m, 0 otherwise):

E_m = (1/N_m) Σ_t (r^t − g_m)^2 b_m(x^t),  where g_m = Σ_t b_m(x^t) r^t / Σ_t b_m(x^t)

After splitting into branches j, each with its own mean g_mj:

E'_m = (1/N_m) Σ_j Σ_t (r^t − g_mj)^2 b_mj(x^t)

(the error should decrease)
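The error before and after a split can be sketched as follows (names are mine; each branch predicts its own mean g_mj):

```python
def node_error(targets):
    """E_m: mean squared error at node m when predicting the node mean g_m."""
    g_m = sum(targets) / len(targets)
    return sum((r - g_m) ** 2 for r in targets) / len(targets)

def split_error(branches):
    """E'_m: squared error after the split, divided by the same N_m,
    with each branch j predicting its own mean g_mj."""
    n_m = sum(len(b) for b in branches)
    return sum(len(b) * node_error(b) for b in branches if b) / n_m

targets = [1.0, 1.2, 5.0, 5.2]
print(node_error(targets))                    # about 4.01
print(split_error([[1.0, 1.2], [5.0, 5.2]]))  # about 0.01: the error decreases
```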


Model Selection in Trees


Pruning Trees

Remove subtrees for better generalization (decrease variance)

Prepruning: early stopping

Postpruning: grow the whole tree, then prune subtrees that overfit on the pruning set

Prepruning is faster; postpruning is more accurate (but requires a separate pruning set)


Rule Extraction from Trees

C4.5Rules (Quinlan, 1993)


Learning Rules

Rule induction is similar to tree induction, but tree induction is breadth-first, whereas rule induction is depth-first: one rule at a time

A rule set contains rules; rules are conjunctions of terms

A rule covers an example if all terms of the rule evaluate to true for the example (that is, the example satisfies all the conditions of the rule)

Sequential covering:

Generate rules one at a time until all positive examples are covered
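The sequential covering loop can be sketched as follows. Everything here is illustrative: rules are conjunctions of (attribute_index, value) terms, and `learn_one_rule` is a stand-in for the per-rule growing procedure:

```python
def covers(rule, x):
    """A rule covers x if all of its terms evaluate to true for x."""
    return all(x[i] == v for i, v in rule)

def sequential_covering(examples, learn_one_rule):
    """Generate rules one at a time until all positive examples are covered.
    `examples` is a list of (x, is_positive) pairs."""
    remaining = list(examples)
    rule_set = []
    while any(y for _, y in remaining):       # uncovered positives remain
        rule = learn_one_rule(remaining)
        rule_set.append(rule)
        remaining = [(x, y) for x, y in remaining if not covers(rule, x)]
    return rule_set

def memorize_one_positive(remaining):
    """Toy rule learner: build a rule matching one uncovered positive exactly."""
    x = next(x for x, y in remaining if y)
    return [(i, v) for i, v in enumerate(x)]

examples = [((0, 1), True), ((1, 1), True), ((0, 0), False)]
rules = sequential_covering(examples, memorize_one_positive)
print(len(rules))  # 2: one rule per positive example with this toy learner
```

A real learner would grow each rule term by term to cover many positives at once, as Ripper does below.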


Ripper Algorithm

There are two kinds of loops in the Ripper algorithm (Cohen, 1995):

Outer loop: add one rule at a time to the rule base

Inner loop: add one condition at a time to the current rule

Conditions are added to the rule to maximize an information gain measure.

Conditions are added to the rule until it covers no negative example.

The pseudocode of the outer loop of Ripper is given in Figure 9.7.


DL: description length of the rule base

O(N log² N)

The description length of a rule base
= (the sum of the description lengths of all the rules in the rule base)
+ (the description of the instances not covered by the rule base)


Ripper Algorithm

In Ripper, conditions are added to the rule to maximize an information gain measure:

Gain(R', R) = s · (log2 (N'_+ / N') − log2 (N_+ / N))

R: the original rule

R': the candidate rule after adding a condition

N (N'): the number of instances that are covered by R (R')

N_+ (N'_+): the number of true positives in R (R')

s: the number of true positives after adding the condition (covered by both R and R')

Conditions are added until the rule covers no negative example.

Rule value metric, used when pruning a rule:

rvm(R) = (p − n) / (p + n)

p and n: the number of true and false positives, respectively.
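These two formulas can be sketched directly (the function names and the example counts are my own illustration):

```python
import math

def ripper_gain(s, n_plus, n, n_plus_new, n_new):
    """Gain(R', R) = s * (log2(N'_+ / N') - log2(N_+ / N)).
    s: true positives covered both before (R) and after (R') the condition."""
    return s * (math.log2(n_plus_new / n_new) - math.log2(n_plus / n))

def rule_value_metric(p, n):
    """rvm(R) = (p - n) / (p + n), with p true and n false positives."""
    return (p - n) / (p + n)

# adding a condition that keeps 40 of 50 true positives while shrinking
# coverage from 100 to 45 instances makes the rule more precise
print(ripper_gain(s=40, n_plus=50, n=100, n_plus_new=40, n_new=45))  # positive
print(rule_value_metric(p=40, n=5))
```

A condition that reduces precision would score a negative gain and be rejected by the inner loop.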


Multivariate Trees

At a multivariate node m, all attributes are used in a single linear test: f_m(x) = w_m^T x + w_m0 > 0
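A minimal sketch of such a linear multivariate node (the weights are an assumed example, not from the slides):

```python
def multivariate_test(x, w, w0):
    """Linear multivariate node: branch on w . x + w0 > 0,
    i.e. on which side of a hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0 > 0

# the test x1 + x2 > 3 uses both attributes at once,
# which no single univariate (axis-aligned) split can express
print(multivariate_test((1.0, 1.0), w=(1.0, 1.0), w0=-3.0))  # False
print(multivariate_test((2.0, 2.0), w=(1.0, 1.0), w0=-3.0))  # True
```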
