Lecture Slides for

Introduction to Machine Learning, 2nd Edition

ETHEM ALPAYDIN

© The MIT Press, 2010


alpaydin@boun.edu.tr

http://www.cmpe.boun.edu.tr/~ethem/i2ml2e


Decision Trees


- A decision tree is an efficient nonparametric method for supervised learning
- It is a hierarchical model built with a divide-and-conquer strategy



Tree Uses Nodes and Leaves

[Figure: an example dataset and the corresponding decision tree, with internal decision nodes and leaves]


Divide and Conquer


- Internal decision nodes
  - Univariate: uses a single attribute, $x_i$
    - Numeric $x_i$: binary split, $x_i > w_m$
    - Discrete $x_i$: $n$-way split for $n$ possible values
  - Multivariate: uses all attributes, $\boldsymbol{x}$
- Leaves
  - Classification: class labels, or proportions
  - Regression: a numeric value; the average of the $r^t$ reaching the leaf, or a local fit
- Learning is greedy; find the best split recursively (Breiman et al., 1984; Quinlan, 1986, 1993)
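A minimal sketch of the structure just described, in illustrative Python (the `Node` class and its field names are assumptions of this sketch, not the book's code): internal nodes test one attribute against a threshold and route the instance down the tree; leaves hold the output.

```python
# A univariate binary decision tree: internal nodes test a single attribute
# x_i against a threshold w_m; leaves store a class label (or, for
# regression, a numeric value).

class Node:
    def __init__(self, attribute=None, threshold=None,
                 left=None, right=None, value=None):
        self.attribute = attribute   # index i of the attribute x_i tested
        self.threshold = threshold   # split point w_m
        self.left = left             # subtree for x_i <= w_m
        self.right = right           # subtree for x_i > w_m
        self.value = value           # class label / numeric output at a leaf

def predict(node, x):
    """Route instance x down the tree until a leaf is reached."""
    while node.value is None:                  # internal decision node
        if x[node.attribute] > node.threshold:
            node = node.right
        else:
            node = node.left
    return node.value

# Example: a two-level tree on attributes x = (x0, x1)
tree = Node(attribute=0, threshold=2.5,
            left=Node(value="C1"),
            right=Node(attribute=1, threshold=0.7,
                       left=Node(value="C2"),
                       right=Node(value="C1")))
print(predict(tree, [3.0, 0.5]))   # x0 > 2.5, then x1 <= 0.7 -> "C2"
```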


Classification Trees


- For node $m$: $N_m$ instances reach $m$, and $N_m^i$ of them belong to class $C_i$
- The estimated probability of class $C_i$ at node $m$ is

$$\hat{P}(C_i \mid \boldsymbol{x}, m) \equiv p_m^i = \frac{N_m^i}{N_m}$$

- Node $m$ is pure if $p_m^i$ is 0 or 1 for all $i$
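A small sketch of these quantities in Python (helper names are mine, not the book's): estimate $p_m^i = N_m^i / N_m$ from the labels of the instances reaching a node, and test purity.

```python
# Estimate class proportions p_m^i at a node, and check whether the node
# is pure (all instances of one class).

from collections import Counter

def class_proportions(labels):
    """Return {class: p_m^i} for the N_m instances reaching a node."""
    n_m = len(labels)
    return {c: count / n_m for c, count in Counter(labels).items()}

def is_pure(labels):
    """A node is pure if every p_m^i is 0 or 1, i.e. one class remains."""
    return len(set(labels)) == 1

labels_at_m = ["C1", "C1", "C2", "C1"]
print(class_proportions(labels_at_m))  # {'C1': 0.75, 'C2': 0.25}
print(is_pure(labels_at_m))            # False
```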



Entropy


- The measure of impurity is entropy:

$$I_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i \qquad (0 \log 0 \equiv 0)$$

- In information theory, entropy specifies the minimum number of bits needed to encode the class code of an instance.
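A direct transcription of the formula as a small Python function (the function name is mine):

```python
# Entropy in bits of a class distribution {p_m^i}, with the convention
# 0 log 0 = 0 handled by skipping zero proportions.

import math

def entropy(proportions):
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([1.0, 0.0]))   # 0.0 bits: pure node
print(entropy([0.5, 0.5]))   # 1.0 bit : maximally impure two-class node
```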




Example of Entropy


In a two-class problem, $I = -p_1 \log_2 p_1 - p_2 \log_2 p_2$ with $p_2 = 1 - p_1$.

- If $p_1 = 1$ and $p_2 = 0$, all examples are of $C_1$; we do not need to send anything, and the entropy is 0.
- If $p_1 = p_2 = 0.5$, we need to send one bit to signal which of the two cases holds, and the entropy is 1.
- In between these two extremes, we can devise codes that use less than a bit per message by giving shorter codes to the more likely class and longer codes to the less likely one.

[Figure: the entropy function for a two-class problem, maximal at $p_1 = 0.5$]
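As a quick worked computation (the value $p_1 = 0.9$ is an illustrative choice, not from the slides), a skewed node is much cheaper to encode than a balanced one:

```latex
I = -0.9\,\log_2 0.9 \;-\; 0.1\,\log_2 0.1
  \approx 0.137 + 0.332 = 0.469 \text{ bits} \;<\; 1 \text{ bit}
```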


The Properties of Measure Functions


The properties of functions measuring the impurity of a split (two-class case, written as $\phi(p, 1-p)$):

- $\phi(\tfrac{1}{2}, \tfrac{1}{2}) \ge \phi(p, 1-p)$ for any $p \in [0, 1]$: impurity is highest when the two classes are equally likely
- $\phi(0, 1) = \phi(1, 0) = 0$: there is no impurity when all instances belong to one class
- $\phi(p, 1-p)$ is increasing in $p$ on $[0, \tfrac{1}{2}]$ and decreasing on $[\tfrac{1}{2}, 1]$

Examples


Examples of two-class measure functions:

- Entropy: $\phi(p, 1-p) = -p \log_2 p - (1-p)\log_2(1-p)$
- Gini index: $\phi(p, 1-p) = 2p(1-p)$
- Misclassification error: $\phi(p, 1-p) = 1 - \max(p, 1-p)$
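The three measures side by side as a small Python sketch (function names are mine; the definitions are those above):

```python
# Two-class impurity measures phi(p, 1-p).

import math

def entropy(p):
    """-p log2 p - (1-p) log2 (1-p), with 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def gini(p):
    """2 p (1-p)."""
    return 2 * p * (1 - p)

def misclassification_error(p):
    """1 - max(p, 1-p)."""
    return 1 - max(p, 1 - p)

for p in (0.1, 0.25, 0.5):
    print(p, entropy(p), gini(p), misclassification_error(p))
# All three are maximal at p = 0.5 and vanish at p = 0 or p = 1.
```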





Best Split


- If node $m$ is pure, generate a leaf and stop; otherwise split and continue recursively.
- Impurity after split: $N_{mj}$ of the $N_m$ instances take branch $j$, and $N_{mj}^i$ of them belong to $C_i$:

$$\hat{P}(C_i \mid \boldsymbol{x}, m, j) \equiv p_{mj}^i = \frac{N_{mj}^i}{N_{mj}}$$

$$I'_m = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i$$

- Find the variable and split that minimize impurity (among all variables, and among all split positions for numeric variables); a sketch follows below.
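A greedy exhaustive search over attributes and candidate thresholds, as my own illustrative implementation (midpoints between consecutive values serve as split positions; this is not the book's code):

```python
# For each attribute and each candidate threshold, compute the post-split
# impurity I'_m = sum_j (N_mj / N_m) * entropy(branch j); keep the minimum.

import math

def entropy_of_labels(labels):
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def best_split(X, y):
    """X: list of feature vectors, y: labels. Returns (attr, threshold, I')."""
    n = len(y)
    best = (None, None, float("inf"))
    for i in range(len(X[0])):                   # each attribute x_i
        values = sorted(set(x[i] for x in X))
        for lo, hi in zip(values, values[1:]):   # midpoints as thresholds
            w = (lo + hi) / 2
            left = [y[t] for t in range(n) if X[t][i] <= w]
            right = [y[t] for t in range(n) if X[t][i] > w]
            imp = (len(left) / n) * entropy_of_labels(left) \
                + (len(right) / n) * entropy_of_labels(right)
            if imp < best[2]:
                best = (i, w, imp)
    return best

X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]]
y = ["C1", "C1", "C2", "C2"]
print(best_split(X, y))   # (0, 2.5, 0.0): a pure split on attribute 0
```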


Regression Trees


Error at node $m$ (with $b_m(\boldsymbol{x}) = 1$ if $\boldsymbol{x}$ reaches node $m$, and 0 otherwise):

$$g_m = \frac{\sum_t b_m(\boldsymbol{x}^t)\, r^t}{\sum_t b_m(\boldsymbol{x}^t)} \qquad E_m = \frac{1}{N_m} \sum_t \left(r^t - g_m\right)^2 b_m(\boldsymbol{x}^t)$$

After splitting (the error should decrease):

$$E'_m = \frac{1}{N_m} \sum_j \sum_t \left(r^t - g_{mj}\right)^2 b_{mj}(\boldsymbol{x}^t)$$
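A small sketch of these two errors in Python (illustrative helpers of mine, not the book's code): the node prediction $g_m$ is the mean of the $r^t$ reaching the node, and the split error weights each branch by its share of instances.

```python
# Node error E_m and post-split error E'_m for a regression tree.

def node_error(r):
    """Mean squared error of the responses r^t reaching a node."""
    g_m = sum(r) / len(r)
    return sum((rt - g_m) ** 2 for rt in r) / len(r)

def split_error(branches):
    """E'_m: branch errors pooled over the N_m instances of the node."""
    n_m = sum(len(r) for r in branches)
    return sum(sum((rt - sum(r) / len(r)) ** 2 for rt in r) / n_m
               for r in branches)

r_at_m = [1.0, 2.0, 9.0, 10.0]
print(node_error(r_at_m))                      # 16.25
print(split_error([[1.0, 2.0], [9.0, 10.0]]))  # 0.25: the split helps
```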


Model Selection in Trees


Pruning Trees


- Remove subtrees for better generalization (decreases variance)
- Prepruning: early stopping
- Postpruning: grow the whole tree, then prune the subtrees that overfit on the pruning set
- Prepruning is faster; postpruning is more accurate (but requires a separate pruning set); a sketch follows below
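A compact, self-contained sketch of postpruning in the reduced-error style (my own illustration, not the book's exact procedure). Trees are nested dicts: internal nodes carry `attr`, `thr`, the two subtrees, and `maj`, the majority class of the training instances that reached the node; leaves are `{"label": c}`.

```python
# Bottom-up, a subtree is collapsed to a leaf predicting its majority class
# whenever that does not hurt accuracy on the pruning-set instances that
# reach it.

def predict(node, x):
    while "label" not in node:
        node = node["gt"] if x[node["attr"]] > node["thr"] else node["le"]
    return node["label"]

def accuracy(node, X, y):
    return sum(predict(node, x) == t for x, t in zip(X, y))

def postprune(node, X, y):
    if "label" in node or not X:
        return node
    le = [k for k, x in enumerate(X) if x[node["attr"]] <= node["thr"]]
    gt = [k for k in range(len(X)) if k not in le]
    node["le"] = postprune(node["le"], [X[k] for k in le], [y[k] for k in le])
    node["gt"] = postprune(node["gt"], [X[k] for k in gt], [y[k] for k in gt])
    leaf = {"label": node["maj"]}
    return leaf if accuracy(leaf, X, y) >= accuracy(node, X, y) else node

tree = {"attr": 0, "thr": 2.5, "maj": "C1",
        "le": {"label": "C1"},
        "gt": {"attr": 1, "thr": 0.7, "maj": "C1",
               "le": {"label": "C2"}, "gt": {"label": "C1"}}}
pruned = postprune(tree, [[3.0, 0.5], [1.0, 0.2]], ["C1", "C1"])
print(pruned)   # collapses to {'label': 'C1'} on this pruning set
```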


Rule Extraction from Trees

C4.5Rules (Quinlan, 1993): each path from the root to a leaf can be written down as one conjunctive IF-THEN rule.
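A brief sketch of that path-to-rule conversion (illustrative code of mine, reusing the nested-dict tree form from the pruning sketch above):

```python
# Every root-to-leaf path of a univariate tree becomes one IF-THEN rule
# whose terms are the node tests along the path.

def extract_rules(node, terms=()):
    if "label" in node:
        cond = " AND ".join(terms) if terms else "TRUE"
        return [f"IF {cond} THEN {node['label']}"]
    le = extract_rules(node["le"], terms + (f"x{node['attr']} <= {node['thr']}",))
    gt = extract_rules(node["gt"], terms + (f"x{node['attr']} > {node['thr']}",))
    return le + gt

tree = {"attr": 0, "thr": 2.5,
        "le": {"label": "C1"},
        "gt": {"attr": 1, "thr": 0.7,
               "le": {"label": "C2"}, "gt": {"label": "C1"}}}
for rule in extract_rules(tree):
    print(rule)
# IF x0 <= 2.5 THEN C1
# IF x0 > 2.5 AND x1 <= 0.7 THEN C2
# IF x0 > 2.5 AND x1 > 0.7 THEN C1
```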


Learning Rules


- Rule induction is similar to tree induction, but:
  - tree induction is breadth-first,
  - rule induction is depth-first: it learns one rule at a time.
- A rule set contains rules; a rule is a conjunction of terms.
- A rule covers an example if all terms of the rule evaluate to true for that example.
- Sequential covering: generate rules one at a time until all positive examples are covered (a sketch follows below).
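A minimal sketch of sequential covering (my own toy illustration, not Ripper itself): learn one conjunctive rule at a time, remove the positive examples it covers, and repeat. A rule is a list of `(attribute, threshold)` terms of the form $x_i > w$, and it covers $x$ if every term holds; the greedy term-scoring below is a toy criterion standing in for information gain, with no safeguards against pathological data.

```python
def covers(rule, x):
    return all(x[i] > w for i, w in rule)

def learn_one_rule(examples):
    """Greedily add terms until the rule covers no negative example."""
    rule = []
    while any(covers(rule, x) for x, label in examples if label == 0):
        # candidate terms: thresholds just below each positive value
        best = max(
            ((i, x[i] - 1e-9) for x, label in examples if label == 1
             for i in range(len(x))),
            key=lambda t: sum(covers(rule + [t], x) and label == 1
                              for x, label in examples)
                          - sum(covers(rule + [t], x) and label == 0
                                for x, label in examples))
        rule.append(best)
    return rule

def sequential_covering(examples):
    rules, remaining = [], list(examples)
    while any(label == 1 for _, label in remaining):   # outer loop
        rule = learn_one_rule(remaining)               # inner loop
        rules.append(rule)
        remaining = [(x, label) for x, label in remaining
                     if not (covers(rule, x) and label == 1)]
    return rules

data = [([5.0, 1.0], 1), ([4.0, 3.0], 1), ([1.0, 1.0], 0), ([2.0, 2.0], 0)]
print(sequential_covering(data))   # one rule: x0 > ~4 covers both positives
```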


Ripper Algorithm


- There are two kinds of loops in the Ripper algorithm (Cohen, 1995):
  - Outer loop: add one rule at a time to the rule base (compare the sequential-covering sketch above).
  - Inner loop: add one condition at a time to the current rule.
- Conditions are added to the rule to maximize an information gain measure.
- Conditions are added to the rule until it covers no negative example.
- The pseudocode of the outer loop of Ripper is given in Figure 9.7 of the book.



DL: description length of the rule base. Ripper training scales as $O(N \log^2 N)$ in the number of training instances.

The description length of a rule base
= (the sum of the description lengths of all the rules in the rule base)
+ (the description of the instances not covered by the rule base)
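A toy sketch of that bookkeeping (the per-term and per-instance encoding costs are invented placeholders of mine, not Cohen's actual MDL scheme):

```python
# DL(rule base) = summed cost of all rules + cost of the uncovered exceptions.

def description_length(rules, uncovered,
                       bits_per_term=2.0, bits_per_instance=4.0):
    rule_bits = sum(bits_per_term * len(rule) for rule in rules)
    exception_bits = bits_per_instance * len(uncovered)
    return rule_bits + exception_bits

rules = [[("x0", 2.5)], [("x1", 0.7), ("x2", 1.0)]]  # 1-term and 2-term rules
print(description_length(rules, uncovered=[([0.1, 0.2, 0.3], 1)]))  # 10.0
```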



Ripper Algorithm


- In Ripper, conditions are added to the rule to maximize an information gain measure:

$$\mathrm{Gain}(R', R) = s \cdot \left( \log_2 \frac{N'_+}{N'} - \log_2 \frac{N_+}{N} \right)$$

  where
  - $R$: the original rule
  - $R'$: the candidate rule after adding a condition
  - $N$ ($N'$): the number of instances covered by $R$ ($R'$)
  - $N_+$ ($N'_+$): the number of true positives of $R$ ($R'$)
  - $s$: the number of true positives of both $R$ and $R'$ (after adding the condition)

- Conditions are added until the rule covers no negative example.
- The grown rule is then pruned back using the rule value metric

$$\mathrm{rvm}(R) = \frac{p - n}{p + n}$$

  where $p$ and $n$ are the number of true and false positives, respectively.
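The two criteria as illustrative Python helpers (the argument names are mine):

```python
# Ripper's growing criterion (information gain) and pruning criterion (rvm).

import math

def gain(s, n_cov, n_pos, n_cov_new, n_pos_new):
    """Gain(R', R) = s * (log2(N'_+/N') - log2(N_+/N)).

    s: true positives covered by both R and R'
    n_cov, n_pos: instances / true positives covered by R
    n_cov_new, n_pos_new: the same for the candidate rule R'
    """
    return s * (math.log2(n_pos_new / n_cov_new) - math.log2(n_pos / n_cov))

def rvm(p, n):
    """Rule value metric: (p - n) / (p + n)."""
    return (p - n) / (p + n)

# Adding a condition that narrows coverage from 100 instances (60 positive)
# to 50 instances (45 positive), keeping s = 45 shared true positives:
print(gain(45, 100, 60, 50, 45))   # ~26.3 bits: the condition helps
print(rvm(45, 5))                  # 0.8
```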


Multivariate Trees

At a multivariate node, all attributes are used; a linear multivariate split takes the form

$$f_m(\boldsymbol{x}) : \boldsymbol{w}_m^T \boldsymbol{x} + w_{m0} > 0$$
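A one-line sketch of such a node (the weights here are illustrative): instead of testing a single attribute, take the branch according to the sign of a linear combination of all attributes.

```python
# Multivariate (linear) decision node: branch on the sign of w . x + w0.

def multivariate_test(w, w0, x):
    """Return True for the 'w . x + w0 > 0' branch."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0 > 0

print(multivariate_test([2.0, -1.0], -0.5, [1.0, 0.5]))  # 2 - 0.5 - 0.5 > 0 -> True
```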
