INTRODUCTION TO Machine Learning

ETHEM ALPAYDIN

© The MIT Press, 2004

alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

Lecture Slides for CHAPTER 3: Bayesian Decision Theory


3

Probability and Inference


Result of tossing a coin is in {Heads, Tails}

Random variable $X \in \{1, 0\}$

Bernoulli: $P\{X = 1\} = p_o^X (1 - p_o)^{1 - X}$

Sample: $X = \{x^t\}_{t=1}^{N}$

Estimation: $p_o = \#\{\text{Heads}\} / \#\{\text{Tosses}\} = \sum_t x^t / N$

Prediction of next toss: Heads if $p_o > 1/2$, Tails otherwise
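A minimal Python sketch of the estimate and decision above; the list of tosses is made up for illustration:

```python
# Estimate p_o = #{Heads}/#{Tosses} from a sample of 0/1 tosses (1 = Heads)
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical tosses x^t

p_hat = sum(sample) / len(sample)           # p_o estimate = sum_t x^t / N

# Predict the next toss: Heads if p_o > 1/2, Tails otherwise
prediction = "Heads" if p_hat > 0.5 else "Tails"
print(p_hat, prediction)                    # 0.7 Heads
```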


4

Classification


Credit scoring: Inputs are income and savings.



Output is low-risk vs. high-risk

Input: $x = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$

Prediction:

choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > 0.5$, otherwise choose $C = 0$

or equivalently:

choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2)$, otherwise choose $C = 0$

5

Bayes’ Rule









$P(C \mid x) = \dfrac{P(C)\, p(x \mid C)}{p(x)}$

posterior = prior × likelihood / evidence

$P(C = 0) + P(C = 1) = 1$

$p(x) = p(x \mid C = 1)\, P(C = 1) + p(x \mid C = 0)\, P(C = 0)$

$P(C = 0 \mid x) + P(C = 1 \mid x) = 1$
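As a quick sketch of the rule, the posterior can be computed by normalizing prior × likelihood over both classes; the numbers below are hypothetical:

```python
# Bayes' rule for two classes: P(C|x) = P(C) p(x|C) / p(x)
prior = {0: 0.6, 1: 0.4}                 # hypothetical P(C=0), P(C=1)
likelihood = {0: 0.1, 1: 0.5}            # hypothetical p(x|C=0), p(x|C=1) at some x

evidence = sum(prior[c] * likelihood[c] for c in (0, 1))          # p(x)
posterior = {c: prior[c] * likelihood[c] / evidence for c in (0, 1)}

print(posterior)                          # posteriors
print(sum(posterior.values()))            # 1.0: they sum to one
```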

6

Bayes’ Rule: K>2 Classes





















$P(C_i \mid x) = \dfrac{p(x \mid C_i)\, P(C_i)}{p(x)} = \dfrac{p(x \mid C_i)\, P(C_i)}{\sum_{k=1}^{K} p(x \mid C_k)\, P(C_k)}$

$P(C_i) \ge 0$ and $\sum_{i=1}^{K} P(C_i) = 1$

choose $C_i$ if $P(C_i \mid x) = \max_k P(C_k \mid x)$
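The same normalization extends to K classes, with the decision being the argmax posterior; again a sketch with made-up priors and likelihoods:

```python
# K-class Bayes' rule: P(C_i|x) = p(x|C_i) P(C_i) / sum_k p(x|C_k) P(C_k)
priors = [0.5, 0.3, 0.2]                  # hypothetical P(C_i), sum to 1
likelihoods = [0.05, 0.20, 0.10]          # hypothetical p(x|C_i) at some x

evidence = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

# choose C_i with the maximum posterior
best = max(range(len(posteriors)), key=lambda i: posteriors[i])
print(posteriors, "choose C_%d" % (best + 1))
```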






7

Losses and Risks


Actions: $\alpha_i$

Loss of $\alpha_i$ when the state is $C_k$: $\lambda_{ik}$

Expected risk (Duda and Hart, 1973):

$R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid x)$

choose $\alpha_i$ if $R(\alpha_i \mid x) = \min_k R(\alpha_k \mid x)$
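A sketch of the expected-risk rule with a hypothetical loss matrix and posterior (neither is from the slides): for each action, accumulate $\lambda_{ik} P(C_k \mid x)$ and pick the action with the smallest risk.

```python
# Expected risk: R(a_i|x) = sum_k lambda_ik P(C_k|x); choose the action minimizing it
loss = [[0.0, 10.0],      # hypothetical lambda_ik: rows = actions, columns = true classes
        [1.0,  0.0]]
posterior = [0.3, 0.7]    # hypothetical P(C_1|x), P(C_2|x)

risks = [sum(l_ik * p_k for l_ik, p_k in zip(row, posterior)) for row in loss]
best_action = min(range(len(risks)), key=lambda i: risks[i])
print(risks, "choose action", best_action + 1)   # [7.0, 0.3] -> action 2
```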










8

Losses and Risks: 0/1 Loss








$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$

$R(\alpha_i \mid x) = \sum_{k \ne i} P(C_k \mid x) = 1 - P(C_i \mid x)$

For minimum risk, choose the most probable class.











9

Losses and Risks: Reject

$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{otherwise} \end{cases}, \quad 0 < \lambda < 1$

$R(\alpha_{K+1} \mid x) = \sum_{k=1}^{K} \lambda\, P(C_k \mid x) = \lambda$

$R(\alpha_i \mid x) = \sum_{k \ne i} P(C_k \mid x) = 1 - P(C_i \mid x)$

choose $C_i$ if $P(C_i \mid x) > P(C_k \mid x)$ for all $k \ne i$ and $P(C_i \mid x) > 1 - \lambda$

reject otherwise
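A sketch of the reject rule: with 0/1/λ losses the decision reduces to comparing the largest posterior with 1 − λ. The posteriors and λ below are illustrative:

```python
# Reject option: choose argmax_i P(C_i|x) if that posterior exceeds 1 - lambda, else reject
def decide(posteriors, lam):
    """posteriors: list of P(C_i|x); lam: rejection loss, 0 < lam < 1."""
    i = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return f"C_{i + 1}" if posteriors[i] > 1 - lam else "reject"

print(decide([0.55, 0.30, 0.15], lam=0.3))   # 0.55 < 0.7 -> reject
print(decide([0.80, 0.15, 0.05], lam=0.3))   # 0.80 > 0.7 -> C_1
```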

10

Discriminant Functions



Discriminant functions $g_i(x)$, $i = 1, \ldots, K$

choose $C_i$ if $g_i(x) = \max_k g_k(x)$

$g_i(x)$ can be $-R(\alpha_i \mid x)$, $P(C_i \mid x)$, or $p(x \mid C_i)\, P(C_i)$

$K$ decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$ with $\mathcal{R}_i = \{\, x \mid g_i(x) = \max_k g_k(x) \,\}$
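Since any monotone transform of the posterior gives the same decision regions, a common choice is the log form $g_i(x) = \log p(x \mid C_i) + \log P(C_i)$; a sketch with hypothetical values:

```python
import math

# Discriminant functions: g_i(x) = log p(x|C_i) + log P(C_i); choose the largest g_i(x)
priors = [0.5, 0.3, 0.2]                 # hypothetical P(C_i)
likelihoods = [0.05, 0.20, 0.10]         # hypothetical p(x|C_i)

g = [math.log(l) + math.log(p) for l, p in zip(likelihoods, priors)]
choice = max(range(len(g)), key=lambda i: g[i])
print("choose C_%d" % (choice + 1))      # same class as with p(x|C_i) P(C_i) or P(C_i|x)
```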


11

K=2 Classes


Dichotomizer ($K = 2$) vs. polychotomizer ($K > 2$)

$g(x) = g_1(x) - g_2(x)$

choose $C_1$ if $g(x) > 0$, $C_2$ otherwise

Log odds: $\log \dfrac{P(C_1 \mid x)}{P(C_2 \mid x)}$
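A tiny sketch of the log-odds dichotomizer with a hypothetical posterior:

```python
import math

# Dichotomizer via log odds: g(x) = log [P(C1|x) / P(C2|x)]; choose C1 if g(x) > 0
p_c1_given_x = 0.8                       # hypothetical P(C1|x)
p_c2_given_x = 1.0 - p_c1_given_x        # P(C2|x)

g = math.log(p_c1_given_x / p_c2_given_x)
print("choose C1" if g > 0 else "choose C2")   # g ≈ 1.386 > 0 -> C1
```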

12

Utility Theory


Probability of state $S_k$ given evidence $x$: $P(S_k \mid x)$

Utility of $\alpha_i$ when the state is $S_k$: $U_{ik}$

Expected utility:

$EU(\alpha_i \mid x) = \sum_k U_{ik}\, P(S_k \mid x)$

Choose $\alpha_i$ if $EU(\alpha_i \mid x) = \max_j EU(\alpha_j \mid x)$
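Expected utility maximization mirrors expected risk minimization, with utilities in place of negated losses; a sketch with a hypothetical utility matrix and state posteriors:

```python
# Expected utility: EU(a_i|x) = sum_k U_ik P(S_k|x); choose the action maximizing it
utility = [[100.0, -50.0],   # hypothetical U_ik: rows = actions, columns = states
           [  0.0,   0.0]]
p_state = [0.4, 0.6]          # hypothetical P(S_k|x)

eu = [sum(u * p for u, p in zip(row, p_state)) for row in utility]
best = max(range(len(eu)), key=lambda i: eu[i])
print(eu, "choose action", best + 1)    # [10.0, 0.0] -> action 1
```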







13

Value of Information


Expected utility using $x$ only:

$EU(x) = \max_i \sum_k U_{ik}\, P(S_k \mid x)$

Expected utility using $x$ and a new feature $z$:

$EU(x, z) = \max_i \sum_k U_{ik}\, P(S_k \mid x, z)$

$z$ is useful if $EU(x, z) > EU(x)$

14

Bayesian Networks


Aka graphical models, probabilistic networks


Nodes

are hypotheses (random vars) and the prob
corresponds to our belief in the truth of the
hypothesis


Arcs

are direct influences between
hypotheses


The
structure

is represented as a directed acyclic
graph (DAG)


The
parameters

are the conditional probs in the
arcs


(Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996)


15

Causes and Bayes’ Rule

Diagnostic inference: Knowing that the grass is wet, what is the probability that rain is the cause?

[Figure: rain → wet grass network; the rain → wet grass direction is labeled causal, the reverse direction diagnostic]

$P(R \mid W) = \dfrac{P(W \mid R)\, P(R)}{P(W)} = \dfrac{P(W \mid R)\, P(R)}{P(W \mid R)\, P(R) + P(W \mid \sim R)\, P(\sim R)} = \dfrac{0.9 \times 0.4}{0.9 \times 0.4 + 0.2 \times 0.6} = 0.75$










16

Causal vs Diagnostic Inference

Causal inference: If the sprinkler is on, what is the probability that the grass is wet?

$P(W \mid S) = P(W \mid R, S)\, P(R \mid S) + P(W \mid \sim R, S)\, P(\sim R \mid S)$
$= P(W \mid R, S)\, P(R) + P(W \mid \sim R, S)\, P(\sim R)$
$= 0.95 \times 0.4 + 0.9 \times 0.6 = 0.92$

Diagnostic inference: If the grass is wet, what is the probability that the sprinkler is on?

$P(S \mid W) = 0.35 > 0.2 = P(S)$

$P(S \mid R, W) = 0.21$

Explaining away: Knowing that it has rained decreases the probability that the sprinkler is on.
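A sketch reproducing the two numbers above from the conditional probabilities given on this slide and the previous one ($P(W \mid R,S) = 0.95$, $P(W \mid \sim R,S) = 0.9$, $P(W \mid R) = 0.9$, $P(W \mid \sim R) = 0.2$, $P(R) = 0.4$):

```python
# Causal inference: P(W|S) = P(W|R,S) P(R) + P(W|~R,S) P(~R)
p_r = 0.4
p_w_given_s = 0.95 * p_r + 0.90 * (1 - p_r)
print(round(p_w_given_s, 2))                       # 0.92

# Diagnostic inference from the previous slide: P(R|W) by Bayes' rule
p_w_given_r, p_w_given_not_r = 0.9, 0.2
p_w = p_w_given_r * p_r + p_w_given_not_r * (1 - p_r)
print(round(p_w_given_r * p_r / p_w, 2))           # 0.75
```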



17

Bayesian Networks: Causes

Causal inference:

$P(W \mid C) = P(W \mid R, S)\, P(R, S \mid C) + P(W \mid \sim R, S)\, P(\sim R, S \mid C)$
$+ P(W \mid R, \sim S)\, P(R, \sim S \mid C) + P(W \mid \sim R, \sim S)\, P(\sim R, \sim S \mid C)$

and use the fact that $P(R, S \mid C) = P(R \mid C)\, P(S \mid C)$

Diagnostic: $P(C \mid W) = ?$


18

Bayesian Nets: Local structure













$P(C, S, R, W, F) = P(C)\, P(S \mid C)\, P(R \mid C)\, P(W \mid S, R)\, P(F \mid R)$

$P(F \mid C) = ?$

$P(X_1, \ldots, X_d) = \prod_{i=1}^{d} P(X_i \mid \mathrm{parents}(X_i))$


19

Bayesian Networks: Inference

$P(C, S, R, W, F) = P(C)\, P(S \mid C)\, P(R \mid C)\, P(W \mid R, S)\, P(F \mid R)$

$P(C, F) = \sum_S \sum_R \sum_W P(C, S, R, W, F)$

$P(F \mid C) = P(C, F) \,/\, P(C)$

Not efficient!

Belief propagation (Pearl, 1988)

Junction trees (Lauritzen and Spiegelhalter, 1988)
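A brute-force sketch of the enumeration above. The CPT values are hypothetical (only $P(W \mid R, S)$ matches the earlier slides); the point is that $P(F \mid C) = P(C, F) / P(C)$ requires summing the factored joint over S, R, and W, which scales exponentially with the network size:

```python
from itertools import product

# Hypothetical CPTs for the cloudy / sprinkler / rain / wet-grass / F network
p_c = 0.5
p_s_given_c = {True: 0.1, False: 0.5}            # P(S=1 | C)
p_r_given_c = {True: 0.8, False: 0.2}            # P(R=1 | C)
p_w_given_sr = {(True, True): 0.95, (False, True): 0.90,
                (True, False): 0.90, (False, False): 0.10}   # P(W=1 | S, R)
p_f_given_r = {True: 0.7, False: 0.1}            # P(F=1 | R)

def joint(c, s, r, w, f):
    """P(C,S,R,W,F) = P(C) P(S|C) P(R|C) P(W|S,R) P(F|R)."""
    pc = p_c if c else 1 - p_c
    ps = p_s_given_c[c] if s else 1 - p_s_given_c[c]
    pr = p_r_given_c[c] if r else 1 - p_r_given_c[c]
    pw = p_w_given_sr[(s, r)] if w else 1 - p_w_given_sr[(s, r)]
    pf = p_f_given_r[r] if f else 1 - p_f_given_r[r]
    return pc * ps * pr * pw * pf

# P(C=1, F=1) = sum over S, R, W of the joint; then P(F=1 | C=1) = P(C,F) / P(C)
p_cf = sum(joint(True, s, r, w, True) for s, r, w in product([True, False], repeat=3))
print(round(p_cf / p_c, 3))          # P(F=1 | C=1)
```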


20

Bayesian Networks:
Classification

Diagnostic inference: $P(C \mid x)$

Bayes' rule inverts the arc:

$P(C \mid x) = \dfrac{P(C)\, p(x \mid C)}{p(x)}$


21

Naive Bayes’ Classifier

Given $C$, the $x_j$ are independent:

$p(x \mid C) = p(x_1 \mid C)\, p(x_2 \mid C) \cdots p(x_d \mid C)$
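A sketch of a naive Bayes posterior computed from per-feature likelihoods; the priors and likelihood functions are made up for illustration:

```python
import math

def naive_bayes_posterior(x, priors, feature_likelihoods):
    """feature_likelihoods[c][j] is a function returning p(x_j | C=c) (hypothetical)."""
    scores = {c: math.log(priors[c]) +
                 sum(math.log(f(xj)) for f, xj in zip(feature_likelihoods[c], x))
              for c in priors}
    # normalize in log space to get posteriors
    m = max(scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}

# Hypothetical two-feature, two-class example with simple step "densities"
priors = {0: 0.5, 1: 0.5}
likes = {0: [lambda v: 0.9 if v < 1 else 0.1, lambda v: 0.8 if v < 1 else 0.2],
         1: [lambda v: 0.2 if v < 1 else 0.8, lambda v: 0.3 if v < 1 else 0.7]}
print(naive_bayes_posterior([1.5, 0.2], priors, likes))   # {0: 0.25, 1: 0.75}
```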


22

Influence Diagrams

[Figure: influence diagram with a chance node, a decision node, and a utility node]


23

Association Rules


Association rule: $X \rightarrow Y$

Support ($X \rightarrow Y$): $P(X, Y) = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$

Confidence ($X \rightarrow Y$): $P(Y \mid X) = \dfrac{P(X, Y)}{P(X)} = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$

Apriori algorithm (Agrawal et al., 1996)
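A sketch computing support and confidence from a hypothetical list of customer baskets:

```python
# Support(X -> Y) = P(X, Y); Confidence(X -> Y) = P(Y | X) = P(X, Y) / P(X)
baskets = [{"milk", "bread"}, {"milk", "bread", "butter"},
           {"bread"}, {"milk"}, {"milk", "bread"}]         # hypothetical transactions

def support_confidence(x, y, baskets):
    n = len(baskets)
    n_x = sum(1 for b in baskets if x in b)                # customers who bought X
    n_xy = sum(1 for b in baskets if x in b and y in b)    # bought X and Y
    return n_xy / n, n_xy / n_x

sup, conf = support_confidence("milk", "bread", baskets)
print(sup, conf)    # 0.6, 0.75
```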