# Slides

Artificial Intelligence and Robotics

24 Oct 2013

QUIZ!!

T/F: You can always (theoretically) do BN inference by enumeration.
TRUE

T/F: In VE, always first marginalize, then join.
FALSE

T/F: VE is a lot faster than enumeration, but not always exact.
FALSE

T/F: The running time of VE is independent of the order in which you pick the r.v.s.
FALSE

T/F: The more evidence, the faster VE runs.
TRUE

P(X|Y) sums to ... |Y|
P(x|Y) sums to ... ?? (no fixed value: anywhere from 0 to |Y|)
P(X|y) sums to ... 1
P(x|y) sums to ... P(x|y)
P(x,Y) sums to ... P(x)
P(X,Y) sums to ... 1
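These summation facts can be checked numerically; a small sketch with an arbitrary 2x2 conditional table (the numbers here are made up for the demo):

```python
# Toy conditional table P(X|Y) with |X| = |Y| = 2, plus a prior P(Y).
P = {('x0', 'y0'): 0.3, ('x1', 'y0'): 0.7,   # P(X|y0) sums to 1
     ('x0', 'y1'): 0.6, ('x1', 'y1'): 0.4}   # P(X|y1) sums to 1
P_Y = {'y0': 0.5, 'y1': 0.5}

# P(X|Y) summed over all x and y gives |Y| (one normalized column per y):
assert abs(sum(P.values()) - 2) < 1e-9

# P(X|y) sums to 1:
assert abs(sum(P[(x, 'y0')] for x in ('x0', 'x1')) - 1) < 1e-9

# P(x,Y) = P(x|Y) P(Y) summed over y gives the marginal P(x):
p_x0 = sum(P[('x0', y)] * P_Y[y] for y in P_Y)
print(round(p_x0, 2))  # 0.45
```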

CSE 511a: Artificial Intelligence
Spring 2013

Lecture 17: Bayes Nets IV
Approximate Inference (Sampling)

04/01/2013

Robert Pless
Via Kilian Weinberger via Dan Klein (UC Berkeley)

Announcements

Project 4 out soon.
Project 3 due at midnight.

Exact Inference: Variable Elimination

General Variable Elimination

Query: P(Q | e_1, ..., e_k)

Start with the local CPTs (but instantiated by evidence).

While there are still hidden variables (not Q or evidence):
- Pick a hidden variable H
- Join all factors mentioning H
- Eliminate (sum out) H

Join all remaining factors and normalize.
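The elimination loop above can be sketched with factors as plain dictionaries mapping assignment tuples to numbers (a minimal illustration, not the project code; `join` and `eliminate` are hypothetical helper names):

```python
from itertools import product

def join(f1, v1, f2, v2):
    """Multiply factor f1 (over variables v1) with f2 (over v2)."""
    vs = v1 + [v for v in v2 if v not in v1]
    out = {}
    for vals in product([True, False], repeat=len(vs)):
        a = dict(zip(vs, vals))
        out[vals] = f1[tuple(a[v] for v in v1)] * f2[tuple(a[v] for v in v2)]
    return out, vs

def eliminate(f, vs, h):
    """Sum variable h out of factor f."""
    i = vs.index(h)
    out = {}
    for vals, p in f.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out, vs[:i] + vs[i + 1:]

# Tiny chain A -> B: join P(B|A) with P(A), then eliminate A to get P(B).
fA = {(True,): 0.8, (False,): 0.2}
fBA = {(True, True): 0.9, (True, False): 0.3,
       (False, True): 0.1, (False, False): 0.7}
f, vs = join(fBA, ['B', 'A'], fA, ['A'])
f, vs = eliminate(f, vs, 'A')
print(round(f[(True,)], 2))  # P(B=T) = 0.9*0.8 + 0.3*0.2 = 0.78
```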

Example: Variable elimination

Query: what is the probability that a student attends class, given that they pass the exam, P(at | pa)?

[based on slides taken from UMBC CMSC 671, 2005]

Nodes: attend, study, prepared, fair, pass. prepared depends on attend and study; pass depends on attend, prepared, and fair.

P(at) = .8, P(st) = .6, P(fa) = .9

| P(pr\|at,st) | at | st |
|---|---|---|
| 0.9 | T | T |
| 0.5 | T | F |
| 0.7 | F | T |
| 0.1 | F | F |

| P(pa\|at,pre,fa) | pr | at | fa |
|---|---|---|---|
| 0.9 | T | T | T |
| 0.1 | T | T | F |
| 0.7 | T | F | T |
| 0.1 | T | F | F |
| 0.7 | F | T | T |
| 0.1 | F | T | F |
| 0.2 | F | F | T |
| 0.1 | F | F | F |
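As the first quiz question notes, this query can also be answered by brute-force enumeration over the full joint. A sketch using the CPT numbers above (variable names are shorthand from the slides):

```python
# Inference by enumeration for the attend/study/prepared/fair/pass network.
from itertools import product

P_at, P_st, P_fa = 0.8, 0.6, 0.9
# P(pr=T | at, st)
P_pr = {(True, True): 0.9, (True, False): 0.5,
        (False, True): 0.7, (False, False): 0.1}
# P(pa=T | pr, at, fa)
P_pa = {(True, True, True): 0.9, (True, True, False): 0.1,
        (True, False, True): 0.7, (True, False, False): 0.1,
        (False, True, True): 0.7, (False, True, False): 0.1,
        (False, False, True): 0.2, (False, False, False): 0.1}

def bern(p, v):  # P(V = v) for a Bernoulli with P(V=True) = p
    return p if v else 1.0 - p

def joint(at, st, pr, fa, pa):
    return (bern(P_at, at) * bern(P_st, st) * bern(P_pr[(at, st)], pr)
            * bern(P_fa, fa) * bern(P_pa[(pr, at, fa)], pa))

# P(at=T | pa=T): sum the joint over all hidden variables, then normalize.
num = sum(joint(True, st, pr, fa, True)
          for st, pr, fa in product([True, False], repeat=3))
den = sum(joint(at, st, pr, fa, True)
          for at, st, pr, fa in product([True, False], repeat=4))
print(round(num / den, 4))  # ~0.8862, matching the 0.89 at the end of the example
```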

Join study factors

Nodes: attend, study, prepared, fair, pass. P(at) = .8, P(st) = .6, P(fa) = .9.

| prep | study | attend | P(pr\|at,st) (original) | P(st) (original) | P(pr,st\|at) (joint) | P(pr\|at) (marginal) |
|---|---|---|---|---|---|---|
| T | T | T | 0.9 | 0.6 | 0.54 | 0.74 |
| T | F | T | 0.5 | 0.4 | 0.2 | |
| T | T | F | 0.7 | 0.6 | 0.42 | 0.46 |
| T | F | F | 0.1 | 0.4 | 0.04 | |
| F | T | T | 0.1 | 0.6 | 0.06 | 0.26 |
| F | F | T | 0.5 | 0.4 | 0.2 | |
| F | T | F | 0.3 | 0.6 | 0.18 | 0.54 |
| F | F | F | 0.9 | 0.4 | 0.36 | |

(Each marginal entry sums the pair of joint entries with the same prep and attend over study. The factor P(pa|at,pre,fa) is unchanged from the previous slide.)
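This join-then-sum-out step can be reproduced in a few lines (a sketch; factor keys are assignment tuples with 'T'/'F' values):

```python
# Join P(pr|at,st) with P(st), then sum out study: one VE step from the slides.
P_st = 0.6
# (at, st) -> P(pr=T | at, st)
P_pr_given = {('T', 'T'): 0.9, ('T', 'F'): 0.5, ('F', 'T'): 0.7, ('F', 'F'): 0.1}

# Join: f(pr, st, at) = P(pr|at,st) * P(st)
joint = {}
for (at, st), p in P_pr_given.items():
    p_st = P_st if st == 'T' else 1 - P_st
    for pr, p_pr in (('T', p), ('F', 1 - p)):
        joint[(pr, st, at)] = p_pr * p_st

# Marginalize: sum study out, leaving P(pr | at)
marginal = {}
for (pr, st, at), p in joint.items():
    marginal[(pr, at)] = marginal.get((pr, at), 0.0) + p

print(round(marginal[('T', 'T')], 2))  # 0.74, as in the table
```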

Marginalize out study

Nodes: attend, (prepared, study), fair, pass. P(at) = .8, P(fa) = .9.

Summing study out of the joined factor P(pr,st|at) yields P(pr|at), the marginal column of the table above. The factor P(pa|at,pre,fa) is unchanged.

Remove "study"

Nodes: attend, prepared, fair, pass. P(at) = .8, P(fa) = .9.

| P(pr\|at) | pr | at |
|---|---|---|
| 0.74 | T | T |
| 0.46 | T | F |
| 0.26 | F | T |
| 0.54 | F | F |

(The factor P(pa|at,pre,fa) is unchanged.)

Join factors "fair"

Nodes: attend, prepared, fair, pass. P(at) = .8, P(fa) = .9, plus the P(pr|at) table from the previous slide.

| pa | pre | attend | fair | P(pa\|at,pre,fa) (original) | P(fair) (original) | P(pa,fa\|at,pre) (joint) | P(pa\|at,pre) (marginal) |
|---|---|---|---|---|---|---|---|
| t | T | T | T | 0.9 | 0.9 | 0.81 | 0.82 |
| t | T | T | F | 0.1 | 0.1 | 0.01 | |
| t | T | F | T | 0.7 | 0.9 | 0.63 | 0.64 |
| t | T | F | F | 0.1 | 0.1 | 0.01 | |
| t | F | T | T | 0.7 | 0.9 | 0.63 | 0.64 |
| t | F | T | F | 0.1 | 0.1 | 0.01 | |
| t | F | F | T | 0.2 | 0.9 | 0.18 | 0.19 |
| t | F | F | F | 0.1 | 0.1 | 0.01 | |

(Each marginal entry sums the pair of joint entries with the same pa, pre, attend over fair.)

Marginalize out "fair"

Nodes: attend, prepared, (pass, fair). P(at) = .8.

Summing fair out of the joined factor P(pa,fa|at,pre) yields P(pa|at,pre), the marginal column of the table above. P(pr|at) is unchanged.

Remove "fair"

Nodes: attend, prepared, pass. P(at) = .8, plus P(pr|at).

| P(pa\|at,pre) | pa | pre | attend |
|---|---|---|---|
| 0.82 | t | T | T |
| 0.64 | t | T | F |
| 0.64 | t | F | T |
| 0.19 | t | F | F |

Join factors "prepared"

Nodes: attend, prepared, pass. P(at) = .8.

| pa | pre | attend | P(pa\|at,pr) (original) | P(pr\|at) (original) | P(pa,pr\|at) (joint) | P(pa\|at) (marginal) |
|---|---|---|---|---|---|---|
| t | T | T | 0.82 | 0.74 | 0.6068 | 0.7732 |
| t | T | F | 0.64 | 0.46 | 0.2944 | 0.397 |
| t | F | T | 0.64 | 0.26 | 0.1664 | |
| t | F | F | 0.19 | 0.54 | 0.1026 | |

(Each marginal entry sums the joint entries with the same attend over pre.)

Marginalize out "prepared"

Nodes: attend, (pass, prepared). P(at) = .8.

Summing prepared out of the joined factor P(pa,pr|at) yields P(pa|at), the marginal column of the table above.

Remove "prepared"

Nodes: attend, pass. P(at) = .8.

| P(pa\|at) | pa | attend |
|---|---|---|
| 0.7732 | t | T |
| 0.397 | t | F |

Join factors "attend"

Nodes: attend, pass.

| pa | attend | P(pa\|at) (original) | P(at) (original) | P(pa,at) (joint) | P(at\|pa) (normalized) |
|---|---|---|---|---|---|
| T | T | 0.7732 | 0.8 | 0.61856 | 0.89 |
| T | F | 0.397 | 0.2 | 0.0794 | 0.11 |

Join factors "attend", normalize

Normalizing the joint P(pa,at) over attend answers the query: P(at=T | pa=T) is approximately 0.89, and P(at=F | pa=T) is approximately 0.11.
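The final join-and-normalize step, numerically:

```python
# Join P(pa|at) with P(at), then normalize over 'at' to get P(at | pa=T).
p_pa_given_at = {'T': 0.7732, 'F': 0.397}
p_at = {'T': 0.8, 'F': 0.2}

joint = {at: p_pa_given_at[at] * p_at[at] for at in ('T', 'F')}  # P(pa=T, at)
z = sum(joint.values())
posterior = {at: joint[at] / z for at in joint}                  # P(at | pa=T)

print(round(posterior['T'], 2))  # 0.89
```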

Approximate Inference

Sampling (particle-based methods)

Sampling: the basics

Scrooge McDuck gives you an ancient coin. He wants to know P(H).

You have no homework, and nothing good is on television, so you toss it 1 million times.

You obtain 700,000x Heads and 300,000x Tails.

What is P(H)?

Sampling: the basics

P(H) = 0.7, the empirical frequency 700,000 / 1,000,000.

Why?
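A quick simulation of the experiment (here `true_p` is an assumed value for the demo; in Scrooge's setting it is unknown, which is the whole point):

```python
# Estimate P(H) by sampling: the empirical frequency converges to the true
# probability by the law of large numbers.
import random

random.seed(0)
true_p = 0.7
n = 1_000_000
heads = sum(random.random() < true_p for _ in range(n))
print(heads / n)  # close to 0.7
```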

Monte Carlo Method

Who is more likely to win: Green or Purple?

What is the probability that Green wins, P(G)?

Two ways to solve this:
1. Compute the exact probability.
2. Play 100,000 games and see how many times Green wins.

Approximate Inference

Simulation has a name: sampling.

Sampling is a hot topic in machine learning, and it's really simple.

Basic idea:
- Draw N samples from a sampling distribution S
- Compute an approximate posterior probability
- Show this converges to the true probability P

Why sample?
- Learning: get samples from a distribution you don't know
- Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

Forward Sampling

Network: Cloudy is the root; Sprinkler and Rain depend on Cloudy; WetGrass depends on Sprinkler and Rain.

| c | P(c) |
|---|---|
| +c | 0.5 |
| -c | 0.5 |

| c | s | P(s\|c) |
|---|---|---|
| +c | +s | 0.1 |
| +c | -s | 0.9 |
| -c | +s | 0.5 |
| -c | -s | 0.5 |

| c | r | P(r\|c) |
|---|---|---|
| +c | +r | 0.8 |
| +c | -r | 0.2 |
| -c | +r | 0.2 |
| -c | -r | 0.8 |

| s | r | w | P(w\|s,r) |
|---|---|---|---|
| +s | +r | +w | 0.99 |
| +s | +r | -w | 0.01 |
| +s | -r | +w | 0.90 |
| +s | -r | -w | 0.10 |
| -s | +r | +w | 0.90 |
| -s | +r | -w | 0.10 |
| -s | -r | +w | 0.01 |
| -s | -r | -w | 0.99 |

Samples:
+c, -s, +r, +w
-c, +s, -r, +w

[Excel Demo]
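Forward sampling for this network can be sketched directly from the CPTs above (a minimal illustration): sample each variable in topological order from its CPT, given its already-sampled parents.

```python
import random

random.seed(1)

def sample_once():
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)   # P(+s | c)
    r = random.random() < (0.8 if c else 0.2)   # P(+r | c)
    if s and r:
        pw = 0.99
    elif s or r:
        pw = 0.90
    else:
        pw = 0.01
    w = random.random() < pw                    # P(+w | s, r)
    return c, s, r, w

samples = [sample_once() for _ in range(100_000)]
# Estimate P(+w) by counting: the estimate converges to the true value.
p_w = sum(w for _, _, _, w in samples) / len(samples)
print(round(p_w, 2))  # close to the exact P(+w) = 0.65
```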

Forward Sampling

This process generates samples with probability

  S_PS(x_1, ..., x_n) = prod_i P(x_i | Parents(X_i)) = P(x_1, ..., x_n)

...i.e. the BN's joint probability.

Let N_PS(x_1, ..., x_n) be the number of samples generated for the event x_1, ..., x_n. Then

  lim_{N -> inf} N_PS(x_1, ..., x_n) / N = P(x_1, ..., x_n)

I.e., the sampling procedure is consistent.

Example

We'll get a bunch of samples from the BN:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w

If we want to know P(W):
- We have counts <+w: 4, -w: 1>
- Normalize to get P(W) = <+w: 0.8, -w: 0.2>
- This will get closer to the true distribution with more samples
- Can estimate anything else, too
- What about P(C| +w)? P(C| +r, +w)? P(C| -r, -w)?
- Fast: can use fewer samples if less time (what's the drawback?)

Rejection Sampling

Let's say we want P(C):
- No point keeping all samples around
- Just tally counts of C as we go

Let's say we want P(C| +s):
- Same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s
- This is called rejection sampling
- It is also consistent for conditional probabilities (i.e., correct in the limit)

Samples:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
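A sketch of rejection sampling for P(C | +s) on the sprinkler network above (only C and S matter for this query, so the other variables are omitted):

```python
import random

random.seed(2)

def sample_cs():
    c = random.random() < 0.5                  # P(+c)
    s = random.random() < (0.1 if c else 0.5)  # P(+s | c)
    return c, s

# Reject every sample with S = -s, then tally C among the survivors.
kept = [c for c, s in (sample_cs() for _ in range(200_000)) if s]
p_c_given_s = sum(kept) / len(kept)
print(round(p_c_given_s, 2))  # close to the exact 0.05 / 0.30 = 1/6
```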

Sampling Example

There are 2 cups.

The first contains 1 penny and 1 quarter

The second contains 2 quarters

Say I pick a cup uniformly at random, then pick a
coin randomly from that cup. It's a quarter (yes!).
What is the probability that the other coin in that
cup is also a quarter?
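A Monte Carlo answer to the puzzle, in the spirit of rejection sampling (the exact answer is 2/3, not 1/2, because the all-quarter cup is twice as likely to produce the observed quarter):

```python
# Condition on "first draw is a quarter" by rejecting draws of the penny,
# then count how often the other coin in the same cup is also a quarter.
import random

random.seed(3)
cups = [['penny', 'quarter'], ['quarter', 'quarter']]
hits = total = 0
for _ in range(100_000):
    cup = random.choice(cups)[:]   # pick a cup uniformly at random
    random.shuffle(cup)
    first, other = cup
    if first != 'quarter':
        continue                   # reject: evidence says we drew a quarter
    total += 1
    hits += (other == 'quarter')
print(round(hits / total, 2))  # close to 2/3
```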

Likelihood Weighting

Problem with rejection sampling:
- If evidence is unlikely, you reject a lot of samples
- You don't exploit your evidence as you sample
- Consider P(B| +a)

Idea: fix evidence variables and sample the rest.
Problem: the sample distribution is not consistent!
Solution: weight each sample by the probability of the evidence given its parents.

Sampling Burglary and Alarm from the prior mostly yields rejected samples:
-b,-a  -b,-a  -b,-a  -b,-a  +b,+a

Fixing the evidence A = +a instead yields:
-b,+a  -b,+a  -b,+a  -b,+a  +b,+a

Likelihood Weighting

Sampling distribution if z is sampled and e is fixed evidence:

  S_WS(z, e) = prod_i P(z_i | Parents(Z_i))

Now, samples have weights:

  w(z, e) = prod_j P(e_j | Parents(E_j))

Together, the weighted sampling distribution is consistent:

  S_WS(z, e) * w(z, e) = prod_i P(z_i | Parents(Z_i)) * prod_j P(e_j | Parents(E_j)) = P(z, e)

Likelihood Weighting

(Same network and CPTs as in Forward Sampling above.)

Sample:
+c, +s, +r, +w

Inference:
- Sum over weights that match the query value
- Divide by total sample weight

What is P(C| +w, +r)?
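A sketch of likelihood weighting for the query P(C | +r, +w), using the CPTs from the forward sampling slide: sample C and S from their CPTs, fix R = +r and W = +w, and weight each sample by the probability of the evidence given its parents.

```python
import random

random.seed(4)
weights = {True: 0.0, False: 0.0}  # summed weights for +c / -c
for _ in range(100_000):
    c = random.random() < 0.5                  # sample C from P(C)
    s = random.random() < (0.1 if c else 0.5)  # sample S from P(S|c)
    w = 0.8 if c else 0.2                      # evidence R=+r: weight P(+r|c)
    w *= 0.99 if s else 0.90                   # evidence W=+w: weight P(+w|s,+r)
    weights[c] += w

# Sum of weights matching the query value, divided by total weight.
p_c = weights[True] / (weights[True] + weights[False])
print(round(p_c, 2))  # close to the exact 0.3636 / 0.4581, about 0.79
```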

Likelihood Weighting Example

(Evidence: Sprinkler = 1 and Wet Grass = 1 are fixed; each sample's weight is P(+s|c) * P(+w|+s,r).)

| Cloudy | Rainy | Sprinkler | Wet Grass | Weight |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0.495 |
| 0 | 0 | 1 | 1 | 0.45 |
| 0 | 0 | 1 | 1 | 0.45 |
| 0 | 0 | 1 | 1 | 0.45 |
| 1 | 0 | 1 | 1 | 0.09 |

Likelihood Weighting

Likelihood weighting is good:
- We have taken evidence into account as we generate the sample
- E.g. here, W's value will get picked based on the evidence values of S, R
- More of our samples will reflect the state of the world suggested by the evidence

Likelihood weighting doesn't solve all our problems:
- Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence)
- We would like to consider evidence when we sample every variable

Markov Chain Monte Carlo*

Idea: instead of sampling from scratch, create samples that are each like the last one.

Procedure: resample one variable at a time, conditioned on all the rest, but keep evidence fixed. E.g., for P(b|c):

  +a, +c, +b  ->  +a, +c, -b  ->  -a, +c, -b

Properties: now samples are not independent (in fact they're nearly identical), but sample averages are still consistent estimators!

What's the point: both upstream and downstream variables condition on evidence.

Random Walks

[Explain on Blackboard]

Gibbs Sampling

1. Set all evidence E to e
2. Do forward sampling to obtain x_1, ..., x_n
3. Repeat:
   1. Pick any variable X_i uniformly at random.
   2. Resample x_i' from p(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)
   3. Set all other x_j' = x_j
   4. The new sample is x_1', ..., x_n'
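A minimal Gibbs sampler for P(C | +r, +w) on the sprinkler network from the earlier slides. The full conditionals in the comments are derived from those CPTs; with evidence fixed, only C and S are resampled.

```python
import random

random.seed(5)

def p_w(s, r):  # P(+w | s, r) from the WetGrass CPT
    if s and r:
        return 0.99
    if s or r:
        return 0.90
    return 0.01

c, s = True, True                 # arbitrary initial state; R=+r, W=+w fixed
counts = {True: 0, False: 0}
for _ in range(200_000):
    # Resample C given S=s, R=+r:  P(c | s, +r) ~ P(c) P(s|c) P(+r|c)
    score = {}
    for cv in (True, False):
        ps = 0.1 if cv else 0.5           # P(+s | cv)
        ps = ps if s else 1 - ps          # P(s | cv) for the current s
        score[cv] = 0.5 * ps * (0.8 if cv else 0.2)
    c = random.random() < score[True] / (score[True] + score[False])
    # Resample S given C=c, R=+r, W=+w:  P(s | c, +w) ~ P(s|c) P(+w|s,+r)
    score = {}
    for sv in (True, False):
        ps = 0.1 if c else 0.5
        ps = ps if sv else 1 - ps
        score[sv] = ps * p_w(sv, True)
    s = random.random() < score[True] / (score[True] + score[False])
    counts[c] += 1

print(round(counts[True] / 200_000, 2))  # close to 0.79, agreeing with exact inference
```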

Markov Blanket

Markov blanket of X:
1. All parents of X
2. All children of X
3. All parents of children of X (except X itself)

X is conditionally independent from all other variables in the BN, given all variables in the Markov blanket (besides X).

Gibbs Sampling

1. Set all evidence E to e
2. Do forward sampling to obtain x_1, ..., x_n
3. Repeat:
   1. Pick any variable X_i uniformly at random.
   2. Resample x_i' from p(X_i | MarkovBlanket(X_i))
   3. Set all other x_j' = x_j
   4. The new sample is x_1', ..., x_n'

Summary

Sampling can be your salvation: the dominant approach to inference in BNs.

Approaches:
- Forward (/Prior) Sampling
- Rejection Sampling
- Likelihood Weighted Sampling
- Gibbs Sampling

Learning in Bayes Nets

Given the network structure and given data, where a data point is an observed setting for the variables, learn the CPTs for the Bayes Net. Might also