Bayesian Networks and
Probabilistic Reasoning
Tuesday, March 21, 2006
Srinandan Dasmahapatra
Rational statistical inference
(Bayes, Laplace)
P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)
Posterior probability = Likelihood × Prior probability, normalised by a sum over the space of hypotheses H.
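The posterior computation above can be sketched in a few lines. This is a minimal illustration, not from the slides: the two-hypothesis coin example and all numbers are made-up assumptions.

```python
# Bayes' rule over a small hypothesis space: P(h|d) ∝ P(d|h) P(h).
# Hypotheses and numbers are illustrative assumptions (fair vs biased coin,
# data d = five heads in a row).
priors = {"fair": 0.9, "biased": 0.1}               # P(h)
likelihoods = {"fair": 0.5 ** 5, "biased": 0.9 ** 5}  # P(d | h)

# Normalising constant: sum over the space of hypotheses H
evidence = sum(likelihoods[h] * priors[h] for h in priors)
posterior = {h: likelihoods[h] * priors[h] / evidence for h in priors}
print(posterior)  # posteriors sum to 1; "biased" overtakes its small prior
```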
Why Bayes?
• A framework for explaining cognition.
– How people can learn so much from such limited data.
– Why process-level models work the way that they do.
– Strong quantitative models with minimal ad hoc assumptions.
• A framework for understanding how structured knowledge and statistical inference interact.
– How structured knowledge guides statistical inference, and is itself acquired through higher-order statistical learning.
– How simplicity trades off with fit to the data in evaluating structural hypotheses (Occam's razor).
– How increasingly complex structures may grow as required by new data, rather than being prespecified in advance.
© Slides from Tenenbaum, Griffiths
Properties of Bayesian networks
• Efficient representation and inference
– exploiting dependency structure makes it easier to represent and compute with probabilities
• Explaining away
– a pattern of probabilistic reasoning characteristic of Bayesian networks, especially prominent in their early use in AI
Example:
• I'm at work, neighbour John calls to say my alarm is ringing, but neighbour Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call
Example cont'd.
Compactness
• A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values
• Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• I.e., grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
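The parameter count above can be verified directly from the network structure; a small sketch, using the burglary-net edges stated on the previous slides:

```python
# Each Boolean node with k Boolean parents needs 2**k independent numbers.
# Edges follow the burglary network described in the slides.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}
total = sum(2 ** len(ps) for ps in parents.values())
print(total)  # 1 + 1 + 4 + 2 + 2 = 10, vs 2**5 - 1 = 31 for the full joint
```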
Exploiting Conditional Independence 1:
Joint Distributions in terms of Conditionals
Storage: exponential in the number of random variables.
Bayesian Network Semantics
Network structure encodes (conditional) independencies between variables, and this makes the representation compact.
Bayesian Network Semantics - 2
Semantics: the full joint distribution is defined as the product of the local conditional distributions:
P(X_1, …, X_n) = Π_{i=1}^{n} P(X_i | Parents(X_i))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
• Bayes nets provide a compact way of representing the joint probability distribution
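The product-of-conditionals semantics can be checked numerically. The CPT numbers below are not given on the slide; they are the standard textbook values for this example (Russell & Norvig) and should be treated as assumptions:

```python
# Joint probability as a product of local conditional distributions,
# for the burglary network. CPT numbers are assumed (standard textbook values).
P_b, P_e = 0.001, 0.002                     # P(burglary), P(earthquake)
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | b, e)
P_j = {True: 0.90, False: 0.05}             # P(j | a)
P_m = {True: 0.70, False: 0.01}             # P(m | a)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = P_j[True] * P_m[True] * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(p)  # ≈ 0.000628
```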
Conditional Independence in a Bayesian
Network
P(A, B | C) = P(A | C) P(B | C)
A is conditionally independent of B given C.
Or, P(A | B, C) = P(A | C)
No edge ~ independence!
Constructing Bayesian
networks
• 1. Choose an ordering of variables X_1, …, X_n
• 2. For i = 1 to n
– add X_i to the network
– select parents from X_1, …, X_{i−1} such that
P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i−1})
This choice of parents guarantees:
P(X_1, …, X_n)
= Π_{i=1}^{n} P(X_i | X_1, …, X_{i−1})  (chain rule)
= Π_{i=1}^{n} P(X_i | Parents(X_i))  (by construction)
Usefulness of Causal Ordering of
Nodes from Roots to Leaves
• Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?
Causal ordering (not) - cont'd
• Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Causal Ordering (not) - cont'd.
• Deciding conditional independence is hard in non-causal directions
• (Causal models and conditional independence seem hardwired for humans!)
• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
• A more disastrous example is in Russell and Norvig
Causal Ordering (not) - cont'd.
• Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on:
Explaining away
Rain
Sprinkler
Grass Wet
P(R, S, W) = P(R) P(S) P(W | S, R)
P(W = w | S, R) = 1 if S = s or R = r, and 0 otherwise.
Explaining away
Rain
Sprinkler
Grass Wet
Compute probability it rained last night, given that the grass is wet:
P(r | w) = P(w | r) P(r) / P(w)
Explaining away
Rain
Sprinkler
Grass Wet
Compute probability it rained last night, given that the grass is wet:
P(r | w) = P(w | r) P(r) / Σ_{r′, s′} P(w | r′, s′) P(r′, s′)
Explaining away
Rain
Sprinkler
Grass Wet
Compute probability it rained last night, given that the grass is wet:
P(r | w) = P(r) / [P(r, s) + P(r, ¬s) + P(¬r, s)]
Explaining away
Rain
Sprinkler
Grass Wet
Compute probability it rained last night, given that the grass is wet:
P(r | w) = P(r) / [P(r) + P(¬r, s)]
Explaining away
Rain
Sprinkler
Grass Wet
Compute probability it rained last night, given that the grass is wet:
P(r | w) = P(r) / [P(r) + P(¬r) P(s)]
The denominator lies between P(s) and 1, so P(r | w) ≥ P(r).
Explaining away
Rain
Sprinkler
Grass Wet
Compute probability it rained last night, given that the grass is wet and sprinklers were left on:
P(r | w, s) = P(w | r, s) P(r | s) / P(w | s)
Both terms P(w | r, s) and P(w | s) = 1.
Explaining away
Rain
Sprinkler
Grass Wet
Compute probability it rained last night, given that the grass is wet and sprinklers were left on:
P(r | w, s) = P(r | s) = P(r)
Explaining away
Rain
Sprinkler
Grass Wet
P(r | w, s) = P(r | s) = P(r)
versus P(r | w) = P(r) / [P(r) + P(¬r) P(s)] ≥ P(r)
"Discounting" to the prior probability.
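The discounting effect can be checked numerically. A minimal sketch; the priors P(r) = 0.2 and P(s) = 0.4 are made-up illustrative values, not from the slides:

```python
# "Explaining away" with the deterministic-OR grass model:
# W = w iff R = r or S = s; R and S independent a priori.
P_r, P_s = 0.2, 0.4   # assumed illustrative priors

# Seeing wet grass raises belief in rain:
# P(r | w) = P(r) / (P(r) + P(¬r) P(s))
P_r_given_w = P_r / (P_r + (1 - P_r) * P_s)

# Also learning the sprinkler was on explains the wet grass away:
# P(r | w, s) = P(r | s) = P(r), i.e. rain is discounted back to its prior
P_r_given_ws = P_r

print(P_r_given_w, P_r_given_ws)  # posterior rises above 0.2, then discounts back
```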
Contrast w/ production system
Rain
Sprinkler
Grass Wet
• Formulate IF-THEN rules:
– IF Rain THEN Wet
– IF Wet THEN Rain
• Rules do not distinguish directions of inference
• Requires a combinatorial explosion of rules, e.g.
IF Wet AND NOT Sprinkler THEN Rain
The Monty Hall problem
Suppose you're on a game show, and you're
given the choice of three doors: Behind one
door is a car; behind the others, goats. You
pick a door, say No. 1, and the host, who
knows what's behind the other doors, opens
another door, say No. 3, which has a goat. He
then says to you, 'Do you want to pick door
No. 2?'
Is it to your advantage to take the switch?
Monty Hall Possible Solution - 1
• You have 2 doors left
• The car can be behind either door (events C1 and C2) with equal probability
• Therefore P(C1) = P(C2) = 0.5
• So, switching doesn't affect things
Monty Hall (formally) - 1
• Three doors: let D be the variable for the door behind which the prize is kept, and a, b, c denote the values of D
• Let MH be the variable denoting Monty Hall's opening of a door, with values ma, mb, mc.
• If you choose door a and Monty chooses MH=mb, should you then choose D=a or D=c?
Monty Hall (formally) - 2
• Priors: P(D=a) = P(D=b) = P(D=c) = 1/3; the prize is equally likely to be behind any of the three doors
• Estimate the posterior P(D | MH=mb, a chosen)
• Compare the following two (why do P(D=a), P(D=c) not have "a chosen" as a condition?) by taking their ratio:
Conditional probabilities for Monty Hall's actions are used to get posteriors using Bayes' rule:
P(MH=ma | D=c, a chosen) = 0, P(MH=mb | D=c, a chosen) = 1, P(MH=mc | D=c, a chosen) = 0
P(MH=ma | D=b, a chosen) = 0, P(MH=mb | D=b, a chosen) = 0, P(MH=mc | D=b, a chosen) = 1
P(MH=ma | D=a, a chosen) = 0, P(MH=mb | D=a, a chosen) = 0.5, P(MH=mc | D=a, a chosen) = 0.5
Ratio of posteriors using Bayes' rule:
P(D=c | MH=mb, a chosen) / P(D=a | MH=mb, a chosen) = P(MH=mb | D=c, a chosen) / P(MH=mb | D=a, a chosen) = 1 / 0.5 = 2, so switching doubles your chance of winning.
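The 2:1 ratio can also be checked by simulation; a small Monte Carlo sketch (door labels and trial count are arbitrary choices):

```python
import random

# Monte Carlo check of the Monty Hall posterior ratio: always pick door a,
# let Monty open a goat door, and count how often switching wins.
random.seed(0)
trials, switch_wins = 100_000, 0
for _ in range(trials):
    prize = random.choice("abc")
    chosen = "a"
    # Monty opens a door that is neither the chosen one nor the prize
    monty = random.choice([d for d in "abc" if d != chosen and d != prize])
    switched = next(d for d in "abc" if d != chosen and d != monty)
    switch_wins += (switched == prize)
print(switch_wins / trials)  # ≈ 2/3, matching the Bayes-rule ratio of 2
```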
Three Prisoners
• One among three prisoners A, B and C is to be executed, the others to be released.
• Prisoner A asks the jailer if he could pass on a letter to one of the others who was to be freed. Several hours later, prisoner A then asks the jailer who he had given the letter to.
• The jailer informs A that he had given the letter to B.
• What was the probability for A to be executed after this message from the jailer that B was to be released? Is it the same as it was before he heard the jailer's reply?
• What might the corresponding probabilities have been if A had asked the jailer "will B be released?"
• (Nice discussion of this and other conceptual issues in Pearl's book.)
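The puzzle yields to the same Bayes-rule treatment as Monty Hall. A sketch under the standard assumptions (each prisoner equally likely to be executed; if A is the one, the jailer names B or C uniformly at random):

```python
# Three-prisoners puzzle by direct Bayes computation.
# Assumptions: uniform prior over who is executed; if A is executed,
# the jailer gives the letter to B or C with probability 0.5 each.
P_exec = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
P_letterB = {"A": 0.5, "B": 0.0, "C": 1.0}   # P(letter to B | executed = ·)

evidence = sum(P_letterB[x] * P_exec[x] for x in P_exec)
posterior_A = P_letterB["A"] * P_exec["A"] / evidence
print(posterior_A)  # stays 1/3: the jailer's reply does not change A's odds
```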
Bayes Ball: Checking for d-separation or conditional independence
• To check if A ⊥ B | C, we need to check if every variable in A is d-separated from every variable in B conditioned on all variables in C.
• Given that all the nodes in C are clamped, when we wiggle nodes in A can we change any of the nodes in B?
The Bayes Ball Algorithm is such a d-separation test:
• shade all nodes in C
• place balls at each node in A
• let them bounce around according to some rules
• ask if any of the balls reach any of the nodes in B (or A).
• If yes, A and B are not d-separated given C.
Bayes Ball Rules
At the boundaries
Bayes Ball Application
Shaded node is conditioned on; the flow of information is indicated by arrows, indicating correlations (dependence).
Not only does observation of (conditioning on) an immediate common effect introduce correlations between otherwise independent causes; so does observation of consequences of that common effect.
Coursework (#4)
• Use the Bayes Ball rules to see why I gave the examples I did.
• (Not to be submitted)