
Bayesian Networks and
Probabilistic Reasoning
Tuesday, March 21, 2006
Srinandan Dasmahapatra
Rational statistical inference (Bayes, Laplace)

  p(h | d) = p(d | h) p(h) / Σ_{h' ∈ H} p(d | h') p(h')

• p(h | d): posterior probability
• p(d | h): likelihood
• p(h): prior probability
• Σ_{h' ∈ H}: sum over the space of hypotheses
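As a minimal numerical sketch of this update (the hypotheses and numbers below are made up purely for illustration):

```python
# Bayes' rule over a discrete hypothesis space (illustrative numbers only).
prior = {"h1": 0.7, "h2": 0.3}            # p(h)
likelihood = {"h1": 0.1, "h2": 0.6}       # p(d | h) for the observed data d

evidence = sum(likelihood[h] * prior[h] for h in prior)   # Σ_h' p(d | h') p(h')
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(posterior)   # {'h1': 0.28, 'h2': 0.72} (approximately)
```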
Why Bayes?

• A framework for explaining cognition
  – How people can learn so much from such limited data.
  – Why process-level models work the way that they do.
  – Strong quantitative models with minimal ad hoc assumptions.

• A framework for understanding how structured knowledge and statistical inference interact
  – How structured knowledge guides statistical inference, and is itself acquired through higher-order statistical learning.
  – How simplicity trades off with fit to the data in evaluating structural hypotheses (Occam's razor).
  – How increasingly complex structures may grow as required by new data, rather than being pre-specified in advance.

© Slides from Tenenbaum, Griffiths
Properties of Bayesian networks

• Efficient representation and inference
  – Exploiting dependency structure makes it easier to represent and compute with probabilities.

• Explaining away
  – A pattern of probabilistic reasoning characteristic of Bayesian networks, especially in their early use in AI.

© Slides from Tenenbaum, Griffiths
Example:

• I'm at work, neighbour John calls to say my alarm is ringing, but neighbour Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

• Network topology reflects "causal" knowledge (see the sketch below):
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call
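A minimal sketch of this structure in Python, encoding each variable's parents exactly as listed above (graph only, no probabilities yet; the dictionary name is mine):

```python
# Burglary network: each node maps to its list of parents,
# following the causal statements above.
burglary_net = {
    "Burglary":   [],
    "Earthquake": [],
    "Alarm":      ["Burglary", "Earthquake"],  # burglar or earthquake can set the alarm off
    "JohnCalls":  ["Alarm"],                   # the alarm can cause John to call
    "MaryCalls":  ["Alarm"],                   # the alarm can cause Mary to call
}
```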
Example contd. (figure: the burglary network with its conditional probability tables)
Compactness

• A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values.

• Each row requires one number p for X_i = true (the number for X_i = false is just 1 - p).

• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.

• For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31). A small counting script follows.
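A quick sanity check of that count, reusing the parent structure sketched above (the variable names are mine, not the slides'):

```python
# One independent number per row of each CPT: 2^(#parents) rows for a Boolean node.
burglary_net = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

cpt_numbers = sum(2 ** len(parents) for parents in burglary_net.values())
full_joint_numbers = 2 ** len(burglary_net) - 1

print(cpt_numbers)         # 1 + 1 + 4 + 2 + 2 = 10
print(full_joint_numbers)  # 2^5 - 1 = 31
```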
Exploiting Conditional Independence 1:
Joint Distributions in terms of Conditionals

Storage: exponential in the number of random variables.
Bayesian Network Semantics

Network structure encodes (conditional) independencies between variables, and this makes the representation compact.
Bayesian Network Semantics - 2

Semantics: the full joint distribution is defined as the product of the local conditional distributions:

  P(X_1, …, X_n) = Π_{i=1}^{n} P(X_i | Parents(X_i))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)

Bayes nets provide a compact way of representing the joint probability distribution. (A worked numerical version of this product appears below.)
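A worked version of that product for the burglary example. The CPT figure ("Example contd.") is not reproduced in this text, so the particular numbers below are an assumption, taken from the usual Russell & Norvig presentation of this network:

```python
# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p_j_given_a     = 0.90     # John calls when the alarm rings
p_m_given_a     = 0.70     # Mary calls when the alarm rings
p_a_given_nb_ne = 0.001    # alarm rings with no burglary and no earthquake
p_not_b         = 1 - 0.001
p_not_e         = 1 - 0.002

joint = p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_not_b * p_not_e
print(joint)   # ≈ 0.000628: one entry of the 2^5-entry joint, built from five local numbers
```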
Conditional Independence in a Bayesian Network

  P(A, B | C) = P(A | C) P(B | C)

A is conditionally independent of B given C.
Or, P(A | B, C) = P(A | C)

No edge ~ Independence!
Constructing Bayesian networks

1. Choose an ordering of variables X_1, …, X_n

2. For i = 1 to n:
   – add X_i to the network
   – select parents from X_1, …, X_{i-1} such that
     P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i-1})

This choice of parents guarantees:

  P(X_1, …, X_n) = Π_{i=1}^{n} P(X_i | X_1, …, X_{i-1})   (chain rule)
                 = Π_{i=1}^{n} P(X_i | Parents(X_i))       (by construction)
Usefulness of Causal Ordering of Nodes from Roots to Leaves

• Recall: I'm at work, neighbour John calls to say my alarm is ringing, but neighbour Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

• Network topology reflects "causal" knowledge:
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call

• Suppose we choose the ordering M, J, A, B, E

  P(J | M) = P(J)?
Causal ordering (not), cont'd.

Suppose we choose the ordering M, J, A, B, E:

  P(J | M) = P(J)?                                  No
  P(A | J, M) = P(A | J)?   P(A | J, M) = P(A)?     No
  P(B | A, J, M) = P(B | A)?                        Yes
  P(B | A, J, M) = P(B)?                            No
  P(E | B, A, J, M) = P(E | A)?                     No
  P(E | B, A, J, M) = P(E | A, B)?                  Yes
Causal Ordering (not), cont'd.

• Deciding conditional independence is hard in noncausal directions.

• (Causal models and conditional independence seem hardwired for humans!)

• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

• A more disastrous example is in Russell and Norvig.
Explaining away

(Network: Rain → Grass Wet ← Sprinkler)

• Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on:

    P(R, S, W) = P(R) P(S) P(W | S, R)

    P(W = w | S, R) = 1 if R = r or S = s, and 0 otherwise.

• Compute the probability it rained last night, given that the grass is wet:

    P(r | w) = P(w | r) P(r) / P(w)
             = P(w | r) P(r) / Σ_{r', s'} P(w | r', s') P(r', s')
             = P(r) / [ P(r, s) + P(r, ¬s) + P(¬r, s) ]     (P(w | r', s') = 1 unless both are false)
             = P(r) / [ P(r) + P(¬r, s) ]
             = P(r) / [ P(r) + P(¬r) P(s) ]                 (R and S independent a priori)

  This lies between P(r) and 1: wet grass raises the probability of rain above its prior.

• Compute the probability it rained last night, given that the grass is wet and the sprinklers were left on:

    P(r | w, s) = P(w | r, s) P(r | s) / P(w | s)

  Both terms P(w | r, s) and P(w | s) equal 1, so

    P(r | w, s) = P(r | s) = P(r)  ≤  P(r | w) = P(r) / [ P(r) + P(¬r) P(s) ]

  Discounting back to the prior probability: the sprinkler "explains away" the wet grass as evidence for rain. (A numerical check follows.)

© Slides from Tenenbaum, Griffiths
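A small numerical check of the discounting effect. The priors P(r) = 0.2 and P(s) = 0.4 are my own illustrative choices, not values from the slides:

```python
# Explaining away in the Rain / Sprinkler / Grass-Wet network.
# W is a deterministic OR of R and S; R and S are independent a priori.
p_r, p_s = 0.2, 0.4            # illustrative priors, not from the slides

p_w = p_r + (1 - p_r) * p_s    # P(w) = P(r) + P(¬r)P(s) under the OR model
p_r_given_w = p_r / p_w        # P(r | w)    = P(r) / [P(r) + P(¬r)P(s)]
p_r_given_ws = p_r             # P(r | w, s) = P(r | s) = P(r): both likelihood terms are 1

print(p_r_given_w)    # ≈ 0.385: wet grass raises belief in rain above the prior 0.2
print(p_r_given_ws)   # 0.2: also learning the sprinkler was on discounts back to the prior
```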

Contrast w/ production system

(Network: Rain → Grass Wet ← Sprinkler)

• Formulate IF-THEN rules:
  – IF Rain THEN Wet
  – IF Wet THEN Rain
  – IF Wet AND NOT Sprinkler THEN Rain

• Rules do not distinguish directions of inference.

• Requires a combinatorial explosion of rules.

© Slides from Tenenbaum, Griffiths
The Monty Hall problem
Suppose you're on a game show, and you're
given the choice of three doors: Behind one
door is a car; behind the others, goats. You
pick a door, say No. 1, and the host, who
knows what's behind the other doors, opens
another door, say No. 3, which has a goat. He
then says to you, 'Do you want to pick door
No. 2?'
Is it to your advantage to take the switch?
Monty Hall Possible Solution - 1

• You have 2 doors left.

• The car can be behind either door (events C1 and C2) with equal probability.

• Therefore P(C1) = P(C2) = 0.5.

• So, switching doesn't affect things.
Monty Hall (formally) - 1

• Three doors: let D be the variable for the door behind which the prize is kept, and a, b, c denote the values of D.

• Let MH be the variable denoting Monty Hall's opening of a door, with values ma, mb, mc.

• If you choose door a and Monty chooses MH = mb, should you then choose D = a or D = c?
Monty Hall (formally) - 2

• Priors: P(D=a) = P(D=b) = P(D=c) = 1/3; the prize is equally likely to be behind any of the three doors.

• Estimate the posterior P(D | MH=mb, a chosen).

• Compare the following two (why do P(D=a), P(D=c) not have "a chosen" as a condition?) by taking their ratio:
Conditional probabilities for Monty Hall's actions are used to get posteriors using Bayes' rule:

  P(MH=ma | D=c, a chosen) = 0,   P(MH=mb | D=c, a chosen) = 1,    P(MH=mc | D=c, a chosen) = 0
  P(MH=ma | D=b, a chosen) = 0,   P(MH=mb | D=b, a chosen) = 0,    P(MH=mc | D=b, a chosen) = 1
  P(MH=ma | D=a, a chosen) = 0,   P(MH=mb | D=a, a chosen) = 0.5,  P(MH=mc | D=a, a chosen) = 0.5

Ratio of posteriors using Bayes' rule:

  P(D=c | MH=mb, a chosen) / P(D=a | MH=mb, a chosen)
    = [ P(MH=mb | D=c, a chosen) P(D=c) ] / [ P(MH=mb | D=a, a chosen) P(D=a) ]
    = (1 × 1/3) / (0.5 × 1/3) = 2
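A compact sketch of that calculation; the door labels and dictionary layout are mine, while the probabilities are exactly the ones tabulated above:

```python
# Posterior over the prize door D, given that we chose door a and Monty opened door b.
prior = {"a": 1/3, "b": 1/3, "c": 1/3}
# P(MH = mb | D = d, a chosen), read off the table above
p_mb_given = {"a": 0.5, "b": 0.0, "c": 1.0}

unnorm = {d: p_mb_given[d] * prior[d] for d in prior}
z = sum(unnorm.values())
posterior = {d: v / z for d, v in unnorm.items()}

print(posterior)                        # {'a': 0.333…, 'b': 0.0, 'c': 0.667…}
print(posterior["c"] / posterior["a"])  # 2.0: switching to door c doubles the chance of winning
```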
Three Prisoners

• One among three prisoners A, B and C is to be executed, the others to be released.

• Prisoner A asks the jailer if he could pass on a letter to one of the others who was to be freed. Several hours later, prisoner A asks the jailer who he had given the letter to. The jailer informs A that he had given the letter to B.

• What was the probability for A to be executed after this message from the jailer that B was to be released? Is it the same as it was before he heard the jailer's reply?

• What might the corresponding probabilities have been if A had asked the jailer "will B be released?"

• (Nice discussion of this and other conceptual issues in Pearl's book.)
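A small enumeration sketch of both questions. It assumes, as in Monty Hall, that when both B and C are to be freed the jailer names either with probability 1/2; that tie-breaking rule is my assumption, not something stated on the slide:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)

# Joint probability of (who is executed, whom the jailer says got the letter).
# If A is executed, both B and C are free: assume the jailer names either with prob 1/2.
joint = {
    ("A", "B"): third * half, ("A", "C"): third * half,
    ("B", "C"): third,   # B executed -> only C is free to receive the letter
    ("C", "B"): third,   # C executed -> only B is free
}

# Q1: the jailer says the letter went to B.
p_says_b = sum(p for (ex, says), p in joint.items() if says == "B")
p_a_exec_given_says_b = joint[("A", "B")] / p_says_b
print(p_a_exec_given_says_b)      # 1/3: unchanged from the prior

# Q2: A instead asks "will B be released?" and the answer is yes.
p_b_released = sum(p for (ex, says), p in joint.items() if ex != "B")
p_a_exec_given_b_released = sum(p for (ex, says), p in joint.items() if ex == "A") / p_b_released
print(p_a_exec_given_b_released)  # 1/2: the direct question is more informative
```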
Bayes Ball: Checking for d-separation or conditional independence

• To check if X_A ⊥ X_B | X_C, we need to check if every variable in "A" is d-separated from every variable in "B" conditioned on all variables in "C".

• Given that all the nodes in "C" are clamped, when we wiggle the nodes in "A", can we change any of the nodes in "B"?
The Bayes-Ball Algorithm is such a d-separation test:

• shade all nodes in "C"
• place balls at each node in "A"
• let them bounce around according to some rules
• ask if any of the balls reach any of the nodes in "B" (or "A")
• If yes, the variables in "A" and "B" are not d-separated given "C"; if no ball reaches "B", the conditional independence holds. (A sketch of this test as a reachability check follows.)
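One way to implement this test is the standard reachability formulation of Bayes Ball; the sketch below (function and variable names are mine) follows the usual algorithm rather than anything specific on these slides:

```python
from collections import deque

def d_separated(parents, A, B, C):
    """Return True if every node in A is d-separated from every node in B given C.

    parents: dict mapping each node to a list of its parents (the DAG).
    A, B, C: sets of node names.
    """
    children = {n: [] for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)

    # Nodes that are in C or have a descendant in C (needed for the v-structure rule).
    anc_of_C = set()
    stack = list(C)
    while stack:
        node = stack.pop()
        if node not in anc_of_C:
            anc_of_C.add(node)
            stack.extend(parents[node])

    # Search over (node, direction) pairs:
    # "up" means the ball arrived from a child, "down" means it arrived from a parent.
    visited, reachable = set(), set()
    queue = deque((a, "up") for a in A)
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in C:
            reachable.add(node)
        if direction == "up" and node not in C:
            # Pass through to parents (chain) and down to children (common cause).
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in C:
                queue.extend((c, "down") for c in children[node])   # chain continues
            if node in anc_of_C:
                queue.extend((p, "up") for p in parents[node])       # v-structure activated
    return not (reachable & B)


burglary_net = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

print(d_separated(burglary_net, {"JohnCalls"}, {"MaryCalls"}, {"Alarm"}))      # True
print(d_separated(burglary_net, {"Burglary"}, {"Earthquake"}, set()))          # True
print(d_separated(burglary_net, {"Burglary"}, {"Earthquake"}, {"Alarm"}))      # False (explaining away)
print(d_separated(burglary_net, {"Burglary"}, {"Earthquake"}, {"JohnCalls"}))  # False (consequence of common effect)
```

On the burglary network this reproduces the points made on the following slides: conditioning on the common effect Alarm, or even on its consequence JohnCalls, couples the otherwise independent causes Burglary and Earthquake.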
Bayes Ball Rules
At the boundaries
Bayes Ball Application

• The shaded node is the one conditioned on; the flow of information is indicated by arrows, indicating correlations (dependence).

• Not only does observation of (conditioning on) the immediate common effect introduce correlations between otherwise independent causes; even the observation of consequences of the common immediate effect does so.
Coursework (#4)

Use the Bayes Ball rules to see why I
gave the examples I did.

(Not to be submitted)