Bayesian Networks
Aldi Kraja
Division of Statistical Genomics
Bayesian Networks and Decision Graphs, Chapter 1
• A causal network consists of a set of variables and a set of directed links between variables
• Variables represent events (propositions)
• A variable can have any number of states
• Purpose: causal networks can be used to follow how a change of certainty in one variable may change the certainty of other variables
Causal networks
[Figure: causal network for a reduced start-car problem. Fuel (Y, N) points to Fuel Meter Standing (F, ½, E) and to Start (Y, N); Clean Sparks (Y, N) also points to Start.]
Causal Networks and d-separation
• Serial connection (blocking): A → B → C
Evidence may be transmitted through a serial connection unless the state of the intermediate variable is known.
A and C are d-separated given B: when B is instantiated, it blocks the communication between A and C.
Causal networks and d-separation
• Diverging connections (blocking): A is a parent of B, C, …, E
Influence can pass between all children of A unless the state of A is known.
Evidence may be transmitted through a diverging connection unless A is instantiated.
Causal networks and d-separation
• Converging connections (opening): A has parents B, C, …, E
Case 1: If nothing is known about A except what may be inferred from knowledge of its parents, then the parents are independent: evidence on one of the parents has no influence on the other parents.
Case 2: If anything is known about the consequences, then information on one cause may tell us something about the other causes (the explaining-away effect).
Evidence may only be transmitted through a converging connection if either A or one of its descendants has received evidence.
Evidence
• Evidence on a variable is a statement of the certainties of its states
• If the variable is instantiated, then the variable provides hard evidence
• Blocking in the case of serial and diverging connections requires hard evidence
• Opening in the case of converging connections holds for all kinds of evidence
D-separation
• Two distinct variables A and B in a causal network are d-separated if, for every path between A and B, there is an intermediate variable V (distinct from A and B) such that either:
– the connection is SERIAL or DIVERGING and V is instantiated, or
– the connection is CONVERGING and neither V nor any of V's descendants have received evidence.
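The definition above translates directly into a path-checking procedure. Below is a minimal sketch (the dictionary encoding and function names are my own, not from the text): it enumerates every undirected path between two nodes and tests each intermediate variable against the serial/diverging and converging rules.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following directed edges."""
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def d_separated(parents, a, b, evidence):
    """True iff a and b are d-separated given the instantiated set `evidence`.

    `parents` maps each node to a list of its parents. Every undirected path
    must be blocked at some intermediate variable V: serial/diverging with V
    instantiated, or converging with no evidence on V or its descendants.
    """
    children, adj = {}, {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
            adj.setdefault(p, set()).add(child)
            adj.setdefault(child, set()).add(p)

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, v, nxt = path[i - 1], path[i], path[i + 1]
            if prev in parents.get(v, ()) and nxt in parents.get(v, ()):
                # converging connection at v: blocked unless v or a
                # descendant of v has received evidence
                if v not in evidence and not (descendants(children, v) & evidence):
                    return True
            elif v in evidence:  # serial or diverging connection at v
                return True
        return False

    def paths(node, visited):
        if node == b:
            yield visited
            return
        for nb in adj.get(node, ()):
            if nb not in visited:
                yield from paths(nb, visited + [nb])

    return all(blocked(p) for p in paths(a, [a]))

# Reduced start-car network from the slides:
car = {"FuelMeter": ["Fuel"], "Start": ["Fuel", "CleanSparks"]}
print(d_separated(car, "Fuel", "CleanSparks", set()))      # True: converging at Start
print(d_separated(car, "Fuel", "CleanSparks", {"Start"}))  # False: Start opens the path
print(d_separated(car, "FuelMeter", "Start", {"Fuel"}))    # True: Fuel blocks the path
```

The three queries reproduce the connection rules on the start-car example: Fuel and Clean Sparks are independent until Start is observed, and instantiating Fuel blocks the serial path from the fuel meter to Start.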
Probability Theory
• Uncertainty arises from noise in the measurements and from the small sample size of the data.
• Use probability theory to quantify the uncertainty.
[Figure: wheat/fungus example. Fungus F has states Red and Gray, with P(F=R)=4/10 and P(F=G)=6/10; wheat W has states ripe and unripe.]
Probability Theory
• The probability of an event is the fraction of times that event occurs out of the total number of trials, in the limit that the total number of trials goes to infinity.
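This limiting-frequency view is easy to illustrate with a quick simulation (hypothetical, not from the slides): draw repeated samples of a two-state variable and watch the empirical fraction approach the underlying probability.

```python
import random

random.seed(0)      # fixed seed so the run is reproducible
p_red = 0.4         # assumed underlying probability, e.g. P(F = R)
for n in (100, 10_000, 1_000_000):
    draws = sum(random.random() < p_red for _ in range(n))
    print(n, draws / n)   # the empirical fraction drifts toward 0.4 as n grows
```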
Probability Theory
• Sum rule: p(X) = Σ_Y p(X, Y)
• Product rule: p(X, Y) = p(Y | X) p(X)
Here X takes the values x_i, i = 1, …, M, and Y takes the values y_j, j = 1, …, L; n_ij is the number of trials with X = x_i and Y = y_j, c_i the number with X = x_i, and r_j the number with Y = y_j.
Probability Theory
p(X = x_i, Y = y_j) = n_ij / N
p(X = x_i) = c_i / N, where c_i = Σ_{j=1}^{L} n_ij
Hence the sum rule: p(X = x_i) = Σ_{j=1}^{L} p(X = x_i, Y = y_j)
p(Y = y_j | X = x_i) = n_ij / c_i
Hence the product rule: p(X = x_i, Y = y_j) = n_ij / N = (n_ij / c_i)(c_i / N) = p(Y = y_j | X = x_i) p(X = x_i)
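The count-based derivation above can be checked numerically. In this sketch (the count table is illustrative, not the slides' data), the joint, marginal, and conditional tables are built from counts n_ij, and the sum and product rules hold exactly when probabilities are defined as fractions of N.

```python
from fractions import Fraction

# n[i][j] = number of trials with X = x_i and Y = y_j (illustrative counts)
n = [[3, 1], [2, 4]]
N = sum(sum(row) for row in n)

p_joint = [[Fraction(nij, N) for nij in row] for row in n]       # p(X=x_i, Y=y_j) = n_ij / N
p_x = [sum(row) for row in p_joint]                              # sum rule: c_i / N
p_y_given_x = [[Fraction(n[i][j], sum(n[i])) for j in range(2)]  # p(Y=y_j | X=x_i) = n_ij / c_i
               for i in range(2)]

for i in range(2):
    for j in range(2):
        # product rule: p(X, Y) = p(Y | X) p(X), exact in rational arithmetic
        assert p_joint[i][j] == p_y_given_x[i][j] * p_x[i]
print(p_x)   # [Fraction(2, 5), Fraction(3, 5)]
```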
Probability Theory
• Symmetry property: p(X, Y) = p(Y, X), so p(Y | X) p(X) = p(X | Y) p(Y)
• Bayes' theorem: p(Y | X) = p(X | Y) p(Y) / p(X)
• Special case: if X and Y are independent, p(X, Y) = p(X) p(Y)
Probability Theory
• P(W=u | F=R) = 8/32 = 1/4
• P(W=r | F=R) = 24/32 = 3/4
• P(W=u | F=G) = 18/24 = 3/4
• P(W=r | F=G) = 6/24 = 1/4
• P(F=R) = 4/10 = 0.4, P(F=G) = 6/10 = 0.6
Each column of conditional probabilities sums to 1.
Probability Theory
• p(W=u) = p(W=u | F=R) p(F=R) + p(W=u | F=G) p(F=G) = 1/4 · 4/10 + 3/4 · 6/10 = 11/20
• p(W=r) = 1 − 11/20 = 9/20
• p(F=R | W=r) = p(W=r | F=R) p(F=R) / p(W=r) = 3/4 · 4/10 · 20/9 = 2/3
• p(F=G | W=r) = 1 − 2/3 = 1/3
with P(F=R) = 4/10 = 0.4 and P(F=G) = 6/10 = 0.6.
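The arithmetic in this worked example can be reproduced exactly with rational numbers. A small sketch (variable names mine) using the slides' conditionals and priors:

```python
from fractions import Fraction as F

# Conditionals p(W | F) and priors p(F) from the slides
p_u_R, p_u_G = F(1, 4), F(3, 4)   # p(W=u | F=R), p(W=u | F=G)
p_R, p_G = F(4, 10), F(6, 10)     # p(F=R), p(F=G)

p_u = p_u_R * p_R + p_u_G * p_G   # marginal p(W=u) via the sum rule
p_r = 1 - p_u                     # p(W=r)
post_R = (1 - p_u_R) * p_R / p_r  # Bayes: p(F=R | W=r) = p(W=r|F=R) p(F=R) / p(W=r)

print(p_u)        # 11/20
print(post_R)     # 2/3
print(1 - post_R) # p(F=G | W=r) = 1/3
```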
Conditional probabilities
• Diverging connection (blocking): a ← b → c
• p(a | b) p(b) = p(a, b)
• p(a | b, c) p(b | c) = p(a, b | c)
• p(b | a) = p(a | b) p(b) / p(a)
• p(b | a, c) = p(a | b, c) p(b | c) / p(a | c)
p(a, b, c) = p(a | b) p(c | b) p(b)
p(a, c | b) = p(a, b, c) / p(b) = p(a | b) p(c | b) p(b) / p(b) = p(a | b) p(c | b)
Hence a ⊥ c | b.
Conditional probabilities
• Serial connection (blocking): a → b → c
• p(a | b) p(b) = p(a, b)
• p(a | b, c) p(b | c) = p(a, b | c)
• p(b | a) = p(a | b) p(b) / p(a)
• p(b | a, c) = p(a | b, c) p(b | c) / p(a | c)
p(a, b, c) = p(a) p(b | a) p(c | b)
p(a, c | b) = p(a, b, c) / p(b) = p(a) p(b | a) p(c | b) / p(b) = p(a) {p(a | b) p(b) / p(a)} p(c | b) / p(b) = p(a | b) p(c | b)
Hence a ⊥ c | b.
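The identity p(a, c | b) = p(a | b) p(c | b) for a serial chain can also be confirmed numerically. A sketch over binary variables with made-up CPTs for a → b → c (exact in rational arithmetic):

```python
from fractions import Fraction as F
from itertools import product

# Illustrative CPTs for the chain a -> b -> c
p_a = {0: F(3, 10), 1: F(7, 10)}
p_b_a = {0: {0: F(1, 5), 1: F(4, 5)}, 1: {0: F(3, 5), 1: F(2, 5)}}    # p(b | a)
p_c_b = {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 10), 1: F(9, 10)}}  # p(c | b)

def joint(a, b, c):
    # serial-chain factorization p(a, b, c) = p(a) p(b|a) p(c|b)
    return p_a[a] * p_b_a[a][b] * p_c_b[b][c]

p_b = {b: sum(joint(a, b, c) for a, c in product((0, 1), (0, 1))) for b in (0, 1)}

for a, b, c in product((0, 1), repeat=3):
    lhs = joint(a, b, c) / p_b[b]                          # p(a, c | b)
    p_ab = sum(joint(a, b, c2) for c2 in (0, 1)) / p_b[b]  # p(a | b)
    p_cb = sum(joint(a2, b, c) for a2 in (0, 1)) / p_b[b]  # p(c | b)
    assert lhs == p_ab * p_cb   # a ⊥ c | b, as derived above
print("a and c are conditionally independent given b")
```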
Conditional probabilities
• Converging connection (opening): a → b ← c
• p(a | b) p(b) = p(a, b)
• p(a | b, c) p(b | c) = p(a, b | c)
• p(b | a) = p(a | b) p(b) / p(a)
• p(b | a, c) = p(a | b, c) p(b | c) / p(a | c)
p(a, b, c) = p(a) p(c) p(b | a, c)
p(a, c | b) = p(a, b, c) / p(b) = p(a) p(c) p(b | a, c) / p(b), which does not factorize in general.
Hence a ⊥ c | ∅ (marginally independent), but a ⊥ c | b does not hold: conditioning on b opens the connection.
Graphical Models
• We need probability theory to quantify the uncertainty. All probabilistic inference can be expressed with the sum and the product rule.
p(a, b, c) = p(c | a, b) p(a, b)
p(a, b, c) = p(c | a, b) p(b | a) p(a)
[DAG: a → b, a → c, b → c]
In general, by the chain rule:
p(x_1, x_2, …, x_{K−1}, x_K) = p(x_K | x_1, …, x_{K−1}) … p(x_2 | x_1) p(x_1)
Graphical Models
• DAG explaining the joint distribution of x_1, …, x_7
• The joint distribution defined by a graph is given by the product, over all of the nodes of the graph, of a conditional distribution for each node conditioned on the variables corresponding to the parents of that node in the graph.
p(x_1, …, x_7) = p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5)
In general, for a graph with K nodes:
p(x) = ∏_{k=1}^{K} p(x_k | pa_k)
where pa_k denotes the set of parents of x_k.
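A useful sanity check on the factorization p(x) = ∏_k p(x_k | pa_k): if every conditional table is normalized, the product automatically defines a normalized joint distribution. The sketch below uses the seven-node parent structure from the slide with random, purely illustrative CPTs over binary variables:

```python
import random
from itertools import product

random.seed(1)
# Parent sets for the DAG over x1..x7 from the slide
parents = {1: [], 2: [], 3: [], 4: [1, 2, 3], 5: [1, 3], 6: [4], 7: [4, 5]}

# Random CPTs: p(x_k = 1 | pa_k) for every parent configuration
cpt = {k: {pa: random.random() for pa in product((0, 1), repeat=len(ps))}
       for k, ps in parents.items()}

def joint(x):
    """p(x1..x7) as the product of p(x_k | pa_k); x maps node -> value."""
    prob = 1.0
    for k, ps in parents.items():
        p1 = cpt[k][tuple(x[p] for p in ps)]
        prob *= p1 if x[k] == 1 else 1 - p1
    return prob

# Summing over all 2^7 assignments recovers 1: a valid joint distribution
total = sum(joint(dict(zip(range(1, 8), vals)))
            for vals in product((0, 1), repeat=7))
print(round(total, 10))   # 1.0
```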