Bayesian Networks and Markov Random Fields

Nov 7, 2013

1 Bayesian Networks
As before, we will use capital letters for random variables and lower case letters for values of those variables. A Bayesian network is a triple $\langle V, G, P \rangle$ where $V$ is a set of random variables $X_1, \ldots, X_n$, $G$ is a directed acyclic graph (DAG) whose nodes are the variables in $V$, and $P$ is a set of conditional probability tables as described below. The conditional probability tables determine a probability distribution over the values of the variables. If there is a directed edge in $G$ from $X_j$ to $X_i$ then we will say that $X_j$ is a parent of $X_i$ and $X_i$ is a child of $X_j$. To determine values for the variables one first selects values for variables that have no parents and then repeatedly picks a value for any node all of whose parents already have values. When we pick a value for a variable we look only at the values of the parents of that variable. We will write $P(x_i \mid \text{parents of } x_i)$ to abbreviate $P(x_i \mid x_{i_1}, \ldots, x_{i_k})$ where $x_{i_1}, \ldots, x_{i_k}$ are the parents of $x_i$. For example, if $x_7$ has parents $x_2$ and $x_4$ then $P(x_7 \mid \text{parents of } x_7)$ abbreviates $P(x_7 \mid x_2, x_4)$. Formally, the probability distribution on the variables of a Bayesian network is determined by the following equation.
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \text{parents of } x_i) \qquad (1)$$
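As an illustration of equation (1), the following minimal sketch multiplies one CPT entry per variable to obtain the joint probability of a total assignment. The three-variable network, its probabilities, and the names `parents`, `cpt` and `joint` are all made up for this example.

```python
from itertools import product

# Hypothetical network X1 -> X3 <- X2 with binary variables.
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"]}

# Each CPT maps (value, parent values...) to a probability.
cpt = {
    "X1": {(0,): 0.7, (1,): 0.3},
    "X2": {(0,): 0.6, (1,): 0.4},
    "X3": {
        (0, 0, 0): 0.9, (1, 0, 0): 0.1,
        (0, 0, 1): 0.5, (1, 0, 1): 0.5,
        (0, 1, 0): 0.4, (1, 1, 0): 0.6,
        (0, 1, 1): 0.2, (1, 1, 1): 0.8,
    },
}

def joint(assignment):
    """Equation (1): product over variables of P(x_i | parents of x_i)."""
    p = 1.0
    for var, pars in parents.items():
        key = (assignment[var],) + tuple(assignment[q] for q in pars)
        p *= cpt[var][key]
    return p

# Summing the joint over all total assignments should give 1.
total = sum(joint(dict(zip(parents, vals)))
            for vals in product([0, 1], repeat=len(parents)))
```

Here `joint({"X1": 1, "X2": 0, "X3": 1})` multiplies the entries 0.3, 0.6 and 0.6, and `total` checks that the CPTs define a normalized distribution.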
The conditional probabilities of the form $P(x_i \mid \text{parents of } x_i)$ are called conditional probability tables (CPTs). Suppose that each of the variables $x_i$ has $d$ possible values (this is not required in general). In this case if a variable $x$ has $k$ parents then $P(x \mid \text{parents of } x)$ has $d^{k+1}$ values (with $(d-1)d^k$ degrees of freedom). These $d^{k+1}$ values can be stored in a table with $k+1$ indices. Hence the term "table". Note that the number of indices of a CPT differs from variable to variable.
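The counting argument can be checked with a few lines of code. The helper name `cpt_size` is hypothetical. The degrees of freedom follow because each of the $d^k$ parent configurations carries a distribution over $d$ values that must sum to one, leaving $d-1$ free entries per configuration.

```python
def cpt_size(d, k):
    """Return (number of entries, degrees of freedom) of a CPT for a
    variable with d values and k parents that also have d values each."""
    entries = d ** (k + 1)
    free = (d - 1) * d ** k
    return entries, free

binary_two_parents = cpt_size(2, 2)
ternary_one_parent = cpt_size(3, 1)
```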
Note that an HMM is a Bayesian network with a variable for each hidden
state and each observable token.
Bayesian networks are often used in medical diagnosis where variables represent the presence or absence of certain diseases or the presence or absence of certain measurable quantities such as blood sugar or presence of a certain protein in the blood.
In a Bayesian network the edges of the directed graph are often interpreted as "causation" with the parent causally influencing the child and parents getting assigned values temporally before children.
We are interested in "Bayesian inference" which means, intuitively, inferring causes by observing their effects using Bayes' rule. In an HMM, for example, we want to infer the hidden states from the observations that they cause.
In general we can formulate the inference problem as the problem of determining the probability of an unknown variable (a hidden cause) from observed values of other variables. The variables can be considered in any order, as in the following example.
$$P(x_5, x_7 \mid x_3, x_2) = \frac{P(x_5, x_7, x_2, x_3)}{P(x_2, x_3)}$$
So in general, for inference it suffices to be able to compute probabilities of the form $P(x_{i_1}, \ldots, x_{i_k})$. We give an algorithm for doing this in section 3.
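The reduction of conditional queries to marginals can be sketched by brute force on a tiny network. The two-variable network (a disease causing a test result), its numbers, and the function names below are assumptions made for illustration. The enumeration is exponential in the number of variables and is not the algorithm of section 3.

```python
from itertools import product

# Hypothetical network Disease -> Test, both binary.
p_disease = {0: 0.99, 1: 0.01}
# Keys are (test value, disease value).
p_test_given_disease = {(0, 0): 0.95, (1, 0): 0.05,
                        (0, 1): 0.10, (1, 1): 0.90}

def joint(d, t):
    return p_disease[d] * p_test_given_disease[(t, d)]

def marginal(partial):
    """Sum the joint over every total assignment consistent with `partial`."""
    total = 0.0
    for d, t in product([0, 1], repeat=2):
        full = {"D": d, "T": t}
        if all(full[v] == val for v, val in partial.items()):
            total += joint(d, t)
    return total

def conditional(query, evidence):
    """P(query | evidence) as a ratio of two marginals."""
    return marginal({**query, **evidence}) / marginal(evidence)

# Infer the hidden cause from its observed effect.
posterior = conditional({"D": 1}, {"T": 1})
```

With these made-up numbers the posterior is 0.009 / 0.0585, i.e. 2/13: a positive test raises the probability of disease from 1% to about 15%.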
2 Inference in Bayesian Networks is #P hard
A Boolean variable (also called a Bernoulli variable) is a variable that has only the two possible values 0 and 1. A disjunctive clause is a disjunction of literals where each literal is either a Boolean variable or the negation of a Boolean variable. For example, $(X_5 \lor \neg X_{23} \lor X_{37})$ is a clause with three literals. A 3SAT problem is a set of clauses with three literals in each clause. It is hard (in fact #P hard) to take a 3SAT problem and determine the number of complete assignments of values to variables that satisfy the clauses, i.e., that make every clause true.
Let $X_1, \ldots, X_n$ be independent Boolean (Bernoulli) variables with $P(X_i = 1) = 1/2$. Let $\Sigma$ be a set of clauses over these variables where each clause has only three literals. Let $C_j$ be a random variable which is 1 if the $j$th clause is satisfied and 0 otherwise. Since each clause mentions only three variables, we can compute $c_j$ from $x_1, \ldots, x_n$ using a (deterministic) conditional probability table having only three parents. Let $A_j$ be a Boolean variable that is true if all of $C_1, \ldots, C_j$ are true. The value $a_1$ can be computed with a CPT from $c_1$, and for $j > 1$ we have that $a_j$ can be computed using a CPT from $a_{j-1}$ and $c_j$.
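Now let $k$ be the number of clauses. Then $P(A_k = 1)$ is proportional to the number of truth assignments that satisfy all the clauses: each of the $2^n$ assignments to $X_1, \ldots, X_n$ has probability $2^{-n}$, so $P(A_k = 1)$ is the number of satisfying assignments divided by $2^n$. This implies that computing the probability of a partial assignment in a Bayesian network is #P hard. It is widely believed that there are no polynomial time algorithms for #P hard problems.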
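The construction can be checked on a small instance. The sketch below builds the deterministic CPTs for $C_j$ and $A_j$ for a made-up formula over four variables and confirms that $2^n \cdot P(A_k = 1)$ equals the number of satisfying assignments. The formula and all names are hypothetical, and the enumeration is exponential, so this illustrates the reduction rather than an efficient procedure.

```python
from itertools import product

# A clause is a list of (variable index, wanted value) literals.
n = 4
clauses = [
    [(0, 1), (1, 0), (2, 1)],   # X0 or not X1 or X2
    [(1, 1), (2, 0), (3, 1)],   # X1 or not X2 or X3
    [(0, 0), (2, 1), (3, 0)],   # not X0 or X2 or not X3
]

def clause_cpt(clause, x):
    """Deterministic CPT for C_j: 1 iff some literal of the clause holds."""
    return int(any(x[i] == want for i, want in clause))

def and_cpt(a_prev, c):
    """Deterministic CPT for A_j: 1 iff A_{j-1} and C_j are both 1."""
    return int(a_prev and c)

def prob_all_satisfied():
    """P(A_k = 1), summing the joint over all values of the X variables."""
    total = 0.0
    for x in product([0, 1], repeat=n):
        a = 1                      # makes a_1 equal to c_1
        for clause in clauses:
            a = and_cpt(a, clause_cpt(clause, x))
        total += a * 0.5 ** n      # each X_i contributes a factor 1/2
    return total

# Direct model count for comparison.
num_models = sum(
    all(any(x[i] == want for i, want in cl) for cl in clauses)
    for x in product([0, 1], repeat=n)
)
```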
3 Recursive Conditioning
Although inference in Bayesian networks is hard in general, there exist algorithms that work well when the Bayesian network has special structure. We use the following notation.
• $D(X)$ is the set of values that variable $X$ can have.
• $\rho$ ranges over partial assignments of values to variables: $\rho$ assigns values to some of the variables in $V$.
• $\mathrm{dom}(\rho)$ is the set of variables that are assigned values by $\rho$. For $X \in \mathrm{dom}(\rho)$ we have that $\rho(X) \in D(X)$.
• For $X \notin \mathrm{dom}(\rho)$ and $x \in D(X)$ we let $\rho[X := x]$ be the extension of $\rho$ that assigns $X$ the value $x$.
• $\sigma$ ranges over total assignments of values to the variables in $V$. For $X \in V$ we have $\sigma(X) \in D(X)$.
• $\sigma \sqsubseteq \rho$ means that the complete assignment $\sigma$ is compatible with the partial assignment $\rho$. In other words, for $X \in \mathrm{dom}(\rho)$ we have $\sigma(X) = \rho(X)$.
We can now write equation (1) as follows where $T_i$ is the conditional probability table for the $i$th variable and $T_i(\sigma)$ is $P(x_i \mid \text{parents of } x_i)$ where the variable values are determined by $\sigma$.
$$P(\sigma) = \prod_i T_i(\sigma) \qquad (2)$$
For a partial assignment $\rho$ we have the following.
$$P(\rho) = \sum_{\sigma \sqsubseteq \rho} \prod_i T_i(\sigma) \qquad (3)$$
Note that $T_i(\sigma)$ depends on only some of the variables assigned by $\sigma$. For example, suppose we have seven variables $X_1, \ldots, X_7$ arranged in an "inverted tree" where variable $X_1$ has parents $X_2$ and $X_3$; variable $X_2$ has parents $X_4$ and $X_5$; and variable $X_3$ has parents $X_6$ and $X_7$. Now suppose that $\mathrm{dom}(\rho) = \{X_1, X_2\}$, i.e., $\rho$ assigns a value only to $X_1$ and $X_2$. Then we can write equation (3) as follows.
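$$\begin{aligned}
P(x_1, x_2) &= \sum_{x_3, x_4, x_5, x_6, x_7} \left[ T_2(x_2, x_4, x_5)\, T_4(x_4)\, T_5(x_5) \right] \left[ T_1(x_1, x_2, x_3)\, T_3(x_3, x_6, x_7)\, T_6(x_6)\, T_7(x_7) \right] \\
&= \left[ \sum_{x_4, x_5} T_2(x_2, x_4, x_5)\, T_4(x_4)\, T_5(x_5) \right] \left[ \sum_{x_3, x_6, x_7} T_1(x_1, x_2, x_3)\, T_3(x_3, x_6, x_7)\, T_6(x_6)\, T_7(x_7) \right] \\
&= \tilde{P}(\langle x_2 \rangle, \langle T_2, T_4, T_5 \rangle)\; \tilde{P}(\langle x_1, x_2 \rangle, \langle T_1, T_3, T_6, T_7 \rangle)
\end{aligned}$$
where
$$\tilde{P}(\rho, \mathcal{T}) = \sum_{\sigma \sqsubseteq \rho} \prod_{T \in \mathcal{T}} T(\sigma)$$
and $\sigma$ only assigns to variables in $\mathcal{T}$.
So we can summarize this as follows.
$$\begin{aligned}
P(x_1, x_2) &= \tilde{P}(\langle x_1, x_2 \rangle, \langle T_1, T_2, T_3, T_4, T_5, T_6, T_7 \rangle) \\
&= \tilde{P}(\langle x_2 \rangle, \langle T_2, T_4, T_5 \rangle)\; \tilde{P}(\langle x_1, x_2 \rangle, \langle T_1, T_3, T_6, T_7 \rangle)
\end{aligned}$$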
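The factorization in the inverted-tree example can be verified numerically. The sketch below fills the seven CPTs with random numbers (an assumption, since the notes give no concrete tables), evaluates $\tilde{P}$ by brute-force summation, and compares the two sides. The names `p_tilde`, `t` and `random_cpt` are hypothetical.

```python
import random
from itertools import product

random.seed(0)
# Inverted tree from the example: X1 has parents X2, X3, and so on.
parents = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}

def random_cpt(k):
    """Random CPT for a binary variable with k binary parents.
    Keys are (value, parent values...)."""
    table = {}
    for pv in product([0, 1], repeat=k):
        p1 = random.random()
        table[(1,) + pv] = p1
        table[(0,) + pv] = 1.0 - p1
    return table

T = {i: random_cpt(len(ps)) for i, ps in parents.items()}

def t(i, x):
    """T_i evaluated on the assignment x (dict: variable index -> value)."""
    return T[i][(x[i],) + tuple(x[p] for p in parents[i])]

def p_tilde(rho, tables):
    """Sum, over assignments to the variables occurring in `tables` that
    agree with rho, of the product of the listed tables."""
    scope = sorted({v for i in tables for v in [i] + parents[i]})
    free = [v for v in scope if v not in rho]
    total = 0.0
    for vals in product([0, 1], repeat=len(free)):
        x = {**{v: rho[v] for v in scope if v in rho},
             **dict(zip(free, vals))}
        prod = 1.0
        for i in tables:
            prod *= t(i, x)
        total += prod
    return total

rho = {1: 1, 2: 0}
lhs = p_tilde(rho, [1, 2, 3, 4, 5, 6, 7])
rhs = p_tilde({2: 0}, [2, 4, 5]) * p_tilde(rho, [1, 3, 6, 7])
```

The left side sums over 32 assignments to $x_3, \ldots, x_7$, while the right side needs only 4 + 8 terms; this saving is what the factorization buys.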
To compute probabilities of the form $P(\rho)$ we can compute the quantities $\tilde{P}(\rho, \mathcal{T})$ where these quantities often factor. Recursive conditioning is defined by the following equation where $Y \notin \mathrm{dom}(\rho)$.
$$\tilde{P}(\rho, \mathcal{T}) = \sum_{y \in D(Y)} \tilde{P}(\rho_1[Y := y], \mathcal{T}_1) \cdots \tilde{P}(\rho_k[Y := y], \mathcal{T}_k)$$
Here the sets $\mathcal{T}_1, \ldots, \mathcal{T}_k$ partition $\mathcal{T}$. For $i \neq j$ we must have that no variable $Z \notin \mathrm{dom}(\rho) \cup \{Y\}$ appears in both $\mathcal{T}_i$ and $\mathcal{T}_j$. Also, we require that $\rho_i[Y := y]$ is the restriction of $\rho[Y := y]$ to the variables occurring in $\mathcal{T}_i$. See the above example. To make this algorithm efficient the computations of values for expressions of the form $\tilde{P}(\rho, \mathcal{T})$ must be memoized, i.e., stored in a table so that values can be reused if they are needed again. The choice of the variable $Y \notin \mathrm{dom}(\rho)$ is important for the efficiency of the algorithm.
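The following is a minimal sketch of recursive conditioning with memoization on the inverted-tree example, not a tuned implementation. The CPT values are random, and two choices are assumptions of this sketch rather than part of the notes: the partition is taken to be the groups of tables connected through unassigned variables, and $Y$ is the unassigned variable occurring in the most tables. All names are hypothetical.

```python
import random
from itertools import product

random.seed(1)
parents = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}
D = {v: (0, 1) for v in parents}                    # D(X) for each variable
scope = {i: (i, *ps) for i, ps in parents.items()}  # variables of table T_i

def random_cpt(k):
    table = {}
    for pv in product((0, 1), repeat=k):
        p1 = random.random()
        table[(1, *pv)], table[(0, *pv)] = p1, 1.0 - p1
    return table

T = {i: random_cpt(len(ps)) for i, ps in parents.items()}
memo = {}

def variables(tables):
    return {v for i in tables for v in scope[i]}

def components(tables, rho):
    """Partition `tables` so no unassigned variable is shared across parts."""
    parts = []
    for i in tables:
        free_i, merged, rest = {v for v in scope[i] if v not in rho}, {i}, []
        for part, free in parts:
            if free & free_i:
                merged, free_i = merged | part, free_i | free
            else:
                rest.append((part, free))
        parts = rest + [(merged, free_i)]
    return [frozenset(part) for part, _ in parts]

def rc(rho, tables):
    """P~(rho, tables) by recursive conditioning with memoization."""
    tables = frozenset(tables)
    vs = variables(tables)
    rho = {v: x for v, x in rho.items() if v in vs}  # restrict rho
    key = (frozenset(rho.items()), tables)
    if key in memo:
        return memo[key]
    free = vs - set(rho)
    if not free:
        result = 1.0
        for i in tables:
            result *= T[i][tuple(rho[v] for v in scope[i])]
    else:
        # Heuristic choice of Y (an assumption of this sketch).
        Y = max(sorted(free), key=lambda v: sum(v in scope[i] for i in tables))
        result = 0.0
        for y in D[Y]:
            ext = {**rho, Y: y}
            term = 1.0
            for part in components(tables, ext):
                term *= rc(ext, part)
            result += term
    memo[key] = result
    return result

def brute(rho):
    """Equation (3) by direct enumeration, for comparison."""
    free = [v for v in parents if v not in rho]
    total = 0.0
    for vals in product((0, 1), repeat=len(free)):
        x = {**rho, **dict(zip(free, vals))}
        p = 1.0
        for i in parents:
            p *= T[i][tuple(x[v] for v in scope[i])]
        total += p
    return total

p_rc = rc({1: 1, 2: 0}, parents.keys())
```

The `memo` dictionary is keyed on the restricted assignment together with the set of tables, so a subproblem reached through different values of already-summed variables is computed only once.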
4 Markov Random Fields
A Markov random field is a pair $\langle V, G \rangle$ where $V$ is a set of random variables $X_1, \ldots, X_n$ and $G$ is a set of functions such that for any total assignment $\sigma$ to the variables in $V$, and for $\Gamma \in G$, we have that $\Gamma(\sigma)$ is a non-negative real number. (Other authors require $\Gamma(\sigma) > 0$ but we will not require that here.)
A Markov random field determines a probability distribution on assignments defined by the following equations.
$$P(\sigma) = \frac{1}{Z} \prod_{\Gamma \in G} \Gamma(\sigma) \qquad (4)$$
$$Z = \sum_{\sigma} \prod_{\Gamma \in G} \Gamma(\sigma) \qquad (5)$$
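Equations (4) and (5) can be illustrated on a tiny example. The chain of three binary variables and the two pairwise functions below, which give weight 2 to equal neighbours and weight 1 otherwise, are made up for this sketch, as are the names `agree`, `weight` and `prob`.

```python
from itertools import product

variables = ["X1", "X2", "X3"]

def agree(a, b):
    """A function Gamma that depends only on variables a and b."""
    return lambda sigma: 2.0 if sigma[a] == sigma[b] else 1.0

G = [agree("X1", "X2"), agree("X2", "X3")]

def weight(sigma):
    """Unnormalized product of all functions in G on a total assignment."""
    w = 1.0
    for gamma in G:
        w *= gamma(sigma)
    return w

assignments = [dict(zip(variables, vals))
               for vals in product([0, 1], repeat=len(variables))]

Z = sum(weight(s) for s in assignments)      # equation (5)

def prob(sigma):
    return weight(sigma) / Z                 # equation (4)
```

Here two assignments have weight 4, four have weight 2 and two have weight 1, so `Z` is 18 and the all-zero assignment has probability 4/18.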
It is interesting to compare equation (4) to equation (2). One can see that a Bayesian network is a special case of a Markov random field in which $G$ is a set of conditional probability tables and $Z = 1$. In general for a Markov random field, and for $\Gamma \in G$, one can identify a set of variables on which $\Gamma$ depends. More formally, $\Gamma$ depends on a variable $X$ if there exists an assignment $\sigma$ and a value $x \in D(X)$ such that $\Gamma(\sigma) \neq \Gamma(\sigma[X := x])$. We are typically interested in cases where for each $\Gamma \in G$ we have that $\Gamma$ depends on only a small number of variables, perhaps two or three.
In many applications, such as depth maps for vision, Markov random fields are more natural than Bayesian networks. We are now interested in inference for Markov random fields where we have the following.
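$$\begin{aligned}
P(x_2, x_5 \mid x_6, x_7) &= \frac{P(x_2, x_5, x_6, x_7)}{P(x_6, x_7)} \\
&= \frac{\sum_{\sigma \sqsubseteq \langle x_2, x_5, x_6, x_7 \rangle} \prod_{\Gamma \in G} \Gamma(\sigma)}{\sum_{\sigma \sqsubseteq \langle x_6, x_7 \rangle} \prod_{\Gamma \in G} \Gamma(\sigma)} \\
&= \frac{Z(\langle x_2, x_5, x_6, x_7 \rangle)}{Z(\langle x_6, x_7 \rangle)}
\end{aligned}$$
$$Z(\rho) = \sum_{\sigma \sqsubseteq \rho} \prod_{\Gamma \in G} \Gamma(\sigma)$$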
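A brute-force sketch of this ratio, on a made-up chain of three binary variables with two pairwise functions, shows that the global normalizer never has to be computed separately because it cancels. The field and the names `Z` and `conditional` are assumptions for illustration, and the enumeration is exponential in the number of unassigned variables.

```python
from itertools import product

variables = ["X1", "X2", "X3"]
# Hypothetical pairwise functions favouring equal neighbours.
G = [
    lambda s: 2.0 if s["X1"] == s["X2"] else 1.0,
    lambda s: 2.0 if s["X2"] == s["X3"] else 1.0,
]

def Z(rho):
    """Z(rho): sum over total assignments compatible with rho of the
    product of all functions in G."""
    free = [v for v in variables if v not in rho]
    total = 0.0
    for vals in product([0, 1], repeat=len(free)):
        sigma = {**rho, **dict(zip(free, vals))}
        w = 1.0
        for gamma in G:
            w *= gamma(sigma)
        total += w
    return total

def conditional(query, evidence):
    """P(query | evidence) = Z(query and evidence) / Z(evidence)."""
    return Z({**query, **evidence}) / Z(evidence)

p = conditional({"X1": 1}, {"X3": 1})
```

With these numbers `Z({"X1": 1, "X3": 1})` is 1 + 4 = 5 and `Z({"X3": 1})` is 9, so the conditional probability is 5/9.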
So for inference it now suffices to compute $Z(\rho)$ as defined above. Since inference for Markov random fields generalizes inference for Bayesian networks, the inference problem for Markov random fields is also #P hard. But we can again use recursive conditioning which will work well when certain structure exists. We define $Z(\rho, G)$ as follows.
$$Z(\rho, G) = \sum_{\sigma \sqsubseteq \rho} \prod_{\Gamma \in G} \Gamma(\sigma)$$
where $\sigma$ only assigns to variables in $G$.
Recursive conditioning for Markov random fields is defined by the following equation where $Y \notin \mathrm{dom}(\rho)$.
$$Z(\rho, G) = \sum_{y \in D(Y)} Z(\rho_1[Y := y], G_1) \cdots Z(\rho_k[Y := y], G_k)$$
As with Bayesian networks, for $i \neq j$ we must have that no variable $W \notin \mathrm{dom}(\rho) \cup \{Y\}$ appears in both $G_i$ and $G_j$. Also, we require that $\rho_i[Y := y]$ is the restriction of $\rho[Y := y]$ to the variables occurring in $G_i$. This algorithm is identical to that used for Bayesian networks and the same comments about memoization and the choice of $Y \notin \mathrm{dom}(\rho)$ apply.