Bayesian Networks and Markov Random Fields

1 Bayesian Networks

As before, we will use capital letters for random variables and lower case letters for values of those variables. A Bayesian network is a triple ⟨V, G, P⟩ where V is a set of random variables X_1, ..., X_n, G is a directed acyclic graph (DAG) whose nodes are the variables in V, and P is a set of conditional probability tables as described below. The conditional probability tables determine a probability distribution over the values of the variables. If there is a directed edge in G from X_j to X_i then we will say that X_j is a parent of X_i and X_i is a child of X_j. To determine values for the variables one first selects values for variables that have no parents and then repeatedly picks a value for any node all of whose parents already have values. When we pick a value for a variable we look only at the values of the parents of that variable. We will write P(x_i | parents of x_i) to abbreviate P(x_i | x_{i_1}, ..., x_{i_k}) where x_{i_1}, ..., x_{i_k} are the parents of x_i. For example, if x_7 has parents x_2 and x_4 then P(x_7 | parents of x_7) abbreviates P(x_7 | x_2, x_4). Formally, the probability distribution on the variables of a Bayesian network is determined by the following equation.

P(x_1, ..., x_n) = ∏_{i=1}^n P(x_i | parents of x_i)    (1)

The conditional probabilities of the form P(x_i | parents of x_i) are called conditional probability tables (CPTs). Suppose that each of the variables x_i has d possible values (this is not required in general). In this case if a variable x has k parents then P(x | parents of x) has d^{k+1} values (with (d-1)d^k degrees of freedom). These d^{k+1} values can be stored in a table with k+1 indices, hence the term "table". Note that the number of indices of the CPTs (conditional probability tables) is different for the different variables.
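Equation (1) can be evaluated directly from the CPTs. The following sketch uses a small hypothetical three-variable network (binary variables A, B, and C, where C has parents A and B; the numbers are made up for illustration):

```python
# A minimal sketch of equation (1): the joint probability is the product of
# P(x_i | parents of x_i) over all variables.  Network and numbers are
# hypothetical examples, not taken from the text.
import itertools

prior = {"A": 0.3, "B": 0.6}            # P(A=1), P(B=1) for parentless nodes
cpt_C = {(0, 0): 0.1, (0, 1): 0.5,      # P(C=1 | A=a, B=b)
         (1, 0): 0.7, (1, 1): 0.9}

def bernoulli(p, value):
    """P(X = value) for a Bernoulli variable with P(X=1) = p."""
    return p if value == 1 else 1.0 - p

def joint(a, b, c):
    """Equation (1): product of P(x_i | parents of x_i)."""
    return (bernoulli(prior["A"], a) *
            bernoulli(prior["B"], b) *
            bernoulli(cpt_C[(a, b)], c))

# Sanity check: the joint distribution sums to 1 over all 2^3 assignments.
total = sum(joint(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```

Note that the table for C has d^{k+1} = 2^3 = 8 values, stored here as the 4 entries of `cpt_C` plus their complements, matching the (d-1)d^k = 4 degrees of freedom.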

Note that an HMM is a Bayesian network with a variable for each hidden

state and each observable token.

Bayesian networks are often used in medical diagnosis, where variables represent the presence or absence of certain diseases or the presence or absence of certain measurable quantities such as blood sugar or the presence of a certain protein in the blood.

In a Bayesian network the edges of the directed graph are often interpreted as "causation", with the parent causally influencing the child and parents getting assigned values temporally before children.

We are interested in "Bayesian inference", which means, intuitively, inferring causes by observing their effects using Bayes' rule. In an HMM, for example, we want to infer the hidden states from the observations that they cause.

In general we can formulate the inference problem as the problem of determining the probability of an unknown variable (a hidden cause) from observed values of other variables. In general we can consider the variables in any order.

P(x_5, x_7 | x_3, x_2) = P(x_5, x_7, x_2, x_3) / P(x_2, x_3)

So in general, for inference it suffices to be able to compute probabilities of the form P(x_{i_1}, ..., x_{i_k}). We give an algorithm for doing this in Section 3.
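Before the efficient algorithm, it is worth seeing the brute-force version: P(ρ) is the sum of the joint over all total assignments compatible with ρ, and a conditional probability is a ratio of two such marginals. This sketch reuses a hypothetical three-variable network (the structure and numbers are assumptions for illustration):

```python
# Brute-force marginals: P(rho) = sum of P(sigma) over total assignments
# sigma compatible with the partial assignment rho.  Network is hypothetical.
import itertools

prior = {"A": 0.3, "B": 0.6}
cpt_C = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9}

def joint(sigma):
    """P(sigma) for a total assignment sigma = {'A': a, 'B': b, 'C': c}."""
    p_a = prior["A"] if sigma["A"] else 1 - prior["A"]
    p_b = prior["B"] if sigma["B"] else 1 - prior["B"]
    p_c1 = cpt_C[(sigma["A"], sigma["B"])]
    p_c = p_c1 if sigma["C"] else 1 - p_c1
    return p_a * p_b * p_c

def marginal(rho):
    """P(rho): sum the joint over every total assignment compatible with rho."""
    total = 0.0
    for a, b, c in itertools.product([0, 1], repeat=3):
        sigma = {"A": a, "B": b, "C": c}
        if all(sigma[v] == x for v, x in rho.items()):
            total += joint(sigma)
    return total

# Inference as a ratio of marginals: P(A=1 | C=1) = P(A=1, C=1) / P(C=1)
posterior = marginal({"A": 1, "C": 1}) / marginal({"C": 1})
print(round(posterior, 4))  # 0.5083
```

This enumeration is exponential in the number of unassigned variables, which is exactly what Sections 2 and 3 address.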

2 Inference in Bayesian Networks is #P Hard

A Boolean variable (also called a Bernoulli variable) is a variable that has only the two possible values 0 and 1. A disjunctive clause is a disjunction of literals where each literal is either a Boolean variable or the negation of a Boolean variable. For example, (X_5 ∨ ¬X_{23} ∨ X_{37}) is a clause with three literals. A 3SAT problem is a set of clauses with three literals in each clause. It is hard (in fact #P hard) to take a 3SAT problem and determine the number of complete assignments of values to variables that satisfy the clauses, i.e., that make every clause true.

Take X_1, ..., X_n to be independent Boolean (Bernoulli) variables with P(X_i = 1) = 1/2. Let Σ be a set of clauses over these variables where each clause has only three literals. Let C_j be a random variable which is 1 if the jth clause is satisfied. We can compute c_j from x_1, ..., x_n using a (deterministic) conditional probability table having only three parents. Let A_j be a Boolean variable that is true if all of C_1, ..., C_j are true. a_1 can be computed with a CPT from c_1, and for j > 1 we have that a_j can be computed using a CPT from a_{j-1} and c_j.

Now we have that P(A_k = 1) is proportional to the number of truth assignments that satisfy all the clauses. This implies that computing the probability of a partial assignment in a Bayesian network is #P hard. It is widely believed that there are no polynomial time algorithms for #P hard problems.
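The reduction can be checked mechanically on a tiny instance. This sketch uses a hypothetical three-clause formula over four variables (chosen for illustration); it evaluates the deterministic chain C_j, A_j on every assignment and confirms that P(A_k = 1) · 2^n equals the number of satisfying assignments:

```python
# Sketch of the 3SAT-to-Bayesian-network reduction.  Each X_i is an
# independent fair coin, C_j is a deterministic function of its clause's
# three literals, and A_j = A_{j-1} AND C_j.  The formula is a hypothetical
# example.
import itertools

n = 4
# Clauses as triples of literals: positive i means X_i, negative means NOT X_i.
clauses = [(1, -2, 3), (-1, 2, 4), (2, -3, -4)]

def clause_value(clause, x):
    """Deterministic CPT for C_j: 1 iff some literal of the clause is true."""
    return int(any((x[abs(l)] == 1) == (l > 0) for l in clause))

count = 0          # direct count of satisfying assignments
prob_a_k = 0.0     # P(A_k = 1) under the Bayesian network
for bits in itertools.product([0, 1], repeat=n):
    x = dict(zip(range(1, n + 1), bits))
    a = 1
    for clause in clauses:
        a = a and clause_value(clause, x)   # A_j = A_{j-1} AND C_j
    count += a
    prob_a_k += a * (0.5 ** n)              # each assignment has prob 2^{-n}

print(count, prob_a_k * 2 ** n)  # prints: 10 10.0
```

So a polynomial-time algorithm for P(A_k = 1) would count satisfying assignments in polynomial time.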

3 Recursive Conditioning

Although inference in Bayesian networks is hard in general, there exist algorithms that work well when the Bayesian network has special structure.

• D(X) is the set of values that variable X can have.

• ρ ranges over partial assignments of values to variables; ρ assigns values to some of the variables in V.

• dom(ρ) is the set of variables that are assigned values by ρ. For X ∈ dom(ρ) we have that ρ(X) ∈ D(X).

• For X ∉ dom(ρ) and x ∈ D(X) we let ρ[X := x] be the extension of ρ that assigns X the value x.

• σ ranges over total assignments of values to the variables in V. For X ∈ V we have σ(X) ∈ D(X).

• σ ⊑ ρ means that the complete assignment σ is compatible with the partial assignment ρ. In other words, for X ∈ dom(ρ) we have σ(X) = ρ(X).

We can now write equation (1) as follows, where T_i is the conditional probability table for the ith variable and T_i(σ) is P(x_i | parents of x_i) where the variable values are determined by σ.

P(σ) = ∏_i T_i(σ)    (2)

For a partial assignment ρ we have the following.

P(ρ) = Σ_{σ ⊑ ρ} ∏_i T_i(σ)    (3)

Note that T_i(σ) depends on only some of the variables assigned by σ. For example, suppose we have seven variables X_1, ..., X_7 arranged in an "inverted tree" where variable X_1 has parents X_2 and X_3; variable X_2 has parents X_4 and X_5; and variable X_3 has parents X_6 and X_7. Now suppose that dom(ρ) = {X_1, X_2}, i.e., ρ assigns a value only to X_1 and X_2. Then we can write equation (3) as follows.

P(x_1, x_2) = Σ_{x_3, x_4, x_5, x_6, x_7} [T_2(x_2, x_4, x_5) T_4(x_4) T_5(x_5)] [T_1(x_1, x_2, x_3) T_3(x_3, x_6, x_7) T_6(x_6) T_7(x_7)]

            = [Σ_{x_4, x_5} T_2(x_2, x_4, x_5) T_4(x_4) T_5(x_5)] [Σ_{x_3, x_6, x_7} T_1(x_1, x_2, x_3) T_3(x_3, x_6, x_7) T_6(x_6) T_7(x_7)]

            = P̃(⟨x_2⟩, ⟨T_2, T_4, T_5⟩) P̃(⟨x_1, x_2⟩, ⟨T_1, T_3, T_6, T_7⟩)

where

P̃(ρ, 𝒯) = Σ_{σ ⊑ ρ} ∏_{T ∈ 𝒯} T(σ)    where σ only assigns values to variables in 𝒯

So we can summarize this as follows.

P(x_1, x_2) = P̃(⟨x_1, x_2⟩, ⟨T_1, T_2, T_3, T_4, T_5, T_6, T_7⟩)
            = P̃(⟨x_2⟩, ⟨T_2, T_4, T_5⟩) P̃(⟨x_1, x_2⟩, ⟨T_1, T_3, T_6, T_7⟩)

To compute probabilities of the form P(ρ) we can compute the quantities P̃(ρ, 𝒯), and these quantities often factor. Recursive conditioning is defined by the following equation, where Y ∉ dom(ρ).

P̃(ρ, 𝒯) = Σ_{y ∈ D(Y)} P̃(ρ_1[Y := y], 𝒯_1) ··· P̃(ρ_k[Y := y], 𝒯_k)

For i ≠ j we must have that no variable Z ∉ dom(ρ) ∪ {Y} appears in both 𝒯_i and 𝒯_j. Also, we require that ρ_i[Y := y] is the restriction of ρ[Y := y] to the variables occurring in 𝒯_i. See the above example. To make this algorithm efficient, the computations of values for expressions of the form P̃(ρ, 𝒯) must be memoized, i.e., stored in a table so that values can be reused if they are needed again. The choice of the variable Y ∉ dom(ρ) is important for the efficiency of the algorithm.
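The scheme above can be sketched in code. The representation here (a table as a pair of a variable tuple and a dict of values, binary domains, a simple smallest-name variable-choice rule) is my own assumption for illustration, not fixed by the text; the example network is the inverted tree from the previous section with made-up CPT numbers.

```python
# A sketch of memoized recursive conditioning.  p_tilde(rho, tables) sums the
# product of the tables over all assignments to their unassigned variables,
# conditioning on one variable at a time and splitting the tables into
# independent components.  Data structures are hypothetical.
import itertools
import math

def free_vars(tables, rho):
    return sorted({v for vs, _ in tables for v in vs if v not in rho})

def split(tables, rho):
    """Group tables that share an unassigned variable into components."""
    comps = []
    for vs, fn in tables:
        free = {v for v in vs if v not in rho}
        overlapping = [c for c in comps if c[0] & free]
        for c in overlapping:
            comps.remove(c)
        comps.append((free.union(*[c[0] for c in overlapping]),
                      [(vs, fn)] + [t for c in overlapping for t in c[1]]))
    return comps

memo = {}

def p_tilde(rho, tables):
    fv = free_vars(tables, rho)
    if not fv:  # all variables assigned: just multiply the table entries
        return math.prod(fn[tuple(rho[v] for v in vs)] for vs, fn in tables)
    key = (tuple(sorted((v, rho[v]) for vs, _ in tables
                        for v in vs if v in rho)),
           tuple(sorted(id(fn) for _, fn in tables)))
    if key not in memo:
        total = 0.0
        for y in (0, 1):                      # condition on Y := y
            rho2 = dict(rho)
            rho2[fv[0]] = y                   # variable choice affects efficiency
            term = 1.0
            for _, comp in split(tables, rho2):
                sub_rho = {v: rho2[v] for vs, _ in comp
                           for v in vs if v in rho2}
                term *= p_tilde(sub_rho, comp)
            total += term
        memo[key] = total
    return memo[key]

def cpt(vars_, p_one):
    """Table for P(child=1 | parents); child is the first variable."""
    fn = {}
    for vals in itertools.product((0, 1), repeat=len(vars_)):
        p = p_one[vals[1:]]
        fn[vals] = p if vals[0] == 1 else 1 - p
    return (vars_, fn)

tables = [
    cpt(("X1", "X2", "X3"), {(0, 0): 0.25, (0, 1): 0.45, (1, 0): 0.65, (1, 1): 0.85}),
    cpt(("X2", "X4", "X5"), {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.8}),
    cpt(("X3", "X6", "X7"), {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.7, (1, 1): 0.9}),
    cpt(("X4",), {(): 0.3}), cpt(("X5",), {(): 0.6}),
    cpt(("X6",), {(): 0.2}), cpt(("X7",), {(): 0.9}),
]

print(p_tilde({"X1": 1, "X2": 1}, tables))  # P(x1=1, x2=1)
```

With ρ = ⟨x_1, x_2⟩, conditioning on X_3 splits the tables into the two independent groups seen in the worked example, and the memo table prevents recomputation when the same subproblem recurs.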

4 Markov Random Fields

A Markov random field is a pair ⟨V, G⟩ where V is a set of random variables X_1, ..., X_n and G is a set of functions such that for any total assignment σ to the variables in V, and for Γ ∈ G, we have that Γ(σ) is a non-negative real number. (Other authors require Γ(σ) > 0, but we will not require that here.)

A Markov random field determines a probability distribution on assignments defined by the following equations.

P(σ) = (1/Z) ∏_{Γ ∈ G} Γ(σ)    (4)

Z = Σ_σ ∏_{Γ ∈ G} Γ(σ)    (5)

It is interesting to compare equation (4) to equation (2). One can see that a Bayesian network is a special case of a Markov random field in which G is a set of conditional probability tables and Z = 1. In general, for a Markov random field and for Γ ∈ G, one can identify a set of variables on which Γ depends. More formally, Γ depends on a variable X if there exists an assignment σ and a value x ∈ D(X) such that Γ(σ) ≠ Γ(σ[X := x]). We are typically interested in cases where each Γ ∈ G depends on only a small number of variables, perhaps two or three.
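Equations (4) and (5) can be evaluated by brute force on a small example. The potentials below (a hypothetical three-variable chain with two pairwise factors, numbers made up for illustration) each depend on only two variables:

```python
# A minimal MRF sketch: equation (5) computes Z by summing the factor
# product over all total assignments; equation (4) then normalizes.
# Chain structure and potential values are hypothetical.
import itertools

def gamma_12(s):  # pairwise factor favoring X1 == X2
    return 2.0 if s["X1"] == s["X2"] else 1.0

def gamma_23(s):  # pairwise factor favoring X2 != X3
    return 2.0 if s["X2"] != s["X3"] else 1.0

factors = [gamma_12, gamma_23]

def unnormalized(s):
    p = 1.0
    for g in factors:
        p *= g(s)
    return p

# Equation (5): the partition function Z.
Z = sum(unnormalized(dict(zip(("X1", "X2", "X3"), bits)))
        for bits in itertools.product((0, 1), repeat=3))

# Equation (4): P(sigma) = (1/Z) * product of factors.
sigma = {"X1": 0, "X2": 0, "X3": 1}
print(Z, round(unnormalized(sigma) / Z, 4))  # prints: 18.0 0.2222
```

Unlike CPTs, the factors are not locally normalized, which is why the global constant Z is needed.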

In many applications, such as depth maps for vision, Markov random fields are more natural than Bayesian networks. We are now interested in inference for Markov random fields, where we have the following.

P(x_2, x_5 | x_6, x_7) = P(x_2, x_5, x_6, x_7) / P(x_6, x_7)

                       = [Σ_{σ ⊑ ⟨x_2, x_5, x_6, x_7⟩} ∏_{Γ ∈ G} Γ(σ)] / [Σ_{σ ⊑ ⟨x_6, x_7⟩} ∏_{Γ ∈ G} Γ(σ)]

                       = Z(⟨x_2, x_5, x_6, x_7⟩) / Z(⟨x_6, x_7⟩)

Z(ρ) = Σ_{σ ⊑ ρ} ∏_{Γ ∈ G} Γ(σ)

So for inference it now suffices to compute Z(ρ) as defined above. Since inference for Markov random fields generalizes inference for Bayesian networks, the inference problem for Markov random fields is also #P hard. But we can again use recursive conditioning, which will work well when certain structure exists. We define Z(ρ, G) as follows.

Z(ρ, G) = Σ_{σ ⊑ ρ} ∏_{Γ ∈ G} Γ(σ)    where σ only assigns values to variables in G

Recursive conditioning for Markov random fields is defined by the following equation, where Y ∉ dom(ρ).

Z(ρ, G) = Σ_{y ∈ D(Y)} Z(ρ_1[Y := y], G_1) ··· Z(ρ_k[Y := y], G_k)

As with Bayesian networks, for i ≠ j we must have that no variable Z ∉ dom(ρ) ∪ {Y} appears in both G_i and G_j. Also, we require that ρ_i[Y := y] is the restriction of ρ[Y := y] to the variables occurring in G_i. This algorithm is identical to that used for Bayesian networks, and the same comments about memoization and the choice of Y ∉ dom(ρ) apply.
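The factorization condition behind this equation can be seen concretely on a tiny example. In the hypothetical chain MRF below (potentials assumed for illustration), once Y = X_2 is assigned the two factors share no unassigned variable, so Z(ρ, G) splits into a product of independent component sums:

```python
# One step of recursive conditioning for an MRF: after conditioning on X2,
# the factor over (X1, X2) and the factor over (X2, X3) form separate
# components, so the double sum factors into two single sums.
import itertools

def gamma_12(x1, x2):
    return 2.0 if x1 == x2 else 1.0

def gamma_23(x2, x3):
    return 2.0 if x2 != x3 else 1.0

def z_joint(x2):
    """Z(<x2>) by direct summation over x1 and x3."""
    return sum(gamma_12(x1, x2) * gamma_23(x2, x3)
               for x1, x3 in itertools.product((0, 1), repeat=2))

def z_factored(x2):
    """The same quantity as a product of independent component sums."""
    return (sum(gamma_12(x1, x2) for x1 in (0, 1)) *
            sum(gamma_23(x2, x3) for x3 in (0, 1)))

for x2 in (0, 1):
    print(x2, z_joint(x2), z_factored(x2))  # joint and factored sums agree

# Summing over the conditioned variable recovers the full partition function.
print(sum(z_joint(x2) for x2 in (0, 1)))
```

In a longer chain this factorization applies recursively at every conditioning step, which is the source of the algorithm's efficiency on tree-like structures.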
