CS B553 Homework 7: Parameter and Structure Learning in Chow-Liu Tree Networks

Nov 7, 2013


Due date: 4/10/2012

In this assignment you will be looking at a very simple type of Bayesian network model called a Chow-Liu tree.

Chow-Liu trees are useful because they provide a class of network structures that are easy to learn. (In fact, they predate Bayesian networks.)

In this model, the joint distribution over $X_1,\dots,X_n$ is decomposed into first-order conditionals as follows:

$$P(X_1,\dots,X_n) = \prod_{i \in R} P(X_i) \prod_{i \notin R} P(X_i \mid X_{pa_i})$$
where $R$ is the subset of root nodes, and for each non-root node $i$, $pa_i$ gives the index of $i$'s only parent. So we can encode the graph structure as $G = (pa_1,\dots,pa_n)$ with $pa_i \in \{0,1,\dots,n\}$, using the convention that any node with $pa_i = 0$ is a root. (Note that we're allowing the encoding to produce forests rather than constraining the network to be a tree.)
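A minimal sketch of how the factorization and the parent-vector encoding fit together, assuming binary variables stored as 0/1 tuples and 1-based node indices as in the text. The function `joint_prob` and the dictionaries `root_p` and `cond_p` are hypothetical layouts for the CPTs, not part of the assignment. Summing over all $2^n$ assignments checks that the product of root and conditional factors is a proper distribution.

```python
import itertools


def joint_prob(x, pa, root_p, cond_p):
    """P(X_1 = x_1, ..., X_n = x_n) for a forest encoded by the parent vector pa.

    x      : tuple of 0/1 values; x[i - 1] is the value of X_i
    pa     : list with pa[i - 1] in {0, 1, ..., n}; 0 marks X_i as a root
    root_p : dict i -> P(X_i = 1) for root nodes (hypothetical layout)
    cond_p : dict i -> (P(X_i = 1 | X_pa = 0), P(X_i = 1 | X_pa = 1)) for non-roots
    """
    prob = 1.0
    for i, parent in enumerate(pa, start=1):
        if parent == 0:
            p1 = root_p[i]                 # root factor P(X_i)
        else:
            p1 = cond_p[i][x[parent - 1]]  # factor P(X_i | X_{pa_i})
        prob *= p1 if x[i - 1] == 1 else 1.0 - p1
    return prob


# X_2 is the root with children X_1 and X_3, so G = (2, 0, 2).
pa = [2, 0, 2]
root_p = {2: 0.6}
cond_p = {1: (0.2, 0.9), 3: (0.5, 0.3)}
total = sum(joint_prob(x, pa, root_p, cond_p)
            for x in itertools.product([0, 1], repeat=len(pa)))
print(round(total, 10))  # 1.0
```

Because each node contributes exactly one normalized factor, the sum over assignments is 1 for any valid forest encoding, including ones with several roots.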

We'll assume all variables are binary. You may wish to use the notation $x_i^0$ to indicate the event that $X_i = 0$ and $x_i^1$ to indicate the event that $X_i = 1$.

1. List all of the parameters in the CPTs of a Chow-Liu tree network with structure $G = (pa_1,\dots,pa_n)$, with node $X_r$ being the root. Gather these into a single parameter vector $\theta$.
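One possible way to enumerate the CPT entries programmatically, assuming we list only the free parameters of each binary CPT row (the complementary entry of each row is one minus the listed one). The helper `cpt_parameters` and its string labels are illustrative only, not a required format.

```python
def cpt_parameters(pa):
    """List the free CPT parameters of a binary Chow-Liu forest.

    Each label names one probability; complements such as
    P(x_i^0) = 1 - P(x_i^1) are implied and not listed.
    """
    theta = []
    for i, parent in enumerate(pa, start=1):
        if parent == 0:
            theta.append(f"P(x{i}^1)")  # root: one Bernoulli parameter
        else:
            for b in (0, 1):            # child: one parameter per parent value
                theta.append(f"P(x{i}^1 | x{parent}^{b})")
    return theta


params = cpt_parameters([2, 0, 2])
print(params)
print(len(params))  # 5
```

Under this convention a tree with a single root has $1 + 2(n-1)$ free parameters.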


2. Given a dataset $D = (x[1],\dots,x[M])$ of fully observed samples, write down the expression $l(D; \theta, G)$ giving the log-likelihood of $D$ given $\theta$ under the structure model $G$.

Use the textbook's notation $M[\text{event}]$ to indicate the number of times ‘event’ occurs in the dataset $D$.
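To see how the log-likelihood splits into count-weighted terms, this sketch evaluates it twice: once by tallying the counts $M[x_i^a]$ and $M[x_i^a, x_{pa_i}^b]$, and once by summing $\log P(x[m])$ sample by sample. The two should agree. The data and parameter values are made up for illustration, the CPT layout follows the same hypothetical `root_p`/`cond_p` convention, and logarithms are natural.

```python
import math
from collections import Counter


def log_likelihood_counts(data, pa, root_p, cond_p):
    """l(D; theta, G) written in terms of the counts M[...]."""
    ll = 0.0
    for i, parent in enumerate(pa, start=1):
        if parent == 0:
            # M[x_i^a] for each value a that occurs in D
            for a, m in Counter(x[i - 1] for x in data).items():
                p1 = root_p[i]
                ll += m * math.log(p1 if a == 1 else 1.0 - p1)
        else:
            # M[x_i^a, x_{pa_i}^b] for each pair (a, b) that occurs in D
            pairs = Counter((x[i - 1], x[parent - 1]) for x in data)
            for (a, b), m in pairs.items():
                p1 = cond_p[i][b]
                ll += m * math.log(p1 if a == 1 else 1.0 - p1)
    return ll


def log_likelihood_direct(data, pa, root_p, cond_p):
    """The same quantity, summing log P(x[m]) over the samples."""
    ll = 0.0
    for x in data:
        for i, parent in enumerate(pa, start=1):
            p1 = root_p[i] if parent == 0 else cond_p[i][x[parent - 1]]
            ll += math.log(p1 if x[i - 1] == 1 else 1.0 - p1)
    return ll


data = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 1)]
pa = [2, 0, 2]
root_p = {2: 0.6}
cond_p = {1: (0.2, 0.9), 3: (0.5, 0.3)}
print(log_likelihood_counts(data, pa, root_p, cond_p))
print(log_likelihood_direct(data, pa, root_p, cond_p))
```

Events with zero count contribute nothing, which is why iterating over the observed counts is enough.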

3. Give a formula for the maximum-likelihood estimate $\theta^*$ given the data $D$. What is the log-likelihood $l(D; \theta^*, G)$?
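A sketch of the count-ratio estimates $\theta^*_{x_i^1} = M[x_i^1]/M$ for a root and $\theta^*_{x_i^1 \mid x_j^b} = M[x_i^1, x_j^b]/M[x_j^b]$ for a child of $X_j$, which are the standard MLEs for table CPTs under fully observed data. The fallback value 0.5 for a parent value that never occurs is an arbitrary assumption, and the data are illustrative. Since these estimates maximize the log-likelihood, moving any parameter away from them should lower it.

```python
import math


def mle(data, pa):
    """Count-ratio maximum-likelihood estimates for a binary forest.

    root_p[i]    = M[x_i^1] / M
    cond_p[i][b] = M[x_i^1, x_{pa_i}^b] / M[x_{pa_i}^b]
    """
    M = len(data)
    root_p, cond_p = {}, {}
    for i, parent in enumerate(pa, start=1):
        if parent == 0:
            root_p[i] = sum(x[i - 1] for x in data) / M
        else:
            probs = []
            for b in (0, 1):
                rows = [x for x in data if x[parent - 1] == b]
                # Assumption: use 0.5 when the parent value never occurs in D.
                probs.append(sum(x[i - 1] for x in rows) / len(rows) if rows else 0.5)
            cond_p[i] = tuple(probs)
    return root_p, cond_p


def log_likelihood(data, pa, root_p, cond_p):
    """l(D; theta, G) as a sum of per-sample, per-node log factors."""
    ll = 0.0
    for x in data:
        for i, parent in enumerate(pa, start=1):
            p1 = root_p[i] if parent == 0 else cond_p[i][x[parent - 1]]
            ll += math.log(p1 if x[i - 1] == 1 else 1.0 - p1)
    return ll


data = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 1), (1, 0, 0)]
pa = [2, 0, 2]
root_p, cond_p = mle(data, pa)
ll_star = log_likelihood(data, pa, root_p, cond_p)
print(root_p, cond_p, ll_star)
```

An estimate of exactly 0 or 1 never causes `math.log(0)` here, because it only arises when no sample in that row takes the opposite value.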

4. Consider the model $G_0 = (0,\dots,0)$, i.e., the entirely disconnected network. Now consider adding a new edge $j \to i$ by setting $pa_i = j$ to derive the model $G = (0,\dots,0,j,0,\dots,0)$ ($j$ is in the $i$'th index). What is the change in log-likelihood of the maximum-likelihood parameter estimates when changing from $G_0$ to $G$? That is, compute $l(D; \theta^*, G) - l(D; \theta_0^*, G_0)$, where $\theta^*$ is the MLE for model $G$ and $\theta_0^*$ is the MLE for model $G_0$.
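A numerical check for this question, assuming 1-based node indices and natural logs. The hypothetical helper `max_log_likelihood` plugs the count-ratio MLE back into the log-likelihood, and the difference between $G$ and $G_0$ is compared with $M \cdot \hat{I}(X_i; X_j)$, i.e. $M$ times the empirical mutual information between $X_i$ and $X_j$, which is the standard closed form of this difference. The data are made up.

```python
import math
from collections import Counter


def empirical_mi(data, i, j):
    """Empirical mutual information I(X_i; X_j) in nats (1-based indices)."""
    M = len(data)
    joint = Counter((x[i - 1], x[j - 1]) for x in data)
    ci = Counter(x[i - 1] for x in data)
    cj = Counter(x[j - 1] for x in data)
    # p(a, b) / (p(a) p(b)) = M[a, b] * M / (M[a] * M[b])
    return sum((m / M) * math.log(m * M / (ci[a] * cj[b]))
               for (a, b), m in joint.items())


def max_log_likelihood(data, pa):
    """l(D; theta*, G) with the count-ratio MLE plugged in."""
    M = len(data)
    ll = 0.0
    for i, parent in enumerate(pa, start=1):
        if parent == 0:
            for m in Counter(x[i - 1] for x in data).values():
                ll += m * math.log(m / M)
        else:
            par = Counter(x[parent - 1] for x in data)
            pair = Counter((x[i - 1], x[parent - 1]) for x in data)
            for (_, b), m in pair.items():
                ll += m * math.log(m / par[b])
    return ll


data = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0),
        (0, 0, 1), (1, 0, 0), (1, 1, 0)]
n, i, j = 3, 1, 2              # add the edge X_2 -> X_1
g0 = [0] * n                   # G_0 = (0, ..., 0)
g = list(g0)
g[i - 1] = j                   # G = (2, 0, 0)
delta = max_log_likelihood(data, g) - max_log_likelihood(data, g0)
print(delta, len(data) * empirical_mi(data, i, j))
```

Only the factor for $X_i$ differs between the two models, so every other node's term cancels in the difference.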

5. Now consider a general forest $G_0$ with $X_i$ a root node, and consider adding the edge $j \to i$ to obtain a graph $G$ that is also a forest. What expression do you get for $l(D; \theta^*, G) - l(D; \theta_0^*, G_0)$, with $\theta^*$ the MLE for $G$ and $\theta_0^*$ the MLE for $G_0$?
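The same check on a forest $G_0$ that already has edges, assuming the forest is stored as a parent vector. The hypothetical helper `creates_cycle` walks up from $X_j$ to confirm that $X_i$ is not one of its ancestors, which (together with $pa_i = 0$) keeps $G$ a forest. Because only the factor for $X_i$ changes, the difference is again compared with $M \cdot \hat{I}(X_i; X_j)$. Data and structure are illustrative.

```python
import math
from collections import Counter


def empirical_mi(data, i, j):
    """Empirical mutual information I(X_i; X_j) in nats (1-based indices)."""
    M = len(data)
    joint = Counter((x[i - 1], x[j - 1]) for x in data)
    ci = Counter(x[i - 1] for x in data)
    cj = Counter(x[j - 1] for x in data)
    return sum((m / M) * math.log(m * M / (ci[a] * cj[b]))
               for (a, b), m in joint.items())


def max_log_likelihood(data, pa):
    """l(D; theta*, G) with the count-ratio MLE plugged in."""
    M = len(data)
    ll = 0.0
    for i, parent in enumerate(pa, start=1):
        if parent == 0:
            for m in Counter(x[i - 1] for x in data).values():
                ll += m * math.log(m / M)
        else:
            par = Counter(x[parent - 1] for x in data)
            pair = Counter((x[i - 1], x[parent - 1]) for x in data)
            for (_, b), m in pair.items():
                ll += m * math.log(m / par[b])
    return ll


def creates_cycle(pa, j, i):
    """True if setting pa_i = j would close a cycle (X_i is X_j or an ancestor of X_j).

    Assumes pa already encodes a forest, so the upward walk terminates.
    """
    node = j
    while node != 0:
        if node == i:
            return True
        node = pa[node - 1]
    return False


data = [(1, 1, 0, 1), (0, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0),
        (0, 0, 1, 1), (1, 0, 0, 0), (1, 1, 0, 1), (0, 1, 1, 0)]
g0 = [0, 0, 2, 3]        # X_1, X_2 are roots; X_2 -> X_3 -> X_4
i, j = 1, 4              # candidate edge X_4 -> X_1
assert g0[i - 1] == 0 and not creates_cycle(g0, j, i)
g = list(g0)
g[i - 1] = j
delta = max_log_likelihood(data, g) - max_log_likelihood(data, g0)
print(delta, len(data) * empirical_mi(data, i, j))
```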

6. Consider an algorithm that starts with a completely disconnected network and greedily adds edges $j \to i$, ensuring that each new edge does not induce a cycle in the graph. Edges are added in order of largest to smallest improvement in likelihood. Demonstrate that it produces a tree rather than a forest. Give its running time in big-O notation. Furthermore, prove or argue that it produces the Chow-Liu tree with maximum likelihood over all trees (hint: consider how this relates to Prim's algorithm for minimum spanning trees).
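A sketch of the greedy procedure, assuming that the improvement from adding an edge between $X_i$ and $X_j$ is $M \cdot \hat{I}(X_i; X_j)$ and that any orientation of the same undirected tree reaches the same maximum likelihood. Scanning all candidate edges from best to worst and rejecting those that close a cycle is Kruskal's strategy, so this sketch swaps in a Kruskal-style union-find cycle test rather than following Prim's algorithm from the hint. The undirected result is then oriented away from the lowest-numbered node of each component (an arbitrary choice) to produce a parent vector. In this sketch, computing the $n(n-1)/2$ weights costs $O(n^2 M)$ and sorting them costs $O(n^2 \log n)$. The function names and data are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations


def empirical_mi(data, i, j):
    """Empirical mutual information I(X_i; X_j) in nats (1-based indices)."""
    M = len(data)
    joint = Counter((x[i - 1], x[j - 1]) for x in data)
    ci = Counter(x[i - 1] for x in data)
    cj = Counter(x[j - 1] for x in data)
    return sum((m / M) * math.log(m * M / (ci[a] * cj[b]))
               for (a, b), m in joint.items())


def chow_liu_greedy(data, n):
    """Greedy maximum-weight spanning tree over edge weights M * I(X_i; X_j).

    Returns a parent vector (pa_1, ..., pa_n) using the 0-means-root convention.
    """
    M = len(data)
    edges = sorted(((M * empirical_mi(data, i, j), i, j)
                    for i, j in combinations(range(1, n + 1), 2)), reverse=True)
    uf = list(range(n + 1))                # union-find parents for nodes 1..n

    def find(u):
        while uf[u] != u:
            uf[u] = uf[uf[u]]              # path halving
            u = uf[u]
        return u

    adj = {u: [] for u in range(1, n + 1)}
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # skip edges that would close a cycle
            uf[ri] = rj
            adj[i].append(j)
            adj[j].append(i)

    # Orient each undirected component away from its lowest-numbered node.
    pa = [0] * n
    seen = set()
    for root in range(1, n + 1):
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    pa[v - 1] = u
                    queue.append(v)
    return pa


data = [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 1, 0), (1, 1, 1, 1),
        (0, 0, 0, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 0, 0)]
print(chow_liu_greedy(data, 4))
```

Since every pair of nodes is a candidate edge and empirical mutual information is never negative, this version keeps adding edges until all $n$ nodes are connected.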