CS B553 Homework 7: Parameter and Structure Learning in Chow-Liu Tree Networks
Due date: 4/10/2012
In this assignment you will be looking at a very simple type of Bayesian network model called a Chow-Liu tree. Chow-Liu trees are useful because they provide a class of network structures that are easy to learn. (In fact, they predate Bayesian networks.)
In this model, the joint distribution over X_1,…,X_n is decomposed into first-order conditionals as follows:
$$P(X_1, \ldots, X_n) = \prod_{i \in R} P(X_i) \prod_{i \notin R} P(X_i \mid X_{pa_i})$$
where R is the subset of root nodes, and for non-root nodes pa_i gives the index of i’s only parent. So, we can encode the graph structure through the encoding G=(pa_1,…,pa_n) with pa_i ∈ {0,1,…,n}, where we take the convention that any node with pa_i=0 is a root. (Note that we’re allowing the encoding to produce forests rather than constraining the network to be a tree.)
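The encoding and the factorization above can be made concrete with a short sketch. This is only an illustration under stated assumptions: the function names `roots`, `is_forest`, and `joint_prob`, and the containers `root_p1` and `cond_p1` holding the probabilities of the value 1, are hypothetical and not part of the assignment. Variables are binary and indexed from 1, as in the text.

```python
from typing import List, Mapping, Sequence


def roots(pa: Sequence[int]) -> List[int]:
    """Return the 1-based indices i with pa_i = 0 (the root set R)."""
    return [i for i, p in enumerate(pa, start=1) if p == 0]


def is_forest(pa: Sequence[int]) -> bool:
    """True if following parent pointers from every node reaches a root."""
    n = len(pa)
    for start in range(1, n + 1):
        node, steps = start, 0
        while pa[node - 1] != 0:
            node = pa[node - 1]
            steps += 1
            if steps > n:  # walked more than n edges, so we are in a cycle
                return False
    return True


def joint_prob(
    x: Sequence[int],
    pa: Sequence[int],
    root_p1: Mapping[int, float],
    cond_p1: Mapping[int, Sequence[float]],
) -> float:
    """Evaluate prod_{i in R} P(x_i) * prod_{i not in R} P(x_i | x_{pa_i}).

    Assumed (hypothetical) parameter layout:
      root_p1[i]    = P(X_i = 1) for a root i
      cond_p1[i][v] = P(X_i = 1 | X_{pa_i} = v) for a non-root i
    """
    prob = 1.0
    for i, p in enumerate(pa, start=1):
        p1 = root_p1[i] if p == 0 else cond_p1[i][x[p - 1]]
        prob *= p1 if x[i - 1] == 1 else 1.0 - p1
    return prob


# Example: X_1 is the root, X_2 and X_3 are both children of X_1.
example_pa = (0, 1, 1)
example_root_p1 = {1: 0.5}
example_cond_p1 = {2: (0.25, 0.75), 3: (0.5, 0.5)}
print(roots(example_pa), is_forest(example_pa))
print(joint_prob((1, 1, 0), example_pa, example_root_p1, example_cond_p1))
```

An encoding such as (2, 1, 0) is rejected by `is_forest` because nodes 1 and 2 point at each other, which is why an edge-adding procedure has to check for cycles.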
We’ll assume all variables are binary. You may wish to use the notation x_i^0 to indicate the event that X_i=0 and x_i^1 to indicate the event that X_i=1.
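Events such as x_i^1, or a joint event like "x_i^1 and x_j^0", are what one tallies when working with fully observed binary data. The following is a minimal sketch of such tallies, assuming (hypothetically) that samples are stored as tuples of 0/1 values and that variables are indexed from 1. The names `single_counts` and `pair_counts` are illustrative only.

```python
from collections import Counter
from typing import Sequence, Tuple

Sample = Tuple[int, ...]


def single_counts(data: Sequence[Sample], i: int) -> Counter:
    """Map each value v in {0, 1} to the number of samples with X_i = v."""
    return Counter(x[i - 1] for x in data)


def pair_counts(data: Sequence[Sample], i: int, j: int) -> Counter:
    """Map each pair (a, b) to the number of samples with X_i = a and X_j = b."""
    return Counter((x[i - 1], x[j - 1]) for x in data)


# A toy dataset of four samples over (X_1, X_2, X_3); the values are made up.
toy_data = [(0, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1)]
print(single_counts(toy_data, 1))   # counts of x_1^0 and x_1^1
print(pair_counts(toy_data, 1, 2))  # counts of joint events over (X_1, X_2)
```

A `Counter` returns 0 for an event that never occurs, so unseen value combinations need no special handling when reading the tallies.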
1. List all of the parameters in the CPTs of a Chow-Liu tree network with structure G=(pa_1,…,pa_n), with node X_r being the root. Gather these into a single parameter vector θ.
2. Given a dataset D=(x[1],…,x[M]) of fully observed samples, write down the expression l(D; θ, G) giving the log-likelihood of D given θ under the structure model G. Use the textbook’s notation M[‘event’] to indicate the number of times ‘event’ occurs in the dataset D.
3. Give a formula for the maximum-likelihood estimate θ* given the data D. What is the log-likelihood l(D; θ*, G)?
4. Consider the model G_0=(0,…,0), i.e., the entirely disconnected network. Now consider adding a new edge j->i by setting pa_i=j to derive the model G=(0,…,0,j,0,…,0) (j is in the i’th index). What is the change in log-likelihood of the maximum-likelihood parameter estimates when changing from G_0 to G? That is, compute l(D; θ, G) - l(D; θ_0, G_0), where θ is the MLE for model G and θ_0 is the MLE for model G_0.
5. Now consider the graph G_0 with X_i a root node, and consider adding the edge j->i to achieve a graph G that is also a forest. What expression do you get for l(D; θ, G) - l(D; θ_0, G_0), with θ the MLE for G and θ_0 the MLE for G_0?
6. Consider an algorithm that starts with a completely disconnected network and greedily adds edges j->i, ensuring that each new edge does not induce a cycle in the graph. Edges are added in order of largest to smallest improvement in likelihood. Demonstrate that it produces a tree rather than a forest. Give its running time in big-O notation. Furthermore, prove or argue that it produces the Chow-Liu tree with maximum likelihood over all trees (hint: consider how this relates to Prim’s algorithm for minimum spanning trees).
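The greedy procedure described in question 6 can be sketched in code to make the bookkeeping explicit. This is a deliberately naive illustration, not a reference solution: the improvement of each candidate edge is supplied by a caller-provided function `gain(j, i, pa)`, which is a hypothetical placeholder for the quantity derived in the earlier questions, and no attempt is made at efficiency. The names `root_of` and `greedy_structure` are likewise assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def root_of(pa: Sequence[int], node: int) -> int:
    """Follow parent pointers from `node` until a root (pa = 0) is reached."""
    while pa[node - 1] != 0:
        node = pa[node - 1]
    return node


def greedy_structure(
    n: int, gain: Callable[[int, int, Sequence[int]], float]
) -> List[int]:
    """Start from G_0 = (0, ..., 0) and repeatedly add the best valid edge j->i.

    An edge j->i is treated as valid when i is currently a root (so it still
    has no parent) and j does not lie in the tree rooted at i (so no cycle
    is created). The loop stops when no valid edge remains.
    """
    pa = [0] * n
    while True:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(1, n + 1):
            if pa[i - 1] != 0:
                continue  # i already has a parent
            for j in range(1, n + 1):
                if j == i or root_of(pa, j) == i:
                    continue  # edge j->i would induce a cycle
                g = gain(j, i, pa)
                if best is None or g > best[0]:
                    best = (g, j, i)
        if best is None:
            return pa
        _, j, i = best
        pa[i - 1] = j


# Toy usage with made-up, symmetric gains over three variables.
toy_gain = {frozenset({1, 2}): 3.0, frozenset({2, 3}): 2.0, frozenset({1, 3}): 1.0}
print(greedy_structure(3, lambda j, i, pa: toy_gain[frozenset({j, i})]))
```

The returned list has the same form as the encoding G=(pa_1,…,pa_n), so it can be inspected directly for its roots and edges.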