Mean Field Inference in Dependency Networks: An Empirical Study
Daniel Lowd and Arash Shamaei
University of Oregon

Learning and Inference in Graphical Models
We want to learn a probability distribution from data and use it to answer queries.
Applications: medical diagnosis, fault diagnosis, web usage analysis, bioinformatics, collaborative filtering, etc.
[Figure: Data → (Learning) → Model (A, B, C) → (Inference) → Answers!]
One-Slide Summary
1. In dependency networks, mean field inference is faster than Gibbs sampling, with similar accuracy.
2. Dependency networks are competitive with Bayesian networks.
[Figure: Data → (Learning) → Model (A, B, C) → (Inference) → Answers!]
Outline
• Graphical models: Dependency networks vs. others
  – Representation
  – Learning
  – Inference
• Mean field inference in dependency networks
• Experiments
Dependency Networks
Represents a probability distribution over {X_1, …, X_n} as a set of conditional probability distributions, one per variable.
Example:
[Figure: dependency network over X_1, X_2, X_3, with cyclic dependencies]
[Heckerman et al., 2000]
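As a concrete illustration, a dependency network can be stored as one conditional distribution per variable. The sketch below is not the authors' implementation: the three binary variables, the tabular conditionals, and all probabilities are invented for illustration; it only mirrors the structure of the X_1, X_2, X_3 example above.

```python
# A dependency network over three binary variables, represented as one
# conditional P(X_i = 1 | parents) per variable. Note that X2 and X3
# depend on each other: cycles are allowed in a dependency network.
# All numbers here are illustrative, not from the slides.

dependency_network = {
    "X1": {"parents": ("X2",),         # P(X1=1 | X2)
           "cpd": {(0,): 0.2, (1,): 0.8}},
    "X2": {"parents": ("X1", "X3"),    # P(X2=1 | X1, X3)
           "cpd": {(0, 0): 0.5, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.9}},
    "X3": {"parents": ("X2",),         # P(X3=1 | X2)
           "cpd": {(0,): 0.4, (1,): 0.7}},
}

def conditional(net, var, assignment):
    """Look up P(var = 1 | assignment to the other variables)."""
    node = net[var]
    key = tuple(assignment[p] for p in node["parents"])
    return node["cpd"][key]

p = conditional(dependency_network, "X2", {"X1": 1, "X3": 0})  # → 0.6
```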
Comparison of Graphical Models

                           Bayesian  Markov   Dependency
                           Network   Network  Network
Allow cycles?              N         Y        Y
Easy to learn?             Y         N        Y
Consistent distribution?   Y         Y        N
Inference algorithms       …lots…    …lots…   Gibbs, MF (new!)
Learning Dependency Networks
For each variable X_i, learn a conditional distribution P(X_i | X_{-i}) — e.g., as a probability tree:
[Figure: probability tree splitting on B=? and C=?, with probability distributions such as <0.2, 0.8> at the leaves]
[Heckerman et al., 2000]
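Learning thus reduces to fitting one conditional per variable from data. Heckerman et al. use probability trees; the sketch below instead uses a smoothed full-table estimate for brevity, and the function name, data, and smoothing constant are all assumptions for illustration.

```python
from collections import Counter

def learn_conditional(data, i, parents, alpha=1.0):
    """Estimate P(X_i = 1 | parents) from binary samples,
    with Laplace smoothing (pseudocount alpha)."""
    ones = Counter()    # counts of X_i = 1 per parent configuration
    totals = Counter()  # total samples per parent configuration
    for sample in data:
        key = tuple(sample[j] for j in parents)
        totals[key] += 1
        ones[key] += sample[i]
    # Smoothed estimate for every observed parent configuration.
    return {key: (ones[key] + alpha) / (totals[key] + 2 * alpha)
            for key in totals}

data = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 1, 1)]
cpd = learn_conditional(data, i=0, parents=(1,))  # P(X_0 = 1 | X_1)
```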
Approximate Inference Methods
• Gibbs sampling: Slow but effective
• Mean field: Fast and usually accurate
• Belief propagation: Fast and usually accurate
[Figure: Model (A, B, C) → (Inference) → Answers!]
Gibbs Sampling
Resample each variable in turn, given its neighbors:
  x_i ~ P(X_i | x_{-i})
Use the set of samples to answer queries, e.g.:
  P(X_i = x_i) ≈ (1/N) Σ_j 1[x_i^(j) = x_i]
Converges to the true distribution, given enough samples (assuming a positive distribution).
Previously, the only method used to compute probabilities in DNs.
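Ordered Gibbs sampling in a dependency network can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes binary variables and conditionals supplied as functions, and the two-variable example at the bottom is invented.

```python
import random

def gibbs_marginals(conditionals, n_samples=5000, burn_in=500, seed=0):
    """Ordered Gibbs sampling in a dependency network.

    conditionals: dict mapping each variable name to a function that
    returns P(var = 1 | current assignment to the other variables).
    Returns estimated marginals P(var = 1) from the collected samples."""
    rng = random.Random(seed)
    state = {v: 0 for v in conditionals}
    counts = {v: 0 for v in conditionals}
    for t in range(burn_in + n_samples):
        for var, cond in conditionals.items():
            # Resample var from its conditional, given everything else.
            # (cond only reads the *other* variables from state.)
            state[var] = 1 if rng.random() < cond(state) else 0
        if t >= burn_in:
            for v, x in state.items():
                counts[v] += x
    return {v: c / n_samples for v, c in counts.items()}

# Two mutually dependent binary variables (a cycle, which DNs allow):
conds = {
    "A": lambda s: 0.9 if s["B"] == 1 else 0.2,   # P(A=1 | B)
    "B": lambda s: 0.9 if s["A"] == 1 else 0.2,   # P(B=1 | A)
}
marg = gibbs_marginals(conds)
```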
Mean Field
Approximate P with a simpler, fully factorized distribution Q:
  Q(x) = Π_i Q_i(x_i)
To find the best Q, optimize the reverse KL divergence:
  KL(Q || P) = Σ_x Q(x) log [Q(x) / P(x)]
Mean field updates converge to a local optimum.
Works for DNs! Never before tested!
Mean Field in Dependency Networks
1. Initialize each Q_i(X_i) to a uniform distribution.
2. Update each Q_i(X_i) in turn:
   Q_i(x_i) ∝ exp( E_Q[ log P(x_i | X_{-i}) ] )
3. Stop when the marginals Q_i(X_i) converge.
If the DN is consistent, this is guaranteed to converge.
If inconsistent, it always seems to converge in practice.
Empirical Questions
Q1. In DNs, how does MF compare to Gibbs sampling in speed and accuracy?
Q2. How do DNs compare to BNs in inference speed and accuracy?
Experiments
• Learned DNs and BNs on 12 datasets
• Generated queries from test data
  – Varied evidence variables from 10% to 90%
  – Scored using average CMLL per variable (conditional marginal log-likelihood):
    CMLL(x | e) = Σ_i log P(X_i = x_i | e), averaged over the non-evidence variables X_i
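The scoring metric can be sketched directly from its definition. The helper name is hypothetical; it assumes the inference method has already produced, for each non-evidence variable, an estimated marginal P(X_i = 1 | evidence), and that the true test values are binary.

```python
import math

def avg_cmll(true_values, marginals):
    """Average conditional marginal log-likelihood per variable:
    (1/n) * sum_i log P(X_i = x_i | evidence), over the n non-evidence
    variables, where marginals[i] is the estimated P(X_i = 1 | evidence)."""
    total = 0.0
    for x, p1 in zip(true_values, marginals):
        p = p1 if x == 1 else 1 - p1   # probability assigned to the true value
        total += math.log(p)
    return total / len(true_values)

# Three non-evidence variables with true values 1, 0, 1:
score = avg_cmll([1, 0, 1], [0.8, 0.3, 0.9])
```

Since it is a log-likelihood, the score is at most 0, and higher (closer to 0) is better; the plots below report negative CMLL, where lower is better.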
Results: Accuracy in DNs
[Bar chart: negative CMLL (0–0.6) per dataset for DN.MF vs. DN.Gibbs]
Results: Timing in DNs (log scale)
[Bar chart: inference time in seconds (0.01–100, log scale) per dataset for DN.MF vs. DN.Gibbs]
MF vs. Gibbs in DNs, run for equal time

Evidence   # of MF wins   % wins
10%        9              75%
20%        10             83%
30%        10             83%
40%        9              75%
50%        10             83%
60%        10             83%
70%        11             92%
80%        11             92%
90%        12             100%
Average    10.2           85%

In DNs, MF is usually more accurate, given equal time.
Results: Accuracy
[Bar chart: negative CMLL (0–0.7) per dataset for DN.MF, DN.Gibbs, BN.MF, BN.Gibbs, and BN.BP]
Gibbs: DN vs. BN

Evidence   DN wins   Percent wins
10%        3         25%
20%        1         8%
30%        1         8%
40%        3         25%
50%        5         42%
60%        7         58%
70%        10        83%
80%        10        83%
90%        11        92%
Average    5.7       47%

With more evidence, DNs are more accurate.
Experimental Results
Q1. In DNs, how does MF compare to Gibbs sampling in speed and accuracy?
A1. MF is consistently faster with similar accuracy, or more accurate with similar speed.
Q2. How do DNs compare to BNs in inference speed and accuracy?
A2. DNs are competitive with BNs – better with more evidence, worse with less evidence.
Conclusion
• MF inference in DNs is fast and accurate, especially with more evidence.
• Future work:
  – Relational dependency networks (Neville & Jensen, 2007)
  – More powerful approximations
Source code available: http://libra.cs.uoregon.edu/
Results: Timing (log scale)
[Bar chart: inference time in seconds (0.01–1000, log scale) per dataset for DN.MF, DN.Gibbs, BN.MF, BN.Gibbs, and BN.BP]
Learned Models
1. Learning time is comparable.
2. DNs usually have higher pseudo-likelihood (PLL).
3. DNs sometimes have higher log-likelihood (LL).