Mean Field Inference in Dependency Networks: An Empirical Study

Daniel Lowd and Arash Shamaei

University of Oregon

Learning and Inference in Graphical Models

We want to learn a probability distribution from data and
use it to answer queries.

Applications: medical diagnosis, fault diagnosis, web usage analysis,
bioinformatics, collaborative filtering, etc.

[Figure: Data → Learning → Model (graph over A, B, C) → Inference]

One-Slide Summary

1. In dependency networks, mean field inference is faster than Gibbs
   sampling, with similar accuracy.

2. Dependency networks are competitive with Bayesian networks.


Outline

Graphical models: dependency networks vs. others
  - Representation
  - Learning
  - Inference

Mean field inference in dependency networks

Experiments

Dependency Networks

Represents a probability distribution over {X_1, …, X_n} as a set of
conditional probability distributions, one P(X_i | X_{-i}) per variable.

Example: [Figure: a cyclic graph over X_1, X_2, X_3]

[Heckerman et al., 2000]
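As a minimal illustration (not from the paper), such a set of conditionals for three binary variables can be written directly as a table of functions; the toy network and its probabilities below are invented:

```python
# Toy dependency network over binary variables: one conditional distribution
# P(X_i = 1 | X_{-i}) per variable. Cycles are allowed (X1 and X2 depend on
# each other), which a Bayesian network would forbid.
dependency_network = {
    "X1": lambda x: 0.9 if x["X2"] else 0.2,             # P(X1=1 | X2)
    "X2": lambda x: 0.7 if x["X1"] or x["X3"] else 0.1,  # P(X2=1 | X1, X3)
    "X3": lambda x: 0.6 if x["X2"] else 0.3,             # P(X3=1 | X2)
}

# Each entry answers: how likely is this variable to be 1, given the others?
state = {"X1": 1, "X2": 0, "X3": 1}
print(dependency_network["X2"](state))  # P(X2=1 | X1=1, X3=1) = 0.7
```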

Comparison of Graphical Models

                          Bayesian   Markov    Dependency
                          Network    Network   Network
Allow cycles?             N          Y         Y
Easy to learn?            Y          N         Y
Consistent distribution?  Y          Y         N
Inference algorithms      …lots…     …lots…    Gibbs, MF (new!)

Learning Dependency Networks

For each variable X_i, learn a conditional distribution P(X_i | X_{-i}),
e.g., as a probability tree.

[Figure: probability trees splitting on B and C, with leaf distributions
such as <0.2, 0.8> and <0.7, 0.3>]

[Heckerman et al., 2000]
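A minimal learning sketch, assuming complete binary data in a NumPy array with one column per variable. Heckerman et al. (2000) learn probability trees; here an off-the-shelf scikit-learn decision tree stands in for that choice, and the function name is hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_dependency_network(data, max_depth=3):
    """Fit one conditional model P(X_i | X_{-i}) per variable."""
    models = {}
    for i in range(data.shape[1]):
        features = np.delete(data, i, axis=1)   # all variables except X_i
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(features, data[:, i])          # target: X_i itself
        models[i] = tree                        # tree.predict_proba(...) gives P(X_i | X_{-i})
    return models

# Usage with random binary data:
rng = np.random.default_rng(0)
models = learn_dependency_network(rng.integers(0, 2, size=(500, 3)))
```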

Approximate Inference Methods

Gibbs sampling: Slow but effective

Mean field: Fast and usually accurate

Belief propagation: Fast and usually accurate


Gibbs Sampling

Resample each variable in turn, given its neighbors:

  x_i ~ P(X_i | x_{-i})

Use the set of samples to answer queries, e.g.,

  P(X_i = 1 | e) ≈ (number of samples with X_i = 1) / (number of samples)

Converges to the true distribution, given enough samples
(assuming a positive distribution).

Previously, this was the only method used to compute probabilities in DNs.
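A minimal Gibbs sketch for the toy `dependency_network` dict above (my sketch, not the authors' Libra implementation); evidence clamping is omitted for brevity:

```python
import random

def gibbs_marginals(dn, n_samples=10000, burn_in=1000):
    """Estimate P(X_i = 1) for each variable by Gibbs sampling."""
    state = {v: random.randint(0, 1) for v in dn}    # arbitrary starting state
    counts = dict.fromkeys(dn, 0)
    for t in range(burn_in + n_samples):
        for v in dn:                                 # resample each variable in turn
            state[v] = 1 if random.random() < dn[v](state) else 0
        if t >= burn_in:                             # discard burn-in samples
            for v in dn:
                counts[v] += state[v]
    return {v: counts[v] / n_samples for v in dn}

print(gibbs_marginals(dependency_network))           # approximate marginals
```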

Mean Field

Approximate P with a simpler, fully factorized distribution Q:

  Q(X_1, …, X_n) = ∏_i Q(X_i)

To find the best Q, optimize the reverse K-L divergence:

  KL(Q‖P) = Σ_x Q(x) log [ Q(x) / P(x) ]

Mean field updates converge to a local optimum:

  Q(x_i) ∝ exp( E_Q[ log P(x_i | X_{-i}) ] )

Works for DNs! Never before tested!

Mean Field in Dependency Networks

1. Initialize each Q(X_i) to a uniform distribution.

2. Update each Q(X_i) in turn:

   Q(x_i) ∝ exp( E_Q[ log P(x_i | X_{-i}) ] )

3. Stop when the marginals Q(X_i) converge.

If the DN is consistent, this is guaranteed to converge.
If inconsistent, it always seems to converge in practice.
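Below is a minimal sketch of this loop for the same toy network, assuming the standard mean field update shown above and brute-force enumeration of the other variables' values (feasible only for tiny models; the paper's actual implementation is in the Libra toolkit):

```python
import itertools
import math

def mean_field_marginals(dn, n_iters=50):
    """Fixed-point iteration: Q(x_i) ∝ exp(E_Q[log P(x_i | X_{-i})])."""
    names = list(dn)
    q = dict.fromkeys(names, 0.5)                    # 1. uniform initialization
    for _ in range(n_iters):                         # 3. fixed iteration count stands
        for v in names:                              #    in for a convergence test
            others = [u for u in names if u != v]
            exp_log = [0.0, 0.0]                     # E_Q[log P(X_v = 0/1 | X_{-v})]
            for vals in itertools.product([0, 1], repeat=len(others)):
                state = dict(zip(others, vals))
                weight = math.prod(q[u] if x else 1 - q[u]
                                   for u, x in state.items())  # Q(x_{-v})
                p1 = dn[v](state)                    # P(X_v = 1 | x_{-v})
                exp_log[1] += weight * math.log(p1)
                exp_log[0] += weight * math.log(1 - p1)
            z1, z0 = math.exp(exp_log[1]), math.exp(exp_log[0])
            q[v] = z1 / (z1 + z0)                    # 2. normalized update
    return q

print(mean_field_marginals(dependency_network))
```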

Empirical Questions

Q1. In DNs, how does MF compare to Gibbs sampling in speed and accuracy?

Q2. How do DNs compare to BNs in inference speed and accuracy?

Experiments

Learned DNs and BNs on 12 datasets.

Generated queries from test data, varying the fraction of evidence
variables from 10% to 90%.

Scored using average CMLL per variable (conditional marginal
log-likelihood):

  CMLL = (1/|Q|) Σ_{i ∈ Q} log P(X_i = x_i | e),

where e is the evidence and Q is the set of query variables.
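As a sketch of the scoring step (function and argument names are hypothetical), given the true test values of the query variables and a method's predicted marginals P(X_i = 1 | e):

```python
import math

def avg_cmll(true_values, marginals):
    """true_values: {var: 0/1}; marginals: {var: predicted P(var = 1 | evidence)}."""
    total = sum(math.log(marginals[v] if x else 1 - marginals[v])
                for v, x in true_values.items())
    return total / len(true_values)          # average over query variables

print(avg_cmll({"X1": 1, "X3": 0}, {"X1": 0.8, "X3": 0.3}))  # ≈ -0.290
```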

Results: Accuracy in DNs

[Chart: negative CMLL for DN.MF vs. DN.Gibbs on each dataset]

Results: Timing in DNs (log scale)

[Chart: inference time in seconds (0.01–100, log scale) for DN.MF vs.
DN.Gibbs on each dataset]
DN.Gibbs
MF vs. Gibbs in DNs, run for equal time

Evidence   # of MF wins   % wins
10%         9              75%
20%        10              83%
30%        10              83%
40%         9              75%
50%        10              83%
60%        10              83%
70%        11              92%
80%        11              92%
90%        12             100%
Average    10.2            85%

In DNs, MF is usually more accurate, given equal time.

Results: Accuracy

[Chart: negative CMLL for DN.MF, DN.Gibbs, BN.MF, BN.Gibbs, and BN.BP on
each dataset]
Gibbs: DN vs. BN

Evidence   DN wins   % wins
10%         3         25%
20%         1          8%
30%         1          8%
40%         3         25%
50%         5         42%
60%         7         58%
70%        10         83%
80%        10         83%
90%        11         92%
Average     5.7       47%

With more evidence, DNs are more accurate.

Experimental Results

Q1. In DNs, how does MF compare to Gibbs sampling in speed and accuracy?

A1. MF is consistently faster with similar accuracy, or more accurate
    with similar speed.

Q2. How do DNs compare to BNs in inference speed and accuracy?

A2. DNs are competitive with BNs: better with more evidence, worse with
    less evidence.

Conclusion

MF inference in DNs is fast and accurate, especially with more evidence.

Future work:
  - Relational dependency networks (Neville & Jensen, 2007)
  - More powerful approximations

Source code available: http://libra.cs.uoregon.edu/

Results: Timing (log scale)

[Chart: inference time in seconds (0.01–1000, log scale) for DN.MF,
DN.Gibbs, BN.MF, BN.Gibbs, and BN.BP on each dataset]
Learned Models

1. Learning time is comparable.

2. DNs usually have higher pseudo-likelihood (PLL).

3. DNs sometimes have higher log-likelihood (LL).