Lifted Message Passing*


The Center for Language and Speech Processing @ JHU, Baltimore, USA, Sep. 7, 2010

Babak Ahmadi, Sriraam Natarajan, Fabian Hadiji, Scott Sanner, Youssef El Massaoudi

(*) Many thanks to the many people who made their slides publicly available

K. Kersting

Rorschach Test


Etzioni’s Rorschach Test for Computer Scientists


Moore’s Law?


Storage Capacity?


Number of Facebook Users?


Number of Scientific Publications?


Number of Web Pages?


Number of Actions?


Computing 2020: Science in an Exponential World

How to deal with millions of images?
How to deal with millions of inter-related research papers?
How to accumulate general knowledge automatically from the Web?
How to deal with billions of shared users’ perceptions stored at massive scale?
How to realize the vision of social search?

“The amount of scientific data is doubling every year”
[Szalay, Gray; Nature 440, 413-414 (23 March 2006)]




Artificial Intelligence in an Exponential World

The real world is structured in terms of objects and relations.
- Relational knowledge can reveal additional correlations between variables of interest.
- Abstraction allows one to compactly model general knowledge and to move to complex inference.

Machine Learning = Data + Model
AI = Structured (Data + Model + Reasoning)

Most effort has gone into the modeling part. How much can the data itself help us to solve a problem?

[Fergus et al. 30(11) 2008; Halevy et al., IEEE Intelligent Systems, 24 2009]


http://www.cs.washington.edu/research/textrunner/

[Figure: TextRunner extractions as object-relation-object triples, annotated with uncertainty]

“Programs will consume, combine, and correlate everything in the universe of structured information and help users reason over it.”
[S. Parastatidis et al., Communications of the ACM Vol. 52(12):33-37]

[Etzioni et al. ACL08]


So, the Real World is Complex and Uncertain

- Information overload
- Incomplete information
- Contradictory information
- Many sources and modalities
- Rapid change

How can computer systems handle these?


(First-order) Logic handles Complexity

Logic deals in true/false. Representations range from atomic (5th C B.C.) through propositional (19th C) to first-order/relational:
- Many types of entities
- Relations between them
- Arbitrary knowledge
- At the ground level, facts must be explicitly enumerated: daughter-of(cecily, john), daughter-of(lily, tom)

E.g., rules of chess (which is a tiny problem):
- 1 page in first-order logic,
- ~100000 pages in propositional logic,
- ~100000000000000000000000000000000000000 pages as an atomic-state model


Probability handles Uncertainty

Probability deals with:
- Sensor noise
- Human error
- Inconsistencies
- Unpredictability

[Figure: two axes, logic (true/false; atomic, propositional, first-order/relational) and probability, with quadrants dated 5th C B.C., 17th C, 19th C, and 20th C. First-order logic gives many types of entities, relations between them, and arbitrary knowledge; the ground levels require explicit enumeration.]



Will Traditional AI Scale?

[Figure: the same logic-versus-probability chart; the first-order/relational plus probability quadrant is still open]

“Scaling up the environment will inevitably overtax the resources of the current AI architecture.”


Statistical Relational AI (StarAI*)

Let‘s deal with uncertainty, objects, and relations jointly. StarAI ...
... unifies logical and statistical AI,
... has solid formal foundations,
... is of interest to many communities: Robotics, CV, Search, Planning, SAT, Probability, Statistics, Logic, Graphs, Trees, Learning.

- Natural domain modeling: objects, properties, relations
- Compact, natural models
- Properties of entities can depend on properties of related entities
- Generalization over a variety of situations

(*) First StarAI workshop at AAAI10, co-chaired with S. Russell, L. Kaelbling, A. Halevy, S. Natarajan, and L. Mihalkova


StarAI / SRL Key Dimensions

- Logical language: first-order logic, Horn clauses, frame systems
- Probabilistic language: Bayesian networks, Markov networks, PCFGs
- Type of learning:
  - Generative / Discriminative
  - Structure / Parameters
  - Knowledge-rich / Knowledge-poor
- Type of inference:
  - MAP / Marginal
  - Full grounding / Partial grounding / Lifted


Markov Logic Networks
[Richardson, Domingos MLJ 62(1-2): 107-136, 2006]

Weighted first-order formulas:
1.5  author(p, x) ∧ smart(x) ⇒ high_quality(p)
1.1  high_quality(p) ⇒ accepted(p)
1.2  co_author(x, y) ⇒ (smart(x) ⇔ smart(y))
     author(p, x) ∧ author(p, y) ⇒ co_author(x, y)

Suppose we have the constants alice, bob, and p1. The ground atoms are:
smart(bob), smart(alice), high_quality(p1), author(p1, alice), author(p1, bob), accepted(p1), co_author(bob, alice), co_author(alice, bob), co_author(alice, alice), co_author(bob, bob)
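To make the grounding step concrete, here is a small sketch (not the authors' code) that enumerates the ground instances of weighted formula templates like the ones above over the constants alice, bob, and p1; the templates and weights follow the slide, while the string encoding is purely illustrative.

```python
from itertools import product

# Constants in the example domain: two people and one paper.
people, papers = ["alice", "bob"], ["p1"]

# Ground each formula template by substituting constants for its logical
# variables; every grounding becomes one weighted feature of the MLN.
groundings = []
# 1.5: author(p, x) ^ smart(x) => high_quality(p)
for p, x in product(papers, people):
    groundings.append((1.5, f"author({p},{x}) ^ smart({x}) => high_quality({p})"))
# 1.1: high_quality(p) => accepted(p)
for p in papers:
    groundings.append((1.1, f"high_quality({p}) => accepted({p})"))
# 1.2: co_author(x, y) => (smart(x) <=> smart(y))
for x, y in product(people, people):
    groundings.append((1.2, f"co_author({x},{y}) => (smart({x}) <=> smart({y}))"))

print(len(groundings))  # 2 + 1 + 4 = 7 ground formulas
```

Even this toy domain shows why grounding blows up: the number of ground formulas grows with the product of the domain sizes, which is exactly what lifted inference tries to avoid.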


ILP = Machine Learning + Logic Programming
[Muggleton, De Raedt JLP96]

Examples E:
pos(mutagenic(m1))
neg(mutagenic(m2))
pos(mutagenic(m3))
...

[Figure: molecule structure with atoms c, c, c, c, c, c, n, o]

Background Knowledge B:
molecule(m1)           molecule(m2)
atom(m1, a11, c)       atom(m2, a21, o)
atom(m1, a12, n)       atom(m2, a22, n)
bond(m1, a11, a12)     bond(m2, a21, a22)
charge(m1, a11, 0.82)  charge(m2, a21, 0.82)
...                    ...

Find a set of general rules:
mutagenic(X) :- atom(X, A, c), charge(X, A, 0.82)
mutagenic(X) :- atom(X, A, n), ...
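As a rough sketch of ILP's covers relation (a heavy simplification, not any particular ILP system): a clause covers an example if its body can be satisfied in the background knowledge under some binding of the variables. The search strategy below, trying only bindings of the single extra variable A, is an assumption that happens to suffice for these clauses.

```python
# Background knowledge as a set of ground facts (from the slide).
B = {
    ("molecule", "m1"), ("atom", "m1", "a11", "c"), ("atom", "m1", "a12", "n"),
    ("bond", "m1", "a11", "a12"), ("charge", "m1", "a11", 0.82),
    ("molecule", "m2"), ("atom", "m2", "a21", "o"), ("atom", "m2", "a22", "n"),
    ("bond", "m2", "a21", "a22"), ("charge", "m2", "a21", 0.82),
}

def covers(body, example, facts):
    """Does the clause body succeed for X = example? Naive search over
    bindings of the one extra variable A (enough for these clauses)."""
    candidates = {f[2] for f in facts if f[0] == "atom"}  # atom identifiers
    for a in candidates:
        bind = {"X": example, "A": a}
        # A literal holds if substituting the binding yields a known fact.
        if all(tuple(bind.get(t, t) for t in lit) in facts for lit in body):
            return True
    return False

# mutagenic(X) :- atom(X, A, c), charge(X, A, 0.82)
rule = [("atom", "X", "A", "c"), ("charge", "X", "A", 0.82)]
print(covers(rule, "m1", B), covers(rule, "m2", B))  # True False
```

So this rule covers the positive example m1 and rejects the negative example m2, which is the signal an ILP learner scores.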


Example ILP Algorithm: FOIL
[Quinlan MLJ 5:239-266, 1990]

Greedy general-to-specific search, guided by some objective function, e.g., the percentage of covered positive examples.

[Figure: refinement tree over candidate clauses :- true; :- atom(X,A,c); :- atom(X,A,n); :- atom(X,A,f); :- atom(X,A,c), bond(A,B); :- atom(X,A,n), charge(A,0.82); with coverage values 0.5, 0.7, 0.6, 0.3, 0.4, 0.6, 0.8, 0.6]

Learned rules:
mutagenic(X) :- atom(X, A, n), charge(A, 0.82)
mutagenic(X) :- atom(X, A, c), bond(A, B)

[Table: the learned clauses’ 0-1 covers relation over the examples]



Probabilistic ILP aka SRL
[De Raedt, K ALT04]

- Traverses the hypothesis space a la ILP
- Replaces ILP’s 0-1 covers relation by a “smooth” probabilistic one in [0,1]

mutagenic(X) :- atom(X, A, n), charge(A, 0.82)
mutagenic(X) :- atom(X, A, c), bond(A, B)

[Table: probabilistic covers values per example; their product is 0.882]
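The smooth covers relation can be scored, for instance, as the product of per-example covers probabilities. The numbers below are placeholders chosen only to illustrate the idea, not the values behind the slide's 0.882.

```python
import math

# Hypothetical smooth covers values: P(covers(H, e)) for positive
# examples, 1 - P(covers(H, e)) for negatives -- placeholder numbers.
probs = [0.9, 0.98, 1.0]

# The hypothesis score is the product of the per-example values.
score = math.prod(probs)
print(round(score, 3))  # 0.882
```

Because each term lies in [0,1], one badly covered example drags the whole score down, which is what makes the smooth relation a useful search heuristic.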


Today we can ...

- ... learn probabilistic relational models automatically from millions of inter-related objects
- ... generate optimal plans and learn to act optimally in uncertain environments involving millions of objects and relations among them
- ... perform lifted probabilistic inference, avoiding explicit state enumeration by manipulating first-order state representations directly
- ... exploit shared factors to speed up message-passing algorithms, both for relational inference and for classical propositional inference such as solving SAT problems


Distributions can naturally be represented as factor graphs

- Each circle denotes a (random) variable; each box denotes a factor (potential)
- There is an edge between a circle and a box if the variable is in the domain/scope of the factor
- Note: the product of the factors is unnormalized!
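A minimal factor-graph sketch, assuming made-up binary variables and potentials: each factor maps assignments of its scope to a nonnegative value, and the product over all factors is only proportional to the distribution until divided by the normalizer Z.

```python
from itertools import product

# Two binary variables (circles) and two factors (boxes); a factor
# touches exactly the variables in its scope. Numbers are illustrative.
variables = {"a": [0, 1], "b": [0, 1]}
factors = [
    (("a",),     {(0,): 1.0, (1,): 3.0}),           # unary factor on a
    (("a", "b"), {(0, 0): 2.0, (0, 1): 1.0,
                  (1, 0): 1.0, (1, 1): 4.0}),       # pairwise factor
]

def unnorm(assign):
    """Unnormalized measure: the product of all factor values."""
    val = 1.0
    for scope, table in factors:
        val *= table[tuple(assign[v] for v in scope)]
    return val

# Only after dividing by Z does the product become a distribution.
Z = sum(unnorm(dict(zip(variables, vals)))
        for vals in product(*variables.values()))
print(unnorm({"a": 1, "b": 1}) / Z)
```

Here Z = 18 and the assignment (a=1, b=1) has unnormalized weight 12, so its probability is 12/18.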


Factor Graphs from Graphical Models


Variable Elimination

Sum out non-query variables one by one.

Example: inviting n people to a workshop

[Factor graph: popular and “start series” each connected to attends(p1), attends(p2), ..., attends(pn)]

Summing out att(p1) combines φ1(pop, att(p1)) and φ2(att(p1), ser) into a new factor φ(pop, ser).

Time is linear in the number of invitees n, but this does not exploit the symmetries encoded in the structure of the model.
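The elimination loop can be sketched in a few lines, with invented potentials phi1 and phi2 shared by all invitees: each attends(p_i) is summed out in turn, so the work grows linearly with n.

```python
from itertools import product

n = 50  # number of invitees
# Invented, identical potentials phi1(popular, attends(p_i)) and
# phi2(attends(p_i), series) over binary variables.
phi1 = {(p, a): [[2.0, 1.0], [1.0, 3.0]][p][a] for p in (0, 1) for a in (0, 1)}
phi2 = {(a, s): [[1.0, 2.0], [2.0, 1.0]][a][s] for a in (0, 1) for s in (0, 1)}

def sum_out_one():
    """Eliminate one attends variable: yields a factor over (popular, series)."""
    return {(p, s): sum(phi1[p, a] * phi2[a, s] for a in (0, 1))
            for p, s in product((0, 1), repeat=2)}

# One elimination per invitee: O(n) work overall.
result = {ps: 1.0 for ps in product((0, 1), repeat=2)}
for _ in range(n):
    pairwise = sum_out_one()
    result = {ps: result[ps] * pairwise[ps] for ps in result}

print(result[(1, 1)])
```

Every iteration produces the same intermediate factor, which is exactly the symmetry the propositional algorithm fails to exploit.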


Parameterized Factors (Parfactors)
[Pfeffer et al. 1999; Poole 2003; de Salvo Braz et al. 2005]

[Factor graph: popular and “start series” each connected to attends(p1), attends(p2), ..., attends(pn)]

Parfactors:
∀X. φ1(popular, attends(X))
∀X. φ2(attends(X), series)

So, let’s exploit the symmetries revealed by the relational model. This is called lifted inference!


First-Order Variable Elimination
[Poole 2003; de Salvo Braz et al. 2005]

Sum out all attends(X) variables at once:

∀X. φ1(popular, attends(X)), ∀X. φ2(attends(X), series)
→ ∀X. φ(popular, series)
→ φ(popular, series)^n

Time is constant in n.
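The lifted counterpart, under the same assumption of identical, invented per-invitee potentials: since all invitees are interchangeable, summing out a single representative attends(X) and raising the result to the n-th power gives the joint factor over (popular, series) in time independent of n.

```python
# Invented, identical potentials phi1(popular, attends) and
# phi2(attends, series), shared by every invitee.
phi1 = {(p, a): [[2.0, 1.0], [1.0, 3.0]][p][a] for p in (0, 1) for a in (0, 1)}
phi2 = {(a, s): [[1.0, 2.0], [2.0, 1.0]][a][s] for a in (0, 1) for s in (0, 1)}

n = 50  # number of invitees

# Lifted elimination: one representative elimination, then exponentiate.
# Constant in n instead of n separate eliminations.
lifted = {(p, s): sum(phi1[p, a] * phi2[a, s] for a in (0, 1)) ** n
          for p in (0, 1) for s in (0, 1)}
print(lifted[(1, 1)])
```

The exponentiation is the whole trick: n identical eliminations each yield the same factor, so multiplying n copies equals raising one copy to the n-th power.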


Symmetry Within Factors
[Milch, Zettlemoyer, Haims, K, Kaelbling AAAI08]

Counting formula: φ(overflow, #X[attends(X)])

- The values of a counting formula are histograms counting how many objects X yield each possible value of attends(X)
- Only n+1 histograms, e.g., [50, 0], [49, 1], ..., [0, 50]
- Factor size is now 2 × (n+1): linear in n
- Size of the naive factor representation: 2 × 2^n
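The histogram idea can be checked with the binomial theorem: grouping the 2^n assignments of attends(p1..pn) by how many are true leaves only n+1 distinct cases, each weighted by a binomial coefficient. The potential values below are invented.

```python
from math import comb

n = 50
phi = {0: 2, 1: 3}  # invented per-invitee potential values (integers)

# Naive view: 2**n joint assignments of attends(p1..pn).
# Counting view: only the histogram h = #invitees with attends = 1
# matters, and comb(n, h) assignments share each factor value.
total = sum(comb(n, h) * phi[0] ** (n - h) * phi[1] ** h for h in range(n + 1))
print(total == (phi[0] + phi[1]) ** n)  # prints True (binomial theorem)
```

So the n+1 histogram entries carry exactly the same information as the 2^n-entry table, which is where the exponential-to-linear compression comes from.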


Example: Competing Workshops

[Factor graph: hot(w1), hot(w2), ..., hot(wm) and “start series” each connected to att(p1), att(p2), ..., att(pn)]

- Can’t sum out attends(X) without joining all the hot(W) variables: ∀W, X. φ(hot(W), att(X))
- Instead, create a counting formula on hot(W), #W[hot(W)], then sum out attends(X) at the lifted level

Conversion to counting formulas creates new opportunities for lifted elimination.


Results: Competing Workshops

[Plot: runtime in ms (0-200) versus number of invitees (0-200) for VE, FOVE, and C-FOVE]

These exact inference approaches are rather complex, so far do not easily scale to realistic domains, and hence have only been applied to rather small artificial problems.

What about approximate inference approaches?


How do you spend your spare time?

YouTube-like media portals have changed the way users access media content on the Internet. Every day, millions of people visit social media sites such as Flickr, YouTube, and Jumpcut, among others, to share their photos and videos, while others enjoy themselves by searching, watching, commenting on, and rating the photos and videos; what your friends like will bear great significance for you.


How do you efficiently broadcast information?


Content Distribution using Stochastic Policies
[Bickson et al. WDAS04]

These are large (distributed) networks, so Bickson et al. propose to use (loopy) belief propagation.


The Sum-Product Algorithm aka Belief Propagation

An iterative process in which neighboring variables “talk” to each other, passing messages such as: “I (variable x3) think that you (variable x2) belong in these states with various likelihoods …”


Loopy Belief Propagation

After enough iterations, this series of conversations is likely to converge to a consensus that determines the marginal probabilities of all the variables.

Sum-Product / BP:
1. Update messages until convergence
2. Compute single-node marginals

Variants exist for solving SAT problems, systems of linear equations, and matching problems, and for arbitrary distributions (based on sampling).

There are a lot of shared factors, so use lifted belief propagation!
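A minimal sum-product sketch on a small chain with invented potentials; on a tree the messages are exact, so the BP marginal of the middle variable matches brute-force enumeration.

```python
# A small chain a -- f1 -- b -- f2 -- c of binary variables; since the
# graph is a tree, sum-product is exact. Factor values are made up.
f1 = {(a, b): [[4.0, 1.0], [1.0, 2.0]][a][b] for a in (0, 1) for b in (0, 1)}
f2 = {(b, c): [[3.0, 1.0], [1.0, 3.0]][b][c] for b in (0, 1) for c in (0, 1)}

# Messages from the leaf variables a and c are uniform; a factor's
# message to b sums over its other argument ("I think you belong in
# these states with various likelihoods...").
msg_f1_to_b = [sum(f1[a, b] for a in (0, 1)) for b in (0, 1)]
msg_f2_to_b = [sum(f2[b, c] for c in (0, 1)) for b in (0, 1)]

# The belief at b is the product of its incoming messages, normalized.
belief = [m1 * m2 for m1, m2 in zip(msg_f1_to_b, msg_f2_to_b)]
Z = sum(belief)
marg_b = [v / Z for v in belief]

# Brute-force check against the normalized joint distribution.
brute = [sum(f1[a, b] * f2[b, c] for a in (0, 1) for c in (0, 1)) for b in (0, 1)]
print(marg_b, [v / sum(brute) for v in brute])
```

On loopy graphs the same updates are simply iterated until the messages stop changing, at which point the beliefs approximate the true marginals.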


Lifted Belief Propagation
[Singla, Domingos AAAI08; K, Ahmadi, Natarajan UAI09]

Counting identical shared factors can result in great efficiency gains for (loopy) belief propagation, and shared factors appear more often than you think in relevant real-world problems.

http://www-kd.iai.uni-bonn.de/index.php?page=software_details&id=16


Social Network Analysis




Step 1: Compression

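The compression step can be sketched as color passing (a sketch in the spirit of the lifting idea, not the released implementation): nodes repeatedly receive the multiset of their neighbors' colors, and nodes that end up identically colored would send and receive identical messages, so they can share one supernode in the lifted network. The graph below is invented.

```python
# Two structurally identical star graphs; with no evidence, all nodes
# start with the same color.
neighbors = {
    "a": ["b", "c"], "b": ["a"], "c": ["a"],
    "d": ["e", "f"], "e": ["d"], "f": ["d"],
}
color = {v: 0 for v in neighbors}

for _ in range(len(neighbors)):  # at most |V| refinement rounds
    # A node's signature: its own color plus the sorted neighbor colors.
    signature = {v: (color[v], tuple(sorted(color[u] for u in neighbors[v])))
                 for v in neighbors}
    palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
    new = {v: palette[signature[v]] for v in neighbors}
    if new == color:  # stable coloring reached
        break
    color = new

# Group identically colored nodes into supernodes.
groups = {}
for v, c in sorted(color.items()):
    groups.setdefault(c, []).append(v)
print(list(groups.values()))  # [['a', 'd'], ['b', 'c', 'e', 'f']]
```

The two hub nodes a and d play the same role, as do the four leaves, so modified BP only needs to compute messages for two supernodes instead of six nodes.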



Step 2: Modified Belief Propagation



Lifted Factored Frontier

20 people over 10 time steps; maximum number of friends 5; cancer never observed; time step randomly selected.


Lower Bound on Model Counts of CNF

BPCount [Kroc et al 08]:
- BP is used to estimate marginals
- Provable bound


Model Counting: Satisfied by Lifted Message Passing?


Lifted Satisfiability
[Hadiji, K, Ahmadi StarAI10]

- Warning and survey propagation can also be lifted
- Enables a lifted treatment of both probabilistic and deterministic knowledge



Lifted Linear Equations

- Gaussian belief propagation can also be lifted [O. Shental et al. ISIT-08]
- Enables lifted PageRank, HITS, Kalman filters, …

Lifted Matching

- NetAlignBP, Min-Sum, ... can also be lifted
[Bayati et al., Trans. on Information Theory 54(3): 1241-1251, 2008; Bayati et al. ICDM09]

[Ongoing work; image due to David Gleich, Stanford]
]


Content Distribution (Gnutella): Lifted BP vs. BP


Message Errors to the Rescue!

Make use of decaying message errors already at lifting time.

- Ihler et al. 05: BP message errors decay along paths
- LBP may spuriously assume that some nodes send and receive different messages and hence produce a pessimistic lifted network


Informed Lifted Belief Propagation
[El Massaoudi, K, Ahmadi, Hadiji AAAI10]

Iterate:
1. Refine lifting
2. Run modified BP


Social Networks


Lifted Content Distribution

1 file, Gnutella snapshot: 10876 nodes, 39994 edges.

Messages sent: iLBP 4,272,164 < BP 5,761,952 < LBP 6,381,516

On a different network: iLBP 1,972,662 < LBP 2,962,311 < BP 5,761,952


Conclusions

- StarAI = Objects & Relations + Probabilities + Machine Learning
- It covers the whole AI spectrum
  - Relational POMDPs [Sanner, K, AAAI10]
- Lifted / efficient reasoning is crucial to StarAI
  - Exploit the symmetries revealed by the (relational) model
- More applications? What about NLP?
- MLNs composed of task-specific sub-programs?
- Relational linear programs?
- Arbitrary distributions?

Thanks for your attention