# Factoring 3-way interactions

19 Oct 2013

Learning Multiplicative Interactions

many slides from Hinton

## Two different meanings of "multiplicative"

If we take two density models and multiply together their probability distributions at each point in data-space, we get a "product of experts". The product of two Gaussian experts is a Gaussian.

If we take two variables and multiply them together to provide input to a third variable, we get a "multiplicative interaction". The distribution of the product of two Gaussian-distributed variables is NOT Gaussian distributed. It is a heavy-tailed distribution. One Gaussian determines the standard deviation of the other Gaussian.

Heavy-tailed distributions are the signatures of multiplicative interactions between latent variables.
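The two meanings can be checked numerically. The sketch below is only an illustration: the sample size, the seed, and the example means and variances are arbitrary choices, and excess kurtosis is used as one convenient measure of heavy tails (it is 0 for a Gaussian).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # arbitrary sample size


def excess_kurtosis(x):
    """Sample excess kurtosis: about 0 for a Gaussian, positive for heavier tails."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)


# Meaning 2: multiply two Gaussian variables together.
a = rng.standard_normal(n)
b = rng.standard_normal(n)
kurt_gauss = excess_kurtosis(a)        # close to 0
kurt_product = excess_kurtosis(a * b)  # clearly positive: heavy tails

# Meaning 1: multiply two Gaussian densities (a product of experts).
# The result is Gaussian again, and the precisions (1 / variance) add.
mu1, var1 = 0.0, 1.0
mu2, var2 = 2.0, 4.0
var_poe = 1.0 / (1.0 / var1 + 1.0 / var2)
mu_poe = var_poe * (mu1 / var1 + mu2 / var2)
```

Conditioned on `a`, the product `a * b` is Gaussian with standard deviation `|a|`, which is the sense in which one Gaussian sets the standard deviation of the other.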

## Learning multiplicative interactions

It is fairly easy to learn multiplicative interactions if all of the variables are observed. This is possible if we control the variables used to create a training set (e.g. pose, lighting, identity …).

It is also easy to learn energy-based models in which all but one of the terms in each multiplicative interaction are observed. Inference is still easy.

If more than one of the terms in each multiplicative interaction are unobserved, the interactions between hidden variables make inference difficult. Alternating Gibbs can be used if the latent variables form a bi-partite graph.
## Higher order Boltzmann machines (Sejnowski, ~1986)

The usual energy function is quadratic in the states:

$$E = \text{bias terms} - \sum_{i<j} s_i s_j w_{ij}$$

But we could use higher order interactions:

$$E = \text{bias terms} - \sum_{i,j,h} s_i s_j s_h w_{ijh}$$

Hidden unit h acts as a switch. When h is on, it switches in the pairwise interaction between unit i and unit j.

Units i and j can also be viewed as switches that control the pairwise interactions between j and h or between i and h.
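The switch interpretation can be sketched in a few lines. This is an illustration under assumptions: the units are split into three separate groups (i, j, h), the bias terms are dropped, and the sizes and state vectors are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_j, n_h = 4, 4, 3                    # hypothetical group sizes
w = rng.standard_normal((n_i, n_j, n_h))   # w[i, j, h]: three-way weights


def three_way_energy(s_i, s_j, s_h, w):
    """E = -sum_{i,j,h} s_i s_j s_h w_ijh (bias terms omitted)."""
    return -float(np.einsum("i,j,h,ijh->", s_i, s_j, s_h, w))


def effective_pairwise_weights(s_h, w):
    """Pairwise i-j weights that the current hidden states switch in."""
    return np.einsum("h,ijh->ij", s_h, w)


s_i = np.array([1.0, 0.0, 1.0, 1.0])
s_j = np.array([0.0, 1.0, 1.0, 0.0])
s_h = np.array([0.0, 1.0, 0.0])            # only hidden unit 1 is on

w_eff = effective_pairwise_weights(s_h, w)  # equals the slice w[:, :, 1]
energy = three_way_energy(s_i, s_j, s_h, w)
```

With a single hidden unit on, the three-way energy reduces to an ordinary pairwise energy with weight matrix `w[:, :, 1]`.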

## Using higher-order Boltzmann machines to model image transformations (Memisevic and Hinton, 2007)

A global transformation specifies which pixel goes to which other pixel.

Conversely, each pair of similar intensity pixels, one in each image, votes for a particular global transformation.

[Figure: "image transformation" hidden units connected to image(t) and image(t+1)]

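## Using higher-order Boltzmann machines to model image transformations (continued)

For binary images, a simple energy function that captures all possible correlations between the components of $\mathbf{x}$, $\mathbf{y}$, $\mathbf{h}$ is

$$E(\mathbf{y}, \mathbf{h}; \mathbf{x}) = -\sum_{ijk} W_{ijk}\, x_i\, y_j\, h_k \tag{1}$$

Using this energy function, we can now define the joint distribution $p(\mathbf{y}, \mathbf{h} \mid \mathbf{x})$ over outputs and hidden variables by exponentiating and normalizing:

$$p(\mathbf{y}, \mathbf{h} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\big(-E(\mathbf{y}, \mathbf{h}; \mathbf{x})\big), \qquad Z(\mathbf{x}) = \sum_{\mathbf{y}, \mathbf{h}} \exp\big(-E(\mathbf{y}, \mathbf{h}; \mathbf{x})\big) \tag{2}$$

From Eqs. 1 and 2, we get

$$p(h_k = 1 \mid \mathbf{x}, \mathbf{y}) = \sigma\Big(\sum_{ij} W_{ijk}\, x_i\, y_j\Big)$$

$$p(y_j = 1 \mid \mathbf{x}, \mathbf{h}) = \sigma\Big(\sum_{ik} W_{ijk}\, x_i\, h_k\Big)$$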
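A minimal NumPy sketch of the two conditionals, assuming a dense weight tensor `W[i, j, k]` and no bias terms. The function names and sizes are hypothetical and chosen only for this illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def p_h_given_xy(W, x, y):
    """p(h_k = 1 | x, y) = sigmoid(sum_ij W_ijk x_i y_j), for every k."""
    return sigmoid(np.einsum("ijk,i,j->k", W, x, y))


def p_y_given_xh(W, x, h):
    """p(y_j = 1 | x, h) = sigmoid(sum_ik W_ijk x_i h_k), for every j."""
    return sigmoid(np.einsum("ijk,i,k->j", W, x, h))


rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4, 2))              # 3 input pixels, 4 output pixels, 2 hiddens
x = rng.integers(0, 2, size=3).astype(float)    # binary input image
y = rng.integers(0, 2, size=4).astype(float)    # binary output image

h_prob = p_h_given_xy(W, x, y)
h = (rng.random(2) < h_prob).astype(float)      # sample the hidden units
y_prob = p_y_given_xh(W, x, h)                  # reconstruct the output image
```

Because the energy is linear in each group when the other two are fixed, every unit within a group can be sampled in parallel.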

## Making the reconstruction easier

Condition on the first image so that only one visible group needs to be reconstructed.

Given the hidden states and the previous image, the pixels in the second image are conditionally independent.

[Figure: "image transformation" hidden units connected to image(t) and image(t+1)]

## The main problem with 3-way interactions

Energy function:

$$E = \text{bias terms} - \sum_{i,j,h} s_i s_j s_h w_{ijh}$$

There are far too many of them. We can reduce the number in several straightforward ways:

- Do dimensionality reduction on each group before the three-way interactions.
- Use spatial locality to limit the range of the three-way interactions.

A much more interesting approach (which can be combined with the other two) is to factor the interactions so that they can be specified with fewer parameters. This leads to a novel type of learning module.

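## Factoring three-way interactions

We use factors that correspond to 3-way outer-products:

$$w_{ijh} = \sum_f w_{if}\, w_{jf}\, w_{hf}$$

Unfactored:

$$E = -\sum_{i,j,h} s_i s_j s_h\, w_{ijh}$$

Factored:

$$E = -\sum_f \sum_{i,j,h} s_i s_j s_h\, w_{if}\, w_{jf}\, w_{hf}$$

[Figure: a factor f connected to the three groups of units through the weights $w_{if}$, $w_{jf}$ and $w_{hf}$]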
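A short check that the factored and unfactored energies agree, together with the parameter counts. All sizes are hypothetical. The factored form never builds the full tensor: it projects each group onto the factors and multiplies the three projections.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_j, n_h, n_f = 5, 6, 4, 3            # hypothetical sizes
w_if = rng.standard_normal((n_i, n_f))
w_jf = rng.standard_normal((n_j, n_f))
w_hf = rng.standard_normal((n_h, n_f))

# Full three-way tensor implied by the factors: w_ijh = sum_f w_if w_jf w_hf
w_ijh = np.einsum("if,jf,hf->ijh", w_if, w_jf, w_hf)


def energy_unfactored(s_i, s_j, s_h):
    return -float(np.einsum("i,j,h,ijh->", s_i, s_j, s_h, w_ijh))


def energy_factored(s_i, s_j, s_h):
    # Each factor sees one weighted sum per group; the three sums are multiplied.
    return -float(np.sum((s_i @ w_if) * (s_j @ w_jf) * (s_h @ w_hf)))


s_i = rng.integers(0, 2, n_i).astype(float)
s_j = rng.integers(0, 2, n_j).astype(float)
s_h = rng.integers(0, 2, n_h).astype(float)

e_full = energy_unfactored(s_i, s_j, s_h)
e_fact = energy_factored(s_i, s_j, s_h)

params_unfactored = n_i * n_j * n_h          # one weight per (i, j, h) triple
params_factored = n_f * (n_i + n_j + n_h)    # three weight matrices
```

For example, with 256 pixels in each image, 100 hidden units and 200 factors, the unfactored tensor would hold 6,553,600 weights against 122,400 for the factored model.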

## Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images (Ranzato, Krizhevsky and Hinton, 2010)

Joint 3-way model: model the covariance structure of natural images. The visible units are two identical copies.

## A powerful module for deep learning

Define the energy function in terms of 3-way multiplicative interactions between two visible binary units, $v_i$ and $v_j$, and one hidden binary unit $h_k$:

$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j,k} v_i\, v_j\, h_k\, W_{ijk}$$

Model the three-way weights as a sum of "factors", f, each of which is a three-way outer product:

$$W_{ijk} = \sum_f B_{if}\, C_{jf}\, P_{kf}$$

Since the factors are connected twice to the same image through matrices B and C, it is natural to tie their weights, further reducing the number of parameters:

$$W_{ijk} = \sum_f C_{if}\, C_{jf}\, P_{kf}$$

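So the energy function (omitting bias terms) becomes:

$$E(\mathbf{v}, \mathbf{h}) = -\sum_f \Big(\sum_i v_i\, C_{if}\Big)^2 \Big(\sum_k h_k\, P_{kf}\Big)$$

The parameters of the model can be learned by maximizing the log likelihood, whose gradient with respect to any parameter $w$ is given by:

$$\frac{\partial L}{\partial w} = \Big\langle \frac{\partial E}{\partial w} \Big\rangle_{\text{model}} - \Big\langle \frac{\partial E}{\partial w} \Big\rangle_{\text{data}}$$

The hidden units are conditionally independent given the states of the visible units, and their binary states can be sampled using:

$$p(h_k = 1 \mid \mathbf{v}) = \sigma\Big(\sum_f P_{kf} \Big(\sum_i v_i\, C_{if}\Big)^2 + b_k\Big)$$

where $b_k$ is the bias of hidden unit $k$. However, given the hidden states, the visible units are no longer independent.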
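The hidden conditional can be sketched as follows. The shapes are assumptions made for this illustration (`C` is pixels by factors, `P` is hiddens by factors), and the energy here includes the hidden bias term `b` so that it matches the conditional.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def energy(v, h, C, P, b):
    """E(v, h) = -sum_f (sum_i v_i C_if)^2 (sum_k h_k P_kf) - sum_k b_k h_k."""
    squared_filter_outputs = (v @ C) ** 2          # shape (F,)
    return -float(np.sum(squared_filter_outputs * (h @ P)) + b @ h)


def p_h_given_v(v, C, P, b):
    """p(h_k = 1 | v) = sigmoid(sum_f P_kf (sum_i v_i C_if)^2 + b_k)."""
    return sigmoid(P @ (v @ C) ** 2 + b)


rng = np.random.default_rng(0)
n_v, n_h, n_f = 6, 3, 4                            # hypothetical sizes
C = rng.standard_normal((n_v, n_f))
P = rng.standard_normal((n_h, n_f))
b = rng.standard_normal(n_h)
v = rng.standard_normal(n_v)

h_prob = p_h_given_v(v, C, P, b)
h_sample = (rng.random(n_h) < h_prob).astype(float)
```

Each hidden unit only needs the squared filter outputs, so all of them can be sampled in one pass. The visible units enter the energy through a square, which is why they do not factorize given `h`.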

## Producing reconstructions using hybrid Monte Carlo

Integrate out the hidden units and use the hybrid Monte Carlo (HMC) algorithm on the free energy:

$$F(\mathbf{v}) = -\sum_k \log\Big(1 + \exp\Big(\sum_f P_{kf} \Big(\sum_i v_i\, C_{if}\Big)^2 + b_k\Big)\Big)$$

## Modeling the joint density of two images under a variety of transformations

(Hinton et al., 2011) describe a generative model of the relationship between two images.

The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs.

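## Model

Given two real-valued images $\mathbf{x}$ and $\mathbf{y}$, define the matching score of triplets $(\mathbf{x}, \mathbf{y}, \mathbf{h})$:

$$S(\mathbf{x}, \mathbf{y}, \mathbf{h}) = \sum_{f=1}^{F} \Big(\sum_{i=1}^{I} x_i\, w^{x}_{if}\Big) \Big(\sum_{j=1}^{J} y_j\, w^{y}_{jf}\Big) \Big(\sum_{k=1}^{K} h_k\, w^{h}_{kf}\Big)$$

Add bias terms to the matching score to get the energy function:

$$E(\mathbf{x}, \mathbf{y}, \mathbf{h}) = -S(\mathbf{x}, \mathbf{y}, \mathbf{h}) - \sum_{k=1}^{K} w^{h}_{k}\, h_k + \frac{1}{2} \sum_{i=1}^{I} \big(x_i - w^{x}_{i}\big)^2 + \frac{1}{2} \sum_{j=1}^{J} \big(y_j - w^{y}_{j}\big)^2 \tag{1}$$

where $w^{x}_{i}$, $w^{y}_{j}$ and $w^{h}_{k}$ are the biases.

Exponentiate and normalize the energy function:

$$p(\mathbf{x}, \mathbf{y}, \mathbf{h}) = \frac{1}{Z} \exp\big(-E(\mathbf{x}, \mathbf{y}, \mathbf{h})\big)$$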
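The matching score and energy translate directly into NumPy. This is a sketch under assumed shapes: `Wx`, `Wy` and `Wh` hold the factor weights with one column per factor, and `bx`, `by`, `bh` are the bias vectors. These names are chosen here for illustration.

```python
import numpy as np


def matching_score(x, y, h, Wx, Wy, Wh):
    """S = sum_f (x . Wx[:, f]) (y . Wy[:, f]) (h . Wh[:, f])."""
    return float(np.sum((x @ Wx) * (y @ Wy) * (h @ Wh)))


def energy(x, y, h, Wx, Wy, Wh, bx, by, bh):
    """E = -S - bh . h + 0.5 ||x - bx||^2 + 0.5 ||y - by||^2."""
    return (-matching_score(x, y, h, Wx, Wy, Wh)
            - float(bh @ h)
            + 0.5 * float(np.sum((x - bx) ** 2))
            + 0.5 * float(np.sum((y - by) ** 2)))


rng = np.random.default_rng(0)
n_x, n_y, n_h, n_f = 4, 5, 3, 2                    # hypothetical sizes
Wx = rng.standard_normal((n_x, n_f))
Wy = rng.standard_normal((n_y, n_f))
Wh = rng.standard_normal((n_h, n_f))
bx, by, bh = rng.standard_normal(n_x), rng.standard_normal(n_y), rng.standard_normal(n_h)

x = rng.standard_normal(n_x)
y = rng.standard_normal(n_y)
h = rng.integers(0, 2, n_h).astype(float)

score = matching_score(x, y, h, Wx, Wy, Wh)
e = energy(x, y, h, Wx, Wy, Wh, bx, by, bh)
```

The quadratic terms keep the real-valued images from growing without bound, which is what makes the energy normalizable.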

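Marginalize over $\mathbf{h}$ to get the distribution over an image pair $(\mathbf{x}, \mathbf{y})$:

$$p(\mathbf{x}, \mathbf{y}) = \sum_{\mathbf{h} \in \{0,1\}^{K}} p(\mathbf{x}, \mathbf{y}, \mathbf{h})$$

We can then obtain the three conditional distributions, where $w^{x}_{i}$, $w^{y}_{j}$ and $w^{h}_{k}$ denote the biases and $w^{x}_{if}$, $w^{y}_{jf}$, $w^{h}_{kf}$ the factor weights:

$$p(h_k \mid \mathbf{x}, \mathbf{y}) = \text{bernoulli}\Big(\sigma\Big(w^{h}_{k} + \sum_f w^{h}_{kf} \sum_i x_i\, w^{x}_{if} \sum_j y_j\, w^{y}_{jf}\Big)\Big) \tag{3}$$

$$p(x_i \mid \mathbf{y}, \mathbf{h}) = \mathcal{N}\Big(w^{x}_{i} + \sum_f w^{x}_{if} \sum_j y_j\, w^{y}_{jf} \sum_k h_k\, w^{h}_{kf};\ 1.0\Big) \tag{4}$$

$$p(y_j \mid \mathbf{x}, \mathbf{h}) = \mathcal{N}\Big(w^{y}_{j} + \sum_f w^{y}_{jf} \sum_i x_i\, w^{x}_{if} \sum_k h_k\, w^{h}_{kf};\ 1.0\Big) \tag{5}$$

This shows that among the three sets of variables, computation of the conditional distribution of any one group $\mathbf{x}$, $\mathbf{y}$, or $\mathbf{h}$, given the other two, is tractable.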
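A sketch of the three conditionals, written against an energy with a three-factor matching score, a hidden bias term and unit-variance quadratic terms on both images. The names `Wx`, `Wy`, `Wh`, `bx`, `by`, `bh` and all sizes are assumptions made for this illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def energy(x, y, h, Wx, Wy, Wh, bx, by, bh):
    score = np.sum((x @ Wx) * (y @ Wy) * (h @ Wh))
    return float(-score - bh @ h
                 + 0.5 * np.sum((x - bx) ** 2) + 0.5 * np.sum((y - by) ** 2))


def p_h_given_xy(x, y, Wx, Wy, Wh, bh):
    """Bernoulli probabilities for the hidden units."""
    return sigmoid(bh + Wh @ ((x @ Wx) * (y @ Wy)))


def mean_x_given_yh(y, h, Wx, Wy, Wh, bx):
    """Mean of the unit-variance Gaussian over x."""
    return bx + Wx @ ((y @ Wy) * (h @ Wh))


def mean_y_given_xh(x, h, Wx, Wy, Wh, by):
    """Mean of the unit-variance Gaussian over y."""
    return by + Wy @ ((x @ Wx) * (h @ Wh))


rng = np.random.default_rng(0)
n_x, n_y, n_h, n_f = 4, 5, 3, 2                    # hypothetical sizes
Wx = rng.standard_normal((n_x, n_f))
Wy = rng.standard_normal((n_y, n_f))
Wh = rng.standard_normal((n_h, n_f))
bx, by, bh = rng.standard_normal(n_x), rng.standard_normal(n_y), rng.standard_normal(n_h)
x, y = rng.standard_normal(n_x), rng.standard_normal(n_y)

h_prob = p_h_given_xy(x, y, Wx, Wy, Wh, bh)
h = (rng.random(n_h) < h_prob).astype(float)
x_mean = mean_x_given_yh(y, h, Wx, Wy, Wh, bx)
y_mean = mean_y_given_xh(x, h, Wx, Wy, Wh, by)
```

Each conditional costs only a few matrix-vector products through the factors, so no three-way tensor is ever formed.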

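## Three-way contrastive divergence

Input: dataset $\{(\mathbf{x}^{\alpha}, \mathbf{y}^{\alpha})\}_{\alpha=1}^{N}$

Notation: $\mathbf{f}^{x} = W^{x\top}\mathbf{x}$, $\mathbf{f}^{y} = W^{y\top}\mathbf{y}$ and $\mathbf{f}^{h} = W^{h\top}\mathbf{h}$ are the factor inputs from each group, $\odot$ is the elementwise product, $\mathbf{w}^{x}$, $\mathbf{w}^{y}$, $\mathbf{w}^{h}$ are the biases, and $\epsilon$ is a learning rate. Hatted quantities are computed from the reconstructions. The increments $\Delta$ start at zero for each training case.

**repeat**

- **for** $\alpha$ from 1 to $N$ **do**
  - $h_k = p(h_k = 1 \mid \mathbf{x}^{\alpha}, \mathbf{y}^{\alpha})$ for each $k$
  - Positive phase:
    $\Delta W^{x} = \Delta W^{x} + \mathbf{x}^{\alpha} (\mathbf{f}^{y} \odot \mathbf{f}^{h})^{T}$,
    $\Delta W^{y} = \Delta W^{y} + \mathbf{y}^{\alpha} (\mathbf{f}^{x} \odot \mathbf{f}^{h})^{T}$,
    $\Delta W^{h} = \Delta W^{h} + \mathbf{h} (\mathbf{f}^{x} \odot \mathbf{f}^{y})^{T}$,
    $\Delta \mathbf{w}^{x} = \Delta \mathbf{w}^{x} + \mathbf{x}^{\alpha}$,
    $\Delta \mathbf{w}^{y} = \Delta \mathbf{w}^{y} + \mathbf{y}^{\alpha}$,
    $\Delta \mathbf{w}^{h} = \Delta \mathbf{w}^{h} + \mathbf{h}$
  - $\mathbf{h} \sim p(\mathbf{h} \mid \mathbf{x}^{\alpha}, \mathbf{y}^{\alpha})$
  - $g \sim \text{bernoulli}(0.5)$
  - **if** $g > 0.5$ **then**
    - $\hat{\mathbf{x}} \sim p(\mathbf{x} \mid \mathbf{y}^{\alpha}, \mathbf{h})$
    - $\hat{\mathbf{y}} \sim p(\mathbf{y} \mid \hat{\mathbf{x}}, \mathbf{h})$
  - **else**
    - $\hat{\mathbf{y}} \sim p(\mathbf{y} \mid \mathbf{x}^{\alpha}, \mathbf{h})$
    - $\hat{\mathbf{x}} \sim p(\mathbf{x} \mid \hat{\mathbf{y}}, \mathbf{h})$
  - **end if**
  - $\hat{h}_k = p(h_k = 1 \mid \hat{\mathbf{x}}, \hat{\mathbf{y}})$ for each $k$
  - Negative phase:
    $\Delta W^{x} = \Delta W^{x} - \hat{\mathbf{x}} (\hat{\mathbf{f}}^{y} \odot \hat{\mathbf{f}}^{h})^{T}$,
    $\Delta W^{y} = \Delta W^{y} - \hat{\mathbf{y}} (\hat{\mathbf{f}}^{x} \odot \hat{\mathbf{f}}^{h})^{T}$,
    $\Delta W^{h} = \Delta W^{h} - \hat{\mathbf{h}} (\hat{\mathbf{f}}^{x} \odot \hat{\mathbf{f}}^{y})^{T}$,
    $\Delta \mathbf{w}^{x} = \Delta \mathbf{w}^{x} - \hat{\mathbf{x}}$,
    $\Delta \mathbf{w}^{y} = \Delta \mathbf{w}^{y} - \hat{\mathbf{y}}$,
    $\Delta \mathbf{w}^{h} = \Delta \mathbf{w}^{h} - \hat{\mathbf{h}}$
  - Update: $W^{x} = W^{x} + \epsilon \Delta W^{x}$, $W^{y} = W^{y} + \epsilon \Delta W^{y}$, $W^{h} = W^{h} + \epsilon \Delta W^{h}$, $\mathbf{w}^{x} = \mathbf{w}^{x} + \epsilon \Delta \mathbf{w}^{x}$, $\mathbf{w}^{y} = \mathbf{w}^{y} + \epsilon \Delta \mathbf{w}^{y}$, $\mathbf{w}^{h} = \mathbf{w}^{h} + \epsilon \Delta \mathbf{w}^{h}$
- **end for**

**until** a stopping criterion is met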
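One possible single-case update in NumPy is sketched below. It is not the authors' implementation: the random order of the two reconstruction steps, the use of hidden probabilities (rather than samples) for the statistics, the learning rate, and all names and sizes are assumptions made for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def cd_step(x, y, params, rng, lr=0.01):
    """One three-way CD-1 update on a single image pair; returns new parameters."""
    Wx, Wy, Wh, bx, by, bh = params

    def hidden_probs(x_, y_):
        return sigmoid(bh + Wh @ ((x_ @ Wx) * (y_ @ Wy)))

    def sample_x(y_, h_):
        # Unit-variance Gaussian around the conditional mean.
        return bx + Wx @ ((y_ @ Wy) * (h_ @ Wh)) + rng.standard_normal(bx.shape)

    def sample_y(x_, h_):
        return by + Wy @ ((x_ @ Wx) * (h_ @ Wh)) + rng.standard_normal(by.shape)

    def stats(x_, y_, h_):
        # Sufficient statistics, in the same order as `params`.
        fx, fy, fh = x_ @ Wx, y_ @ Wy, h_ @ Wh
        return (np.outer(x_, fy * fh), np.outer(y_, fx * fh),
                np.outer(h_, fx * fy), x_, y_, h_)

    # Positive phase: statistics with the data clamped.
    h_prob = hidden_probs(x, y)
    pos = stats(x, y, h_prob)
    h = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: reconstruct the two images in a random order.
    if rng.random() > 0.5:
        x_neg = sample_x(y, h)
        y_neg = sample_y(x_neg, h)
    else:
        y_neg = sample_y(x, h)
        x_neg = sample_x(y_neg, h)
    neg = stats(x_neg, y_neg, hidden_probs(x_neg, y_neg))

    return tuple(p + lr * (a - b) for p, a, b in zip(params, pos, neg))


rng = np.random.default_rng(0)
n_x, n_y, n_h, n_f = 4, 4, 3, 5                    # hypothetical sizes
params = (0.1 * rng.standard_normal((n_x, n_f)),
          0.1 * rng.standard_normal((n_y, n_f)),
          0.1 * rng.standard_normal((n_h, n_f)),
          np.zeros(n_x), np.zeros(n_y), np.zeros(n_h))
x, y = rng.standard_normal(n_x), rng.standard_normal(n_y)
new_params = cd_step(x, y, params, rng)
```

In practice the updates would be averaged over mini-batches, which this sketch leaves out for brevity.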

Thank you