Factoring 3-way interactions


Learning Multiplicative Interactions

(many slides from Hinton)


Two different meanings of "multiplicative"


If we take two density models and multiply together their probability distributions at each point in data-space, we get a "product of experts". The product of two Gaussian experts is a Gaussian.
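As a quick check of the product-of-experts claim, here is a minimal NumPy sketch (my own illustration, not from the slides): multiplying two Gaussian densities pointwise and renormalizing yields a Gaussian whose precision is the sum of the two experts' precisions.

```python
import numpy as np

# Two Gaussian "experts" over the same 1-D data space.
mu1, s1 = 0.0, 1.0
mu2, s2 = 3.0, 2.0

# Product of experts: precisions add, means are precision-weighted.
lam1, lam2 = 1 / s1**2, 1 / s2**2
mu = (lam1 * mu1 + lam2 * mu2) / (lam1 + lam2)
s = (lam1 + lam2) ** -0.5

def gauss(x, m, sd):
    return np.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

xs = np.linspace(-6.0, 8.0, 2001)
prod = gauss(xs, mu1, s1) * gauss(xs, mu2, s2)
prod /= prod.sum() * (xs[1] - xs[0])        # renormalize numerically

# The renormalized pointwise product is exactly a Gaussian density.
assert np.allclose(prod, gauss(xs, mu, s), atol=1e-6)
```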



If we take two variables and multiply them together to provide input to a third variable, we get a "multiplicative interaction".



The distribution of the product of two Gaussian-distributed variables is NOT Gaussian distributed: it is a heavy-tailed distribution. One Gaussian determines the standard deviation of the other Gaussian.
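By contrast, here is a small NumPy sketch of this second meaning (again my own illustration): the product of two independent standard normals has excess kurtosis of about 6, and it can equivalently be generated by letting one Gaussian set the standard deviation of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.standard_normal(n)
y = rng.standard_normal(n)
z = x * y                          # product of two Gaussian variables

# Equivalent view: one Gaussian sets the std of the other, sample by sample.
z2 = rng.normal(loc=0.0, scale=np.abs(x))

def excess_kurtosis(s):
    s = s - s.mean()
    return (s**4).mean() / (s**2).mean() ** 2 - 3.0

print(excess_kurtosis(x))    # ~0.0  (Gaussian)
print(excess_kurtosis(z))    # ~6.0  (heavy-tailed)
print(excess_kurtosis(z2))   # ~6.0  (same distribution as z)
```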


Heavy-tailed distributions are the signatures of multiplicative interactions between latent variables.

Learning multiplicative interactions


It is fairly easy to learn multiplicative interactions if all of the variables are observed.



This is possible if we control the variables used to create a training set (e.g. pose, lighting, identity, …).



It is also easy to learn energy-based models in which all but one of the terms in each multiplicative interaction are observed. Inference is still easy.



If more than one of the terms in each multiplicative interaction is unobserved, the interactions between hidden variables make inference difficult.


Alternating Gibbs can be used if the latent variables form a bipartite graph.

Higher-order Boltzmann machines (Sejnowski, ~1986)



The usual energy function is quadratic in the states:

$E = \text{bias terms} + \sum_{i<j} s_i s_j w_{ij}$

But we could use higher-order interactions:

$E = \text{bias terms} + \sum_{i,j,h} s_i s_j s_h w_{ijh}$

Hidden unit $h$ acts as a switch: when $h$ is on, it switches in the pairwise interaction between unit $i$ and unit $j$.

Units $i$ and $j$ can also be viewed as switches that control the pairwise interactions between $j$ and $h$, or between $i$ and $h$.
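To make the switching interpretation concrete, here is a minimal NumPy sketch (the sizes and names are mine): fixing one group of binary units collapses the third-order term into an ordinary pairwise interaction with gated weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n_i, n_j, n_h = 6, 5, 4

w = rng.standard_normal((n_i, n_j, n_h))   # three-way weights w_ijh
s_i = rng.integers(0, 2, n_i)              # binary states of each group
s_j = rng.integers(0, 2, n_j)
s_h = rng.integers(0, 2, n_h)

# Third-order term of the energy: sum over i, j, h of s_i s_j s_h w_ijh.
e3 = np.einsum('i,j,h,ijh->', s_i, s_j, s_h, w)

# With the h-group held fixed, the same term is a pairwise interaction
# between i and j whose effective weights are switched in by s_h.
w_eff = np.einsum('ijh,h->ij', w, s_h)
assert np.isclose(e3, s_i @ w_eff @ s_j)
```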


Using higher-order Boltzmann machines to model image transformations (Memisevic and Hinton, 2007)




A global transformation specifies which pixel goes to which other pixel.



Conversely, each pair of similar-intensity pixels, one in each image, votes for a particular global transformation.



[Figure: image(t) and image(t+1) connected through "image transformation" units.]

Using higher-order Boltzmann machines to model image transformations



For binary images, a simple energy function that captures all possible correlations between the components of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{h}$ is

$E(\mathbf{y}, \mathbf{h}; \mathbf{x}) = -\sum_{ijk} w_{ijk}\, x_i y_j h_k$   (1)

Using this energy function, we can now define the joint distribution $p(\mathbf{y}, \mathbf{h} \mid \mathbf{x})$ over outputs and hidden variables by exponentiating and normalizing:

$p(\mathbf{y}, \mathbf{h} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\left(-E(\mathbf{y}, \mathbf{h}; \mathbf{x})\right)$   (2)

where $Z(\mathbf{x}) = \sum_{\mathbf{y}, \mathbf{h}} \exp\left(-E(\mathbf{y}, \mathbf{h}; \mathbf{x})\right)$.

From Eqs. (1) and (2), we get

$p(h_k = 1 \mid \mathbf{x}, \mathbf{y}) = \sigma\left(\sum_{ij} w_{ijk}\, x_i y_j\right)$

$p(y_j = 1 \mid \mathbf{x}, \mathbf{h}) = \sigma\left(\sum_{ik} w_{ijk}\, x_i h_k\right)$

where $\sigma$ is the logistic function.
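A minimal NumPy sketch of these two conditionals, i.e. one half-step of alternating Gibbs sampling (the sizes and the 0.1 weight scale are arbitrary choices of mine):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(2)
I, J, K = 16, 16, 8
w = 0.1 * rng.standard_normal((I, J, K))   # three-way weights w_ijk
x = rng.integers(0, 2, I)                  # conditioning image, image(t)
y = rng.integers(0, 2, J)                  # output image, image(t+1)

# p(h_k = 1 | x, y) = sigmoid(sum_ij w_ijk x_i y_j)
p_h = sigmoid(np.einsum('ijk,i,j->k', w, x, y))
h = (rng.random(K) < p_h).astype(int)

# p(y_j = 1 | x, h) = sigmoid(sum_ik w_ijk x_i h_k)
p_y = sigmoid(np.einsum('ijk,i,k->j', w, x, h))
y_new = (rng.random(J) < p_y).astype(int)
```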

Making the reconstruction easier

Condition on the first image so that only one visible group needs to be reconstructed.

Given the hidden states and the previous image, the pixels in the second image are conditionally independent.





[Figure: image(t) and image(t+1) connected through "image transformation" units.]

The main problem with 3-way interactions

Energy function:

$E = \text{bias terms} + \sum_{i,j,h} s_i s_j s_h w_{ijh}$

There are far too many of these three-way weights. We can reduce the number in several straightforward ways:



Do dimensionality reduction on each group before the three-way interactions.


Use spatial locality to limit the range of the three-way interactions.



A much more interesting approach (which can be combined with the other two) is to factor the interactions so that they can be specified with fewer parameters. This leads to a novel type of learning module.

Factoring three-way interactions

We use factors that correspond to 3-way outer products:

$w_{ijh} = \sum_f w_{if}\, w_{jf}\, w_{hf}$

unfactored: $E = -\sum_{i,j,h} s_i s_j s_h\, w_{ijh}$

factored: $E = -\sum_f \left(\sum_i s_i w_{if}\right)\left(\sum_j s_j w_{jf}\right)\left(\sum_h s_h w_{hf}\right) = -\sum_f \sum_{i,j,h} s_i s_j s_h\, w_{if} w_{jf} w_{hf}$

(Ranzato, Krizhevsky and Hinton, 2010)
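A short NumPy sketch of what the factorization buys (the sizes here are arbitrary choices of mine): the full tensor is never materialized, and the parameter count drops from n_i·n_j·n_h to (n_i + n_j + n_h)·F.

```python
import numpy as np

rng = np.random.default_rng(3)
n_i = n_j = n_h = 256
F = 512

# Unfactored: one weight per triple (i, j, h).
print(n_i * n_j * n_h)              # 16,777,216 parameters

# Factored: three filter matrices, one per group.
w_if = rng.standard_normal((n_i, F))
w_jf = rng.standard_normal((n_j, F))
w_hf = rng.standard_normal((n_h, F))
print((n_i + n_j + n_h) * F)        # 393,216 parameters

# Energy via per-factor filter responses; this equals the triple sum of
# s_i s_j s_h w_if w_jf w_hf without ever forming the tensor w_ijh.
s_i = rng.standard_normal(n_i)
s_j = rng.standard_normal(n_j)
s_h = rng.standard_normal(n_h)
E = -np.sum((s_i @ w_if) * (s_j @ w_jf) * (s_h @ w_hf))
```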




Joint 3-way model

Model the covariance structure of natural images. The visible units are two identical copies of the image.

Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images


Define the energy function in terms of 3-way multiplicative interactions between two visible binary units $v_i$, $v_j$, and one hidden binary unit $h_k$:

$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j,k} W_{ijk}\, v_i v_j h_k$

Model the three-way weights as a sum of "factors" $f$, each of which is a three-way outer product:

$W_{ijk} = \sum_f B_{if}\, C_{jf}\, P_{kf}$

Because the factors are connected twice to the same image, through matrices $B$ and $C$, it is natural to tie their weights ($B = C$), further reducing the number of parameters:

$W_{ijk} = \sum_f C_{if}\, C_{jf}\, P_{kf}$





A powerful module for deep learning

So the energy function becomes:

$E(\mathbf{v}, \mathbf{h}) = -\sum_f \left(\sum_i C_{if}\, v_i\right)^2 \left(\sum_k P_{kf}\, h_k\right)$
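As a consistency check, here is a NumPy sketch with deliberately tiny sizes so the full tensor fits in memory: the factored energy with tied weights matches the unfactored three-way energy built from $W_{ijk} = \sum_f C_{if} C_{jf} P_{kf}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n_v, n_h, F = 12, 6, 9

C = rng.standard_normal((n_v, F))
P = rng.standard_normal((n_h, F))
v = rng.integers(0, 2, n_v)
h = rng.integers(0, 2, n_h)

# Unfactored energy from the implied tensor W_ijk = sum_f C_if C_jf P_kf.
W = np.einsum('if,jf,kf->ijk', C, C, P)
E_unfactored = -np.einsum('ijk,i,j,k->', W, v, v, h)

# Factored energy: squared filter responses gated by the hidden units.
E_factored = -np.sum((v @ C) ** 2 * (h @ P))

assert np.isclose(E_unfactored, E_factored)
```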


The parameters of the model can be learned by maximizing the log likelihood, whose gradient is given by:

$\frac{\partial L}{\partial \theta} = \left\langle \frac{\partial E}{\partial \theta} \right\rangle_{\text{model}} - \left\langle \frac{\partial E}{\partial \theta} \right\rangle_{\text{data}}$


The hidden units are conditionally independent given the states of the visible units, and their binary states can be sampled using:

$p(h_k = 1 \mid \mathbf{v}) = \sigma\left(\sum_f P_{kf} \left(\sum_i C_{if}\, v_i\right)^2 + b_k\right)$


However, given the hidden states, the visible units are no
longer independent.





A powerful module for deep learning

Producing reconstructions using hybrid Monte Carlo

Integrate out the hidden units and use the hybrid Monte Carlo algorithm (HMC) on the free energy:


$F(\mathbf{v}) = -\sum_k \log\left(1 + \exp\left(\sum_f P_{kf} \left(\sum_i C_{if}\, v_i\right)^2 + b_k\right)\right)$
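A minimal NumPy sketch of both formulas above, the hidden conditional and the free energy used for HMC (the shapes and the 0.1 weight scale are my own choices):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(5)
n_v, F, n_h = 64, 32, 16

C = 0.1 * rng.standard_normal((n_v, F))   # visible-to-factor filters
P = 0.1 * rng.standard_normal((n_h, F))   # hidden-to-factor weights
b = np.zeros(n_h)                         # hidden biases
v = rng.standard_normal(n_v)

filt = (v @ C) ** 2                       # (sum_i C_if v_i)^2, one per factor
act = P @ filt + b                        # input to each hidden unit

p_h = sigmoid(act)                        # p(h_k = 1 | v)
h = (rng.random(n_h) < p_h).astype(int)

# Free energy with the hidden units integrated out.
free_energy = -np.sum(np.log1p(np.exp(act)))
```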







Hinton et al. (2011) describe a generative model of the relationship between two images.

The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs.





Modeling the joint density of two images under a variety of transformations


Given two real-valued images $\mathbf{x}$ and $\mathbf{y}$, define the matching score of the triplet $(\mathbf{x}, \mathbf{y}, \mathbf{h})$:

$S(\mathbf{x}, \mathbf{y}, \mathbf{h}) = \sum_{f=1}^{F} \left(\sum_i w^x_{if}\, x_i\right) \left(\sum_j w^y_{jf}\, y_j\right) \left(\sum_k w^h_{kf}\, h_k\right)$

Add bias terms to the matching score to get the energy function:

$E(\mathbf{x}, \mathbf{y}, \mathbf{h}) = -S(\mathbf{x}, \mathbf{y}, \mathbf{h}) - \sum_{k=1}^{K} b_k h_k + \frac{1}{2} \sum_{i=1}^{I} (x_i - a_i)^2 + \frac{1}{2} \sum_{j=1}^{J} (y_j - c_j)^2$   (1)


Exponentiate and normalize the energy function:

$p(\mathbf{x}, \mathbf{y}, \mathbf{h}) = \frac{1}{Z} \exp\left(-E(\mathbf{x}, \mathbf{y}, \mathbf{h})\right)$   (2)







Model

Marginalize over $\mathbf{h}$ to get a distribution over an image pair $(\mathbf{x}, \mathbf{y})$:

$p(\mathbf{x}, \mathbf{y}) = \sum_{\mathbf{h} \in \{0,1\}^K} p(\mathbf{x}, \mathbf{y}, \mathbf{h})$
And then we can get

$p(h_k \mid \mathbf{x}, \mathbf{y}) = \mathrm{Bernoulli}\left(\sigma\left(b_k + \sum_f w^h_{kf} \left(\sum_i w^x_{if}\, x_i\right)\left(\sum_j w^y_{jf}\, y_j\right)\right)\right)$   (3)

$p(x_i \mid \mathbf{y}, \mathbf{h}) = \mathcal{N}\left(a_i + \sum_f w^x_{if} \left(\sum_j w^y_{jf}\, y_j\right)\left(\sum_k w^h_{kf}\, h_k\right);\ 1.0\right)$   (4)

$p(y_j \mid \mathbf{x}, \mathbf{h}) = \mathcal{N}\left(c_j + \sum_f w^y_{jf} \left(\sum_i w^x_{if}\, x_i\right)\left(\sum_k w^h_{kf}\, h_k\right);\ 1.0\right)$   (5)
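A minimal NumPy sketch of Eqs. (3) and (4) (the sizes and the 0.05 weight scale are mine): sample the hidden units from both images, then resample one image from the other image and the hidden units.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(6)
I = J = 64
K, F = 32, 48

wx = 0.05 * rng.standard_normal((I, F))   # image-x filters w^x
wy = 0.05 * rng.standard_normal((J, F))   # image-y filters w^y
wh = 0.05 * rng.standard_normal((K, F))   # hidden filters  w^h
a, c, b = np.zeros(I), np.zeros(J), np.zeros(K)

x = rng.standard_normal(I)
y = rng.standard_normal(J)

# Eq. (3): p(h | x, y)
p_h = sigmoid(b + wh @ ((x @ wx) * (y @ wy)))
h = (rng.random(K) < p_h).astype(float)

# Eq. (4): p(x | y, h) is a unit-variance Gaussian around a 3-way mean.
mean_x = a + wx @ ((y @ wy) * (h @ wh))
x_sample = rng.normal(mean_x, 1.0)
```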



This shows that, among the three sets of variables, the conditional distribution of any one group ($\mathbf{x}$, $\mathbf{y}$, or $\mathbf{h}$), given the other two, is tractable to compute.







Model

Prerequisites: dataset (x^α, y^α) for α = 1…N; learning rate η.

repeat
  for α from 1 to N do
    compute f_x = W_x^T x^α and f_y = W_y^T y^α
    compute h_k = p(h_k = 1 | x^α, y^α) for each k
    compute f_h = W_h^T h
    perform the positive-phase updates:
      W_x += η x^α (f_y ∘ f_h)^T,   W_y += η y^α (f_x ∘ f_h)^T,   W_h += η h (f_x ∘ f_y)^T
      a += η x^α,   c += η y^α,   b += η h
    sample h from p(h | x^α, y^α)
    sample g from bernoulli(0.5)
    if g > 0.5 then
      sample x from p(x | y^α, h); compute f_x = W_x^T x
      sample y from p(y | x, h); compute f_y = W_y^T y
    else
      sample y from p(y | x^α, h); compute f_y = W_y^T y
      sample x from p(x | y, h); compute f_x = W_x^T x
    end if
    compute h_k = p(h_k = 1 | x, y) for each k
    compute f_h = W_h^T h
    perform the negative-phase updates:
      W_x −= η x (f_y ∘ f_h)^T,   W_y −= η y (f_x ∘ f_h)^T,   W_h −= η h (f_x ∘ f_y)^T
      a −= η x,   c −= η y,   b −= η h
    re-normalize W_x, W_y, W_h, a, c, b
  end for
until the convergence criterion is met

Three-way contrastive divergence
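For concreteness, here is a condensed NumPy transcription of the procedure above (my own sketch: cd3_step, the shapes, and the in-place update style are illustrative, and the re-normalization step is omitted). Each call does one positive phase on a data pair and one negative phase on a single alternating reconstruction.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd3_step(x, y, Wx, Wy, Wh, a, c, b, eta, rng):
    """One three-way contrastive-divergence update for a single pair (x, y),
    modifying the parameters in place."""

    def p_h(x_, y_):
        return sigmoid(b + Wh @ ((x_ @ Wx) * (y_ @ Wy)))      # Eq. (3)

    def mean_x(y_, h_):
        return a + Wx @ ((y_ @ Wy) * (h_ @ Wh))               # Eq. (4)

    def mean_y(x_, h_):
        return c + Wy @ ((x_ @ Wx) * (h_ @ Wh))               # Eq. (5)

    def stats(x_, y_, h_):
        # Gradient statistics of the matching score and the biases.
        fx, fy, fh = x_ @ Wx, y_ @ Wy, h_ @ Wh
        return (np.outer(x_, fy * fh), np.outer(y_, fx * fh),
                np.outer(h_, fx * fy), x_, y_, h_)

    # Positive phase: hidden probabilities given the data pair.
    h = p_h(x, y)
    pos = stats(x, y, h)

    # Negative phase: binary hidden sample, then reconstruct the two
    # images in a random order (the coin flip in the pseudo-code).
    hs = (rng.random(b.shape) < h).astype(float)
    if rng.random() > 0.5:
        xr = rng.normal(mean_x(y, hs), 1.0)
        yr = rng.normal(mean_y(xr, hs), 1.0)
    else:
        yr = rng.normal(mean_y(x, hs), 1.0)
        xr = rng.normal(mean_x(yr, hs), 1.0)
    neg = stats(xr, yr, p_h(xr, yr))

    # Approximate gradient ascent on the log likelihood.
    for param, p_stat, n_stat in zip((Wx, Wy, Wh, a, c, b), pos, neg):
        param += eta * (p_stat - n_stat)
```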

Thank you