Deep Learning


Bing-Chen Tsai


Outline

Neural networks

Graphical model

Belief nets

Boltzmann machine

DBN

Reference


Neural networks

Supervised learning

The training data consists of input information with the corresponding output information.

Unsupervised learning

The training data consists of input information without the corresponding output information.


Neural networks

Generative model

Model the joint distribution of input and output, P(x, y).

Discriminative model

Model the posterior probabilities, P(y | x).

[Figure: class-conditional joints P(x, y1) and P(x, y2) versus posteriors P(y1 | x) and P(y2 | x) for a two-class problem.]
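The two model families are linked by Bayes' rule (a standard identity, not specific to these slides): a generative model of $P(x, y)$ also determines the posteriors that a discriminative model targets directly,

$P(y \mid x) = \dfrac{P(x, y)}{\sum_{y'} P(x, y')}$

while a discriminative model gives no way to recover $P(x, y)$.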


Neural networks

What is a neuron?

Linear neurons

Binary threshold neurons

Sigmoid neurons

Stochastic binary neurons

[Figure: a neuron with inputs $x_1, x_2$ and a constant input 1, weights $w_1, w_2$ and bias $b$, producing output $y$.] For a binary threshold neuron: $y = 1$ if $w_1 x_1 + w_2 x_2 + b \ge 0$, and $0$ otherwise.
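A minimal sketch of these four neuron types in Python (the function names and the NumPy rendering are mine, not the slides'):

```python
# A sketch of the four neuron types above; all names are illustrative.
import numpy as np

def linear_neuron(x, w, b):
    return w @ x + b                              # y = sum_i w_i x_i + b

def binary_threshold_neuron(x, w, b):
    return 1 if w @ x + b >= 0 else 0             # fires iff total input >= 0

def sigmoid_neuron(x, w, b):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))     # smooth output in (0, 1)

def stochastic_binary_neuron(x, w, b, rng=np.random.default_rng()):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))        # same p as a sigmoid neuron,
    return int(rng.random() < p)                  # but emit 1 with probability p
```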


Neural networks

Two-layer neural networks (sigmoid neurons)


Back-propagation

Step 1: Randomly initialize the weights, then determine the output vector (forward pass).

Step 2: Compute the gradient of an error function with respect to the weights (backward pass).

Step 3: Update the weights, and repeat steps 1-3 until the error is low enough.
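A minimal sketch of these three steps for the two-layer sigmoid network of the previous slide, assuming a squared-error cost; the layer sizes, learning rate, and the single training case are made-up values:

```python
# Sketch of back-propagation for a two-layer sigmoid network with a
# squared-error cost; all sizes and values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialize the weights.
W1 = rng.normal(0, 0.1, (3, 2))   # input (2 units) -> hidden (3 units)
b1 = np.zeros(3)
W2 = rng.normal(0, 0.1, (1, 3))   # hidden (3 units) -> output (1 unit)
b2 = np.zeros(1)

x = np.array([0.5, -0.2])         # one training case
t = np.array([1.0])               # its target output
lr = 1.0                          # learning rate

for step in range(1000):
    # Forward pass: determine the output vector.
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    # Step 2: gradient of the error E = 0.5 * (y - t)^2.
    delta2 = (y - t) * y * (1 - y)           # error signal at the output
    delta1 = (W2.T @ delta2) * h * (1 - h)   # error propagated to the hidden layer
    # Step 3: update the weights; repeat until the error is low enough.
    W2 -= lr * np.outer(delta2, h); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x); b1 -= lr * delta1
```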

Neural networks

Back-propagation is not good for deep learning:

It requires labeled training data, but almost all data is unlabeled.

The learning time is very slow in networks with multiple hidden layers.

It can get stuck in poor local optima, which for deep nets are far from optimal.

Instead, learn P(input), not P(output | input).

What kind of generative model should we learn?


Outline

Neural networks

Graphical model

Belief nets

Boltzmann machine

DBN

Reference


Graphical model

A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables.


[Figure: a graph over variables A, B, C, D.] In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.

Graphical model

Directed graphical model

[Figure: a directed graph over nodes A, B, C, D.]

$P(A,B,C,D) = P(A)\,P(B)\,P(C \mid B)\,P(D \mid A, C)$

Undirected graphical model

[Figure: an undirected graph over nodes A, B, C, D.]

$P(A,B,C,D) = \dfrac{1}{Z}\,\varphi_1(A,B,D)\,\varphi_2(B,C,D)$
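As a quick illustration of the directed factorization, here is a sketch that evaluates the joint for binary variables; the conditional probability tables are made-up numbers, and the factorization matches the reconstruction above:

```python
# Sketch: evaluate P(A,B,C,D) = P(A) P(B) P(C|B) P(D|A,C) for binary
# variables; all table entries below are made-up illustrative numbers.
P_A = {1: 0.3, 0: 0.7}
P_B = {1: 0.6, 0: 0.4}
P_C_given_B = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # P_C_given_B[b][c]
P_D_given_AC = {(a, c): 0.5 + 0.2*a + 0.1*c for a in (0, 1) for c in (0, 1)}

def joint(a, b, c, d):
    p_d1 = P_D_given_AC[(a, c)]
    p_d = p_d1 if d == 1 else 1 - p_d1
    return P_A[a] * P_B[b] * P_C_given_B[b][c] * p_d

# The joint sums to 1 over all 16 configurations:
total = sum(joint(a, b, c, d) for a in (0, 1) for b in (0, 1)
            for c in (0, 1) for d in (0, 1))
print(total)  # -> 1.0 (up to floating point)
```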

Outline

Neural networks

Graphical model

Belief nets

Boltzmann machine

DBN

Reference


Belief nets

A belief net is a directed acyclic graph composed of stochastic variables.

[Figure: layers of stochastic hidden causes above a visible layer.]

The units are stochastic binary neurons, so it is a sigmoid belief net.

Belief nets

We would like to solve two problems:

The inference problem: infer the states of the unobserved variables.

The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.

Belief nets

It is easy to generate a sample from P(v | h).

It is hard to infer P(h | v): explaining away.
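Generating is easy because sampling from P(v | h) is a single top-down pass through stochastic binary neurons; a minimal sketch (the layer sizes and weights below are made up, and the hidden causes are sampled uniformly instead of from learned prior biases):

```python
# Sketch: ancestral sampling in a sigmoid belief net is one top-down pass.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = rng.normal(0, 1, (4, 6))    # weights from 4 hidden causes to 6 visible units
b = np.zeros(6)                 # visible biases

h = rng.integers(0, 2, 4)                  # sample the stochastic hidden causes
p_v = sigmoid(h @ W + b)                   # P(v_i = 1 | h) for every visible unit
v = (rng.random(6) < p_v).astype(int)      # sample the visible units
print(h, v)
```

Inferring P(h | v) has no such cheap pass: explaining away makes the hidden units dependent given v.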

Belief nets

Explaining away

[Figure: two hidden causes H1 and H2 with directed connections into a common effect V.]

H1 and H2 are independent, but they can become dependent when we observe an effect that they can both influence:

$P(H_1, H_2) = P(H_1)\,P(H_2)$, but in general $P(H_1, H_2 \mid V) \neq P(H_1 \mid V)\,P(H_2 \mid V)$.
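A toy numeric check of this (the priors and the deterministic OR for V are assumptions of the example, not from the slides):

```python
# Explaining away in numbers: V = H1 OR H2, with weak independent priors.
from itertools import product

p_h1, p_h2 = 0.1, 0.1

def p(h1, h2, v):
    pv = 1.0 if v == (h1 or h2) else 0.0   # deterministic OR for the effect
    return (p_h1 if h1 else 1 - p_h1) * (p_h2 if h2 else 1 - p_h2) * pv

p_v1 = sum(p(h1, h2, 1) for h1, h2 in product((0, 1), repeat=2))
p_h1_given_v = sum(p(1, h2, 1) for h2 in (0, 1)) / p_v1
p_h1_given_v_h2 = p(1, 1, 1) / sum(p(h1, 1, 1) for h1 in (0, 1))

print(round(p_h1_given_v, 3))     # ~0.526: observing V makes H1 plausible
print(round(p_h1_given_v_h2, 3))  # 0.1: also observing H2 explains H1 away
```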

Belief nets

Some methods for learning deep belief nets:

Monte Carlo methods, but they are painfully slow for large, deep belief nets.

Learning with samples from the wrong distribution: use Restricted Boltzmann Machines.


Outline

Neural networks

Graphical model

Belief nets

Boltzmann machine

DBN

Reference


Boltzmann Machine

It is an undirected graphical model.

The energy of a joint configuration:

$E(v,h) = -\sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}$, where $s = (v, h)$ ranges over the visible and hidden units, $b_i$ are biases, and $w_{ij}$ are the symmetric weights.

[Figure: hidden units above visible units, with a weight $w_{ij}$ on the connection between units $i$ and $j$.]

Boltzmann Machine

An example of how weights define a distribution:

[Figure: a tiny network with hidden units h1, h2 and visible units v1, v2, and weights +2, +1, and -1 on its connections.]
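A sketch that enumerates the joint distribution such a network defines; how the three weights attach to the four units is my assumption (v1-h1: +2, v2-h2: +1, h1-h2: -1), and biases are omitted:

```python
# How weights define a distribution: p(v, h) = e^{-E(v,h)} / Z.
from itertools import product
from math import exp

w = {("v1", "h1"): 2.0, ("v2", "h2"): 1.0, ("h1", "h2"): -1.0}

def energy(s):
    # E(s) = -sum over connected pairs of w_ij * s_i * s_j (no biases here).
    return -sum(wij * s[i] * s[j] for (i, j), wij in w.items())

states = [dict(zip(("v1", "v2", "h1", "h2"), bits))
          for bits in product((0, 1), repeat=4)]
Z = sum(exp(-energy(s)) for s in states)        # the partition function
for s in states:
    print(s, round(exp(-energy(s)) / Z, 4))     # all 16 probabilities sum to 1
```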

Boltzmann Machine

A very surprising fact:

$\dfrac{\partial \log p(v)}{\partial w_{ij}} = \langle s_i s_j \rangle_v - \langle s_i s_j \rangle_{\text{model}}$

The left-hand side is the derivative of the log probability of one training vector, $v$, under the model. $\langle s_i s_j \rangle_v$ is the expected value of the product of states at thermal equilibrium when $v$ is clamped on the visible units; $\langle s_i s_j \rangle_{\text{model}}$ is the same expectation with no clamping.

Boltzmann Machines

Restricted Boltzmann Machine

We restrict the connectivity to make learning easier:

Only one layer of hidden units. (We will deal with more layers later.)

No connections between hidden units.

[Figure: a bipartite graph with one layer of hidden units above the visible units.]
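For reference, the usual RBM energy and the conditionals that the restricted connectivity buys (the standard formulation, with $a_i$ and $b_j$ the visible and hidden biases):

$E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij}$

$P(h_j = 1 \mid v) = \sigma\big(b_j + \sum_i v_i w_{ij}\big) \qquad P(v_i = 1 \mid h) = \sigma\big(a_i + \sum_j h_j w_{ij}\big)$

Given $v$ the hidden units are conditionally independent (and vice versa), so each layer can be sampled in one parallel step; this is what makes the alternating Gibbs sampling on the next slide cheap.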

Boltzmann Machines

The Boltzmann machine learning algorithm for an RBM:

[Figure: alternating Gibbs sampling. Start with a data vector on the visible units at t = 0; update all hidden units in parallel, then all visible units, and alternate until t = infinity (thermal equilibrium). The maximum-likelihood rule is $\Delta w_{ij} \propto \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{\infty}$.]

Boltzmann Machines

Contrastive divergence: a very surprising short-cut.

[Figure: start with the data at t = 0, update the hidden units, produce a reconstruction at t = 1, and update the hidden units again. The update is $\Delta w_{ij} \propto \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{1}$.]

This is not following the gradient of the log likelihood, but it works well.
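A minimal CD-1 sketch for a binary RBM (biases are omitted for brevity, and the sizes, learning rate, and toy data are made up):

```python
# Contrastive divergence (CD-1) for a binary RBM, weights only.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 6, 4, 0.1
W = rng.normal(0, 0.01, (n_vis, n_hid))
data = rng.integers(0, 2, (100, n_vis)).astype(float)   # toy binary data

for epoch in range(10):
    for v0 in data:
        p_h0 = sigmoid(v0 @ W)                          # t = 0: hidden probs given data
        h0 = (rng.random(n_hid) < p_h0).astype(float)   # sample hidden states
        p_v1 = sigmoid(W @ h0)                          # t = 1: reconstruction
        v1 = (rng.random(n_vis) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W)                          # hidden probs for reconstruction
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))   # <vh>^0 - <vh>^1
```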

Outline

Neural networks

Graphical model

Belief nets

Boltzmann machine

DBN

Reference


DBN

It is easy to generate a sample from P(v | h).

It is hard to infer P(h | v): explaining away.

Using an RBM to initialize the weights can reach a good optimum.

DBN

Combining two RBMs to make a DBN


[Figure: train the lower RBM on the data first; copy the binary state of its hidden units for each v; then train the upper RBM on those copies; composing the two RBM models makes a single DBN model.]

It's a deep belief net!
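A sketch of the greedy layer-wise procedure; train_rbm is a stand-in wrapping the CD-1 update from the previous slide, and all sizes and data here are arbitrary:

```python
# Greedy layer-wise pre-training: RBM1 on the data, RBM2 on RBM1's hidden states.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hid, lr=0.1, epochs=10):
    W = rng.normal(0, 0.01, (data.shape[1], n_hid))
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W)
            h0 = (rng.random(n_hid) < p_h0).astype(float)
            v1 = (rng.random(len(v0)) < sigmoid(W @ h0)).astype(float)
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, sigmoid(v1 @ W)))
    return W

data = rng.integers(0, 2, (100, 6)).astype(float)       # toy binary data

W1 = train_rbm(data, n_hid=4)                           # train this RBM first
h_data = (rng.random((100, 4)) < sigmoid(data @ W1)).astype(float)  # copy binary states
W2 = train_rbm(h_data, n_hid=3)                         # then train this RBM
# W1 and W2 composed together form a single DBN model.
```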

DBN

Why can we use an RBM to initialize a belief net's weights?

An infinite sigmoid belief net is equivalent to an RBM.

Inference in a directed net with replicated weights:

Inference is trivial: we just multiply v0 by W transpose.

The model above h0 implements a complementary prior.

Multiplying v0 by W transpose gives the product of the likelihood term and the prior term.

[Figure: an infinite directed net with layers v0, h0, v1, h1, v2, h2, etc., all sharing the same replicated weights W.]

DBN

Complementary prior

A Markov chain is a sequence of variables $X_1, X_2, \ldots$ with the Markov property:

$P(X_t \mid X_1, \ldots, X_{t-1}) = P(X_t \mid X_{t-1})$

A Markov chain is stationary if the transition probabilities do not depend on time:

$P(X_t = x' \mid X_{t-1} = x) = T(x \to x')$

$T(x \to x')$ is called the transition matrix.

If a Markov chain is ergodic, it has a unique equilibrium distribution:

$P(X_t = x) \to P_\infty(x) \quad \text{as } t \to \infty$

[Figure: a Markov chain X1 → X2 → X3 → X4.]
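A quick numerical illustration with a made-up 3-state ergodic transition matrix: iterating the chain from two different starting distributions converges to the same equilibrium distribution:

```python
# Ergodic chains forget their starting point: p_t -> pi as t -> infinity.
import numpy as np

T = np.array([[0.5, 0.3, 0.2],   # T[x, x'] = P(X_t = x' | X_{t-1} = x)
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])

for p0 in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])):
    p = p0
    for _ in range(100):
        p = p @ T                # one step of the chain
    print(np.round(p, 4))        # both runs print the same equilibrium distribution
```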

DBN

Most Markov chains used in practice satisfy detailed balance:

$P_\infty(x)\,T(x \to x') = P_\infty(x')\,T(x' \to x)$

e.g. Gibbs, Metropolis-Hastings, slice sampling...

Such Markov chains are reversible:

[Figure: the chain X1 → X2 → X3 → X4 run forwards and backwards.]

$P_\infty(x_1)\,T(x_1 \to x_2)\,T(x_2 \to x_3)\,T(x_3 \to x_4) = P_\infty(x_4)\,T(x_4 \to x_3)\,T(x_3 \to x_2)\,T(x_2 \to x_1)$
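As a quick check with made-up numbers: a two-state chain with $P_\infty = (2/3,\, 1/3)$, $T(1 \to 2) = 0.1$ and $T(2 \to 1) = 0.2$ satisfies detailed balance, since $\tfrac{2}{3} \cdot 0.1 = \tfrac{1}{3} \cdot 0.2$, and indeed one step of the chain leaves $P_\infty$ unchanged: $\tfrac{2}{3} \cdot 0.9 + \tfrac{1}{3} \cdot 0.2 = \tfrac{2}{3}$.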

DBN

In the infinite directed net with replicated weights, the two (per-unit) conditional distributions are:

$P(v^{(k)} = 1 \mid h^{(k)}) = \sigma(W\,h^{(k)} + a)$ (top-down generation)

$P(h^{(k)} = 1 \mid v^{(k)}) = \sigma(W^{\top} v^{(k)} + b)$ (bottom-up inference)


Reference

G. Hinton, "Deep Belief Nets", NIPS 2007 tutorial.

Neural Networks for Machine Learning (Coursera): https://class.coursera.org/neuralnets-2012-001/class/index

"Graphical model", Wikipedia: http://en.wikipedia.org/wiki/Graphical_model