Deep Learning
Bing-Chen Tsai
1/21
Outline
Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference
Neural networks
Supervised learning
The training data consists of input information with the corresponding output information.
Unsupervised learning
The training data consists of input information without the corresponding output information.
Neural networks
Generative model
Model the joint distribution of input and output, P(x, y).
Discriminative model
Model the posterior probabilities, P(y | x).
[Figure: class densities P(x, y1), P(x, y2) versus posteriors P(y1 | x), P(y2 | x)]
Neural networks
What is a neuron?
Linear neurons
Binary threshold neurons
Sigmoid neurons
Stochastic binary neurons
[Figure: a binary threshold neuron with inputs x1, x2, a bias input 1, weights w1, w2, b, and output y]
$y = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 + b \ge 0 \\ 0 & \text{otherwise} \end{cases}$
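A minimal sketch of the four neuron types, assuming NumPy and leaving the inputs, weights, and bias to the caller:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    # Linear neuron: the output is the weighted sum itself.
    return np.dot(w, x) + b

def binary_threshold(x, w, b):
    # Binary threshold neuron: fire (1) iff w.x + b >= 0.
    return 1 if np.dot(w, x) + b >= 0 else 0

def sigmoid_neuron(x, w, b):
    # Sigmoid neuron: a smooth, real-valued output in (0, 1).
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def stochastic_binary(x, w, b):
    # Stochastic binary neuron: the sigmoid output is treated as
    # the probability of outputting a 1.
    return int(rng.random() < sigmoid_neuron(x, w, b))

x, w, b = np.array([1.0, 0.5]), np.array([0.3, -0.8]), 0.2
print(binary_threshold(x, w, b), sigmoid_neuron(x, w, b))
```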
Neural networks
Two-layer neural networks (sigmoid neurons)
Back-propagation
Step 1: Randomly initialize the weights and determine the output vector.
Step 2: Evaluate the gradient of an error function.
Step 3: Adjust the weights.
Repeat steps 2 and 3 until the error is low enough.
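A minimal sketch of these three steps for a two-layer sigmoid network, assuming NumPy, a squared-error loss, and made-up layer sizes, data, and learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialize the weights.
W1, b1 = rng.normal(0, 0.1, (3, 2)), np.zeros(3)   # 2 inputs -> 3 hidden units
W2, b2 = rng.normal(0, 0.1, (1, 3)), np.zeros(1)   # 3 hidden -> 1 output
x, t = np.array([0.5, -0.2]), np.array([1.0])      # one made-up training pair
lr = 0.5

for step in range(10000):
    # Forward pass: determine the output vector.
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    if 0.5 * ((y - t) ** 2).sum() < 1e-6:          # stop when the error is low
        break
    # Step 2: evaluate the gradient of the error E = 0.5 * (y - t)^2.
    dy = (y - t) * y * (1 - y)              # error signal at the output layer
    dh = (W2.T @ dy) * h * (1 - h)          # back-propagated to the hidden layer
    # Step 3: adjust the weights, then repeat.
    W2 -= lr * np.outer(dy, h); b2 -= lr * dy
    W1 -= lr * np.outer(dh, x); b1 -= lr * dh
```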
Neural networks
Back-propagation is not good for deep learning:
It requires labeled training data, but almost all data is unlabeled.
The learning time is very slow in networks with multiple hidden layers.
It can get stuck in poor local optima, which for deep nets are far from optimal.
So learn P(input), not P(output | input).
What kind of generative model should we learn?
Outline
Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference
Graphical model
A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables.
[Figure: a graph over four variables A, B, C, D]
In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.
Graphical model
Directed graphical model
[Figure: directed graph over A, B, C, D]
$P(A,B,C,D) = P(A)\,P(B)\,P(D \mid A,B)\,P(C \mid B,D)$
Undirected graphical model
[Figure: undirected graph over A, B, C, D]
$P(A,B,C,D) = \frac{1}{Z}\,\varphi(A,B,D)\,\varphi(B,C,D)$
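A minimal sketch of the directed factorization above, with made-up conditional probability tables for binary A, B, C, D, verifying that the factors define a proper joint distribution:

```python
import itertools

# Made-up CPTs for binary variables (1 = true, 0 = false).
P_A = {1: 0.3, 0: 0.7}
P_B = {1: 0.6, 0: 0.4}
P_D_given_AB = {(a, b): 0.1 + 0.4 * a + 0.4 * b for a in (0, 1) for b in (0, 1)}
P_C_given_BD = {(b, d): 0.2 + 0.3 * b + 0.4 * d for b in (0, 1) for d in (0, 1)}

def joint(a, b, c, d):
    # P(A,B,C,D) = P(A) P(B) P(D|A,B) P(C|B,D)
    pd = P_D_given_AB[(a, b)] if d else 1 - P_D_given_AB[(a, b)]
    pc = P_C_given_BD[(b, d)] if c else 1 - P_C_given_BD[(b, d)]
    return P_A[a] * P_B[b] * pd * pc

# The factorization defines a proper distribution: probabilities sum to 1.
total = sum(joint(*bits) for bits in itertools.product((0, 1), repeat=4))
print(total)  # 1.0 (up to floating-point error)
```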
Outline
Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference
Belief nets
A belief net is a directed acyclic graph composed of stochastic variables.
[Figure: stochastic hidden causes above a layer of visible units]
If the units are stochastic binary neurons, it is a sigmoid belief net.
Belief nets
We would like to solve two problems:
The inference problem: infer the states of the unobserved variables.
The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.
Belief nets
It is easy to generate a sample from P(v | h).
It is hard to infer P(h | v), because of explaining away.
[Figure: stochastic hidden causes above a layer of visible units]
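A minimal sketch of why generation is easy: ancestral sampling in a one-hidden-layer sigmoid belief net, assuming NumPy and made-up sizes and weights:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = rng.normal(0, 1, (4, 3))   # weights from 3 hidden causes to 4 visible units
b = np.zeros(4)                # visible biases

# Generating from P(v | h) is one top-down pass:
h = rng.random(3) < 0.5                      # sample the hidden causes
v = rng.random(4) < sigmoid(W @ h + b)       # then sample each visible unit
print(v.astype(int))

# Inferring P(h | v) has no such one-pass recipe: the hidden units become
# dependent given v (explaining away), so exact inference requires summing
# over all 2^3 hidden configurations.
```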
Belief nets
Explaining away
[Figure: two hidden causes H1 and H2 with a common observed effect V]
H1 and H2 are independent, but they can become dependent when we observe an effect that they can both influence.
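A minimal numeric sketch of explaining away, with made-up priors and a noisy-OR style effect; observing V = 1 raises belief in H1, but additionally observing H2 = 1 pushes it back down:

```python
import itertools

# Made-up model: independent binary causes H1, H2 with a common effect V.
p_h1, p_h2 = 0.1, 0.1
def p_v_given(h1, h2):
    # V is very likely if either cause is active (noisy-OR style).
    return 0.99 if (h1 or h2) else 0.01

def joint(h1, h2, v):
    ph1 = p_h1 if h1 else 1 - p_h1
    ph2 = p_h2 if h2 else 1 - p_h2
    pv = p_v_given(h1, h2) if v else 1 - p_v_given(h1, h2)
    return ph1 * ph2 * pv

# Posterior of H1 given V=1, before and after also observing H2=1.
num = sum(joint(1, h2, 1) for h2 in (0, 1))
den = sum(joint(h1, h2, 1) for h1, h2 in itertools.product((0, 1), repeat=2))
print("P(H1=1 | V=1)       =", num / den)        # ~0.50
print("P(H1=1 | V=1, H2=1) =", joint(1, 1, 1) /
      (joint(1, 1, 1) + joint(0, 1, 1)))          # 0.1: H2 explains V away
```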
Belief nets
Some methods for learning deep belief nets:
Monte Carlo methods, but they are painfully slow for large, deep belief nets.
Learning with samples from the wrong distribution.
Use Restricted Boltzmann Machines.
Outline
Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference
Boltzmann Machine
It is an undirected graphical model.
[Figure: hidden units and visible units joined by symmetric connections w_ij]
The energy of a joint configuration, with binary states $s_i$ and symmetric weights $w_{ij}$:
$E(v,h) = -\sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}$
Boltzmann Machine
An example of how weights define a distribution
[Figure: a small network with hidden units h1, h2, visible units v1, v2, and weights +2, +1, and −1]
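A minimal sketch of how weights define a distribution: enumerate every joint configuration, compute its energy, and normalize. The exact wiring of the weights +2, +1, −1 in the original figure is not recoverable, so this wiring is an assumption:

```python
import itertools
import numpy as np

# Hypothetical wiring: v1-h1 weight +2, v2-h2 weight +1, h1-h2 weight -1.
def energy(v1, v2, h1, h2):
    return -(2 * v1 * h1 + 1 * v2 * h2 + (-1) * h1 * h2)

# Enumerate all 2^4 binary configurations.
configs = list(itertools.product((0, 1), repeat=4))
neg_E = np.array([-energy(*c) for c in configs])
probs = np.exp(neg_E) / np.exp(neg_E).sum()   # P(config) = e^{-E} / Z

for c, p in zip(configs, probs):
    print(c, round(float(p), 3))
```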
Boltzmann Machine
A very surprising fact: the derivative of the log probability of one training vector, v, under the model is
$\frac{\partial \log P(v)}{\partial w_{ij}} = \langle s_i s_j \rangle_v - \langle s_i s_j \rangle_{model}$
$\langle s_i s_j \rangle_v$: expected value of the product of states at thermal equilibrium when v is clamped on the visible units.
$\langle s_i s_j \rangle_{model}$: expected value of the product of states at thermal equilibrium with no clamping.
Boltzmann Machines
Restricted Boltzmann Machine
We restrict the connectivity to make learning easier:
Only one layer of hidden units. (We will deal with more layers later.)
No connections between hidden units, making the updates more parallel.
[Figure: a visible layer fully connected to one hidden layer]
Boltzmann Machines
The Boltzmann machine learning algorithm for an RBM:
[Figure: alternating Gibbs sampling: update all hidden units j in parallel, then all visible units i, at t = 0, 1, 2, ..., ∞]
$\Delta w_{ij} \propto \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{\infty}$
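A minimal sketch of alternating Gibbs sampling in an RBM, assuming NumPy and made-up sizes and weights:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid = 6, 4
W = rng.normal(0, 0.1, (n_vis, n_hid))   # symmetric weights, no hidden-hidden links
a, b = np.zeros(n_vis), np.zeros(n_hid)  # visible and hidden biases

v = rng.integers(0, 2, n_vis)            # start the chain at some visible vector

for t in range(5):
    # Hidden units are not connected to each other, so they are
    # conditionally independent given v and can be sampled in parallel.
    h = (rng.random(n_hid) < sigmoid(v @ W + b)).astype(int)
    # Likewise all visible units given h.
    v = (rng.random(n_vis) < sigmoid(W @ h + a)).astype(int)
# Running this chain to t = infinity gives samples from the model distribution.
```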
Boltzmann Machines
Contrastive divergence: a very surprising short-cut
[Figure: run the chain for only one full step: data at t = 0, reconstruction at t = 1]
$\Delta w_{ij} = \varepsilon\left(\langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{1}\right)$
This is not following the gradient of the log likelihood, but it works well.
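A minimal sketch of one CD-1 update for a single training vector, assuming NumPy and made-up sizes, data, and learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 6, 4, 0.1                       # made-up sizes and rate
W = rng.normal(0, 0.1, (n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)

v0 = rng.integers(0, 2, n_vis)                     # a "data" vector at t = 0

ph0 = sigmoid(v0 @ W + b)                          # P(h=1 | v0)
h0 = (rng.random(n_hid) < ph0).astype(int)         # sampled hidden states

v1 = (rng.random(n_vis) < sigmoid(W @ h0 + a)).astype(int)  # reconstruction, t = 1
ph1 = sigmoid(v1 @ W + b)

# Contrastive divergence update:
# delta w_ij = lr * ( <v_i h_j>^0 - <v_i h_j>^1 )
W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
a += lr * (v0 - v1)
b += lr * (ph0 - ph1)
```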
Outline
Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference
DBN
It is easy to generate a sample from P(v | h).
It is hard to infer P(h | v), because of explaining away.
Using an RBM to initialize the weights can lead to a good optimum.
[Figure: stochastic hidden causes above a layer of visible units]
DBN
Combining two RBMs to make a DBN:
Train the first RBM on the data.
Copy the binary state of its hidden units for each v, and train the second RBM on those states.
Compose the two RBM models to make a single DBN model.
It's a deep belief net!
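A minimal sketch of this greedy layer-wise procedure; train_rbm is a hypothetical helper built from the CD-1 update above, and the data and layer sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hid, lr=0.1, n_epochs=50):
    """Hypothetical helper: train an RBM with CD-1 and return its weights
    and the binary hidden states inferred for each training case."""
    n_vis = data.shape[1]
    W = rng.normal(0, 0.1, (n_vis, n_hid))
    a, b = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(n_epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(n_hid) < ph0).astype(int)
            v1 = (rng.random(n_vis) < sigmoid(W @ h0 + a)).astype(int)
            ph1 = sigmoid(v1 @ W + b)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            a += lr * (v0 - v1)
            b += lr * (ph0 - ph1)
    hidden = (sigmoid(data @ W + b) > 0.5).astype(int)
    return W, hidden

data = rng.integers(0, 2, (100, 6))   # made-up binary training data

W1, h1 = train_rbm(data, n_hid=4)     # train this RBM first
W2, h2 = train_rbm(h1, n_hid=3)       # then train this RBM on the copied states
# Composing the two models yields a single DBN: the top two layers keep the
# undirected weights W2, while W1 is used as top-down generative weights.
```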
DBN
Why can we use an RBM to initialize the belief net weights?
An infinite sigmoid belief net is equivalent to an RBM.
Inference in a directed net with replicated weights:
Inference is trivial: we just multiply v0 by W transpose.
The model above h0 implements a complementary prior.
Multiplying v0 by W transpose gives the product of the likelihood term and the prior term.
[Figure: an infinite directed net with replicated weights and alternating layers v0, h0, v1, h1, v2, h2, etc.]
DBN
Complementary prior
A Markov chain is a sequence of variables $X_1, X_2, \ldots$ with the Markov property
$P(X_t \mid X_1, \ldots, X_{t-1}) = P(X_t \mid X_{t-1})$
A Markov chain is stationary if the transition probabilities do not depend on time:
$P(X_t = x' \mid X_{t-1} = x) = T(x \to x')$
$T(x \to x')$ is called the transition matrix.
If a Markov chain is ergodic, it has a unique equilibrium distribution:
$P(X_t = x) \to P_\infty(x)$ as $t \to \infty$
[Figure: a Markov chain X1 → X2 → X3 → X4]
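A minimal sketch, assuming a made-up 2-state transition matrix, of a chain converging to its unique equilibrium distribution:

```python
import numpy as np

# Made-up transition matrix: T[i, j] = T(x_i -> x_j), rows sum to 1.
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])

p = np.array([1.0, 0.0])       # start with all mass on state 0
for t in range(50):
    p = p @ T                  # one step of the chain
print(p)                       # -> equilibrium distribution [0.8, 0.2]
```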
DBN
Most Markov chains used in practice satisfy detailed balance:
$P_\infty(x)\,T(x \to x') = P_\infty(x')\,T(x' \to x)$
e.g. Gibbs sampling, Metropolis-Hastings, slice sampling, ...
Such Markov chains are reversible:
$P_\infty(x_1)\,T(x_1 \to x_2)\,T(x_2 \to x_3)\,T(x_3 \to x_4) = T(x_1 \leftarrow x_2)\,T(x_2 \leftarrow x_3)\,T(x_3 \leftarrow x_4)\,P_\infty(x_4)$
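A minimal numeric check of detailed balance for the same made-up 2-state chain, whose equilibrium distribution is [0.8, 0.2]:

```python
import numpy as np

T = np.array([[0.9, 0.1],
              [0.4, 0.6]])
p_inf = np.array([0.8, 0.2])   # equilibrium distribution found above

# Detailed balance: P_inf(x) T(x -> x') == P_inf(x') T(x' -> x)
for x in range(2):
    for x2 in range(2):
        assert np.isclose(p_inf[x] * T[x, x2], p_inf[x2] * T[x2, x])
print("detailed balance holds; the chain is reversible")
```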
DBN
In the infinite directed net with replicated weights, each pair of adjacent layers is related by the same pair of conditional distributions:
$P(x^{(k)} = 1 \mid x^{(k+1)}) = \sigma(W^{T} x^{(k+1)} + b)$
$P(x^{(k+1)} = 1 \mid x^{(k)}) = \sigma(W x^{(k)} + c)$
DBN
Combining two RBMs to make a DBN:
Train the first RBM on the data.
Copy the binary state of its hidden units for each v, and train the second RBM on those states.
Compose the two RBM models to make a single DBN model.
It's a deep belief net!
Reference
Deep Belief Nets, NIPS 2007 tutorial, G. Hinton.
https://class.coursera.org/neuralnets-2012-001/class/index
Machine learning lecture notes.
http://en.wikipedia.org/wiki/Graphical_model