Deep Learning


Bing-Chen Tsai


1

Outline


Neural networks


Graphical model


Belief nets


Boltzmann machine


DBN


Reference


2

Neural networks


Supervised learning


The training data consists of input information with the corresponding output information.


Unsupervised learning


The training data consists of input information without the corresponding output information.


3

Neural networks


Generative model


Models the joint distribution of inputs and outputs, P(x, y).


Discriminative model


Models the posterior probabilities, P(y | x).


[Figure: joint densities P(x, y1), P(x, y2) versus posteriors P(y1 | x), P(y2 | x)]

4

Neural networks


What is a neuron?



Linear neurons



Binary threshold neurons



Sigmoid neurons



Stochastic binary neurons


[Figure: binary threshold neuron with inputs x1, x2, weights w1, w2, and bias b; output y = 1 if w1 x1 + w2 x2 + b >= 0, 0 otherwise]
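
A minimal Python sketch of the four neuron types above; the two-input setup (x1, x2 with weights w1, w2 and bias b) mirrors the diagram, while the helper names are otherwise illustrative.

```python
import numpy as np

def total_input(x, w, b):
    # z = w1*x1 + w2*x2 + ... + b, as in the diagram above
    return float(np.dot(w, x) + b)

def linear_neuron(z):
    return z                                   # output is the total input itself

def binary_threshold_neuron(z):
    return 1 if z >= 0 else 0                  # 1 if the total input is non-negative

def sigmoid_neuron(z):
    return 1.0 / (1.0 + np.exp(-z))            # smooth output between 0 and 1

def stochastic_binary_neuron(z, rng=np.random.default_rng()):
    # emits 1 with probability equal to the sigmoid of the total input
    return 1 if rng.random() < sigmoid_neuron(z) else 0
```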

5

Neural networks


Two-layer neural networks (sigmoid neurons)

6

Back-propagation

Step 1:

Randomly initialize the weights.

Determine the output vector.

Step 2:

Evaluate the gradient of an error function.

Step 3:

Adjust the weights.

Repeat steps 1, 2, and 3 until the error is low enough.
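
A minimal sketch of these three steps for a two-layer sigmoid network; the squared-error loss, layer sizes, learning rate, and toy XOR data are assumptions for illustration, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialize the weights
W1 = rng.normal(scale=0.1, size=(3, 2))   # hidden layer: 3 units, 2 inputs
b1 = np.zeros(3)
W2 = rng.normal(scale=0.1, size=(1, 3))   # output layer: 1 unit
b2 = np.zeros(1)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs
Y_target = np.array([[0.], [1.], [1.], [0.]])            # toy targets (XOR)

lr = 0.5
for epoch in range(5000):
    # Step 1 (continued): determine the output vector (forward pass)
    H = sigmoid(X @ W1.T + b1)            # hidden activations, shape (4, 3)
    Y = sigmoid(H @ W2.T + b2)            # network outputs,    shape (4, 1)

    # Step 2: evaluate the gradient of the squared-error function
    dY = (Y - Y_target) * Y * (1 - Y)     # error signal at the output
    dH = (dY @ W2) * H * (1 - H)          # back-propagated error at the hidden layer

    # Step 3: adjust the weights, then repeat until the error is low enough
    W2 -= lr * dY.T @ H / len(X)
    b2 -= lr * dY.mean(axis=0)
    W1 -= lr * dH.T @ X / len(X)
    b1 -= lr * dH.mean(axis=0)

print(float(np.mean((Y - Y_target) ** 2)))   # report the final squared error
```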

Neural networks


Back-propagation is not good for deep learning


It requires labeled training data.


Almost all data is unlabeled.


The learning time is very slow in networks with multiple hidden layers.


It can get stuck in poor local optima.


For deep nets these local optima are often far from optimal.


Learn P(input), not P(output | input).


What kind of generative model should we learn?

7

Outline


Neural networks


Graphical model


Belief nets


Boltzmann machine


DBN


Reference


8

Graphical model


A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables.

9


In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.

Graphical model


Directed graphical model

The joint distribution factorizes into a product of conditional distributions, one per node given its parents, e.g. for variables A, B, C, D:

P(A, B, C, D) = P(A) P(B) P(C | B) P(D | A, B, C)


Undirected graphical model

The joint distribution is a normalized product of potential functions over cliques of the graph, e.g.:

P(A, B, C, D) = (1/Z) φ1(A, B, D) φ2(B, C, D)


10
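
A small sketch of how a directed factorization such as the one above turns into a joint probability; the conditional probability tables are made-up numbers for illustration.

```python
# Hypothetical conditional probability tables for binary variables A, B, C, D,
# following the example factorization P(A) P(B) P(C|B) P(D|A,B,C).
P_A = {1: 0.3, 0: 0.7}
P_B = {1: 0.6, 0: 0.4}
P_C_given_B = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}          # P_C_given_B[b][c]
P_D1_given_ABC = {(a, b, c): 0.9 if (a or (b and c)) else 0.05     # P(D=1 | a, b, c), made up
                  for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def joint(a, b, c, d):
    p_d1 = P_D1_given_ABC[(a, b, c)]
    p_d = p_d1 if d == 1 else 1.0 - p_d1
    return P_A[a] * P_B[b] * P_C_given_B[b][c] * p_d

# The joint sums to 1 over all 16 configurations:
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(total)  # 1.0 (up to floating point)
```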

Outline


Neural networks


Graphical model


Belief nets


Boltzmann machine


DBN


Reference


11

Belief nets


A belief net is a directed acyclic graph composed of stochastic variables.


12

[Figure: layers of stochastic hidden causes above a layer of visible units]

The units are stochastic binary neurons, so this is a sigmoid belief net.

Belief nets


We would like to solve two problems


The inference problem: infer the states of the unobserved variables.


The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.


13


Belief nets


It is easy to generate a sample from P(v | h).


It is hard to infer P(h | v), because of explaining away.


14

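
A small sketch of the easy direction: given the hidden causes h, each visible unit can be sampled independently from its sigmoid; the layer sizes and random weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_hidden, n_visible = 4, 6
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))   # hidden -> visible weights (illustrative)
b = np.zeros(n_visible)                                  # visible biases

# Easy direction: generate a sample of v given h by one top-down pass.
h = rng.integers(0, 2, size=n_hidden)                    # some setting of the hidden causes
p_v = sigmoid(W @ h + b)                                 # P(v_i = 1 | h) for each visible unit
v = (rng.random(n_visible) < p_v).astype(int)

# Hard direction: P(h | v) does not factorize over the hidden units
# (explaining away makes them dependent given v), so there is no
# single top-down or bottom-up pass that samples it exactly.
print(h, v)
```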

Belief nets


Explaining away


15

[Figure: two hidden causes H1 and H2 with a common observed effect V]

H1 and H2 are independent, but they can become dependent when we observe an effect that they can both influence.

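
A numeric sketch of explaining away for two independent binary causes H1, H2 of a common effect V; the prior and likelihood values are made up for illustration.

```python
from itertools import product

# Made-up numbers: each cause is on with prior probability 0.1,
# and V almost certainly turns on if either cause is on.
P_H = {0: 0.9, 1: 0.1}
def P_V1_given(h1, h2):          # P(V = 1 | h1, h2)
    return 0.95 if (h1 or h2) else 0.01

# Posterior over (H1, H2) given that V = 1.
joint = {(h1, h2): P_H[h1] * P_H[h2] * P_V1_given(h1, h2)
         for h1, h2 in product((0, 1), repeat=2)}
Z = sum(joint.values())
posterior = {hh: p / Z for hh, p in joint.items()}

# If H1 and H2 were still independent given V, the posterior of (1, 1)
# would equal the product of the posterior marginals. It does not.
pH1 = posterior[(1, 0)] + posterior[(1, 1)]
pH2 = posterior[(0, 1)] + posterior[(1, 1)]
print(posterior[(1, 1)], pH1 * pH2)   # these differ: observing V couples the causes
```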

Belief nets


Some methods for learning deep belief nets


Monte Carlo methods


But they are painfully slow for large, deep belief nets.


Learning with samples from the wrong distribution


Use Restricted Boltzmann Machines.

16

Outline


Neural networks


Graphical model


Belief nets


Boltzmann machine


DBN


Reference


17

Boltzmann Machine


It is an undirected graphical model.


The energy of a joint configuration


18

[Figure: Boltzmann machine with hidden units (top) and visible units (bottom); i and j label two connected units]
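
A sketch of the standard Boltzmann machine energy of a joint configuration (bias terms plus pairwise terms over connected units); the weight matrix, biases, and states below are made-up values.

```python
import numpy as np

def boltzmann_energy(s, W, b):
    """Energy of a joint configuration s of all units (visible and hidden).

    E(s) = - sum_i b_i s_i - sum_{i<j} w_ij s_i s_j,
    where W is symmetric with a zero diagonal.
    """
    return -float(b @ s) - 0.5 * float(s @ W @ s)

# Illustrative 4-unit machine (e.g. 2 visible + 2 hidden), made-up weights.
W = np.array([[0.,  2.,  0., 0.],
              [2.,  0., -1., 0.],
              [0., -1.,  0., 1.],
              [0.,  0.,  1., 0.]])
b = np.zeros(4)
s = np.array([1., 0., 1., 1.])
print(boltzmann_energy(s, W, b))
```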

Boltzmann Machine

19


An example of how weights define a distribution

[Figure: a tiny network with hidden units h1, h2 and visible units v1, v2 whose weights (+2, +1, -1) define a distribution over joint configurations]
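
A sketch of how the weights define a distribution: enumerate every joint configuration, weight it by exp(-E), and normalize; the connection pattern and weight values below are assumptions, not read off the slide.

```python
import numpy as np
from itertools import product

# Illustrative weights for a tiny machine with visible units v1, v2 and
# hidden units h1, h2 (values assumed for illustration).
w_v1h1, w_v2h2, w_h1h2 = 2.0, 1.0, -1.0

def energy(v1, v2, h1, h2):
    return -(w_v1h1 * v1 * h1 + w_v2h2 * v2 * h2 + w_h1h2 * h1 * h2)

# exp(-E) for every joint configuration, then normalize to get probabilities.
configs = list(product((0, 1), repeat=4))
unnorm = {c: np.exp(-energy(*c)) for c in configs}
Z = sum(unnorm.values())
probs = {c: p / Z for c, p in unnorm.items()}

# Marginal probability of each visible configuration (sum over hidden states).
for v1, v2 in product((0, 1), repeat=2):
    p = sum(probs[(v1, v2, h1, h2)] for h1, h2 in product((0, 1), repeat=2))
    print((v1, v2), round(p, 3))
```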

Boltzmann Machine


A very surprising fact

∂ log p(v) / ∂ w_ij  =  <s_i s_j>_v  -  <s_i s_j>_model

The left-hand side is the derivative of the log probability of one training vector, v, under the model. <s_i s_j>_v is the expected value of the product of states at thermal equilibrium when v is clamped on the visible units; <s_i s_j>_model is the expected value of the product of states at thermal equilibrium with no clamping.

20

Boltzmann Machines


Restricted Boltzmann Machine


We restrict the connectivity to make learning easier.


Only one layer of hidden units (we will deal with more layers later).


No connections between hidden units, making the updates more parallel.

21


Boltzmann Machines


The Boltzmann machine learning algorithm for an RBM

22

[Figure: alternating Gibbs sampling between visible units i and hidden units j at t = 0, t = 1, t = 2, ..., t = infinity]

Boltzmann Machines


Contrastive divergence: a very surprising short-cut

23

[Figure: one up-down-up pass, from the data at t = 0 to a reconstruction at t = 1, over visible units i and hidden units j]

This is not following the gradient of the log likelihood, but it works well.
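
A minimal sketch of CD-1 for an RBM: one up-down-up pass from the data (t = 0) to a reconstruction (t = 1), then a weight update from the difference of the two pairwise statistics; the sizes, learning rate, and toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = np.zeros(n_visible)          # visible biases
b = np.zeros(n_hidden)           # hidden biases

data = rng.integers(0, 2, size=(20, n_visible)).astype(float)   # toy binary data

for epoch in range(100):
    for v0 in data:
        # t = 0: sample hidden units from the data
        ph0 = sigmoid(v0 @ W + b)
        h0 = (rng.random(n_hidden) < ph0).astype(float)
        # t = 1: reconstruct the visible units, then recompute hidden probabilities
        pv1 = sigmoid(h0 @ W.T + a)
        v1 = (rng.random(n_visible) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + b)
        # CD-1 update: <v h>_data - <v h>_reconstruction
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
        a += lr * (v0 - v1)
        b += lr * (ph0 - ph1)
```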

Outline


Neural networks


Graphical model


Belief nets


Boltzmann machine


DBN


Reference


24

DBN


It is easy to generate a sample from P(v | h).


It is hard to infer P(h | v), because of explaining away.


Using an RBM to initialize the weights can reach a good optimum.

25


DBN


Combining two RBMs to make a DBN

26

Train the first RBM on the data. Copy the binary state for each v and then train the second RBM on these states. Compose the two RBM models to make a single DBN model: it's a deep belief net!
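
A sketch of this greedy stacking, using a CD-1 trainer like the one sketched earlier; train_rbm, the layer sizes, and the toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.1, epochs=50):
    """Train one RBM with CD-1 and return its weights and hidden biases."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + a)
            ph1 = sigmoid(pv1 @ W + b)
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            a += lr * (v0 - pv1)
            b += lr * (ph0 - ph1)
    return W, b

# Greedy layer-wise stacking: train the first RBM on the data, copy the hidden
# states for each v (here their probabilities), then train the second RBM on them.
data = rng.integers(0, 2, size=(30, 8)).astype(float)
W1, b1 = train_rbm(data, n_hidden=4)
hidden1 = sigmoid(data @ W1 + b1)            # copy the state for each v
W2, b2 = train_rbm(hidden1, n_hidden=2)      # then train this RBM
# The two RBMs composed together form a single DBN model.
```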

DBN


Why can we use an RBM to initialize the belief net's weights?


An infinite sigmoid belief net that is equivalent to an RBM


Inference in a directed net with replicated weights


Inference is trivial: we just multiply v0 by W transpose.


The model above h0 implements a complementary prior.


Multiplying v0 by W transpose gives the product of the likelihood term and the prior term.


27


[Figure: the infinite directed net, with layers ... h2, v2, h1, v1, h0, v0 (top to bottom), all sharing the weights W]

DBN


Complementary prior


A Markov chain is a sequence of variables X1, X2, ... with the Markov property:

P(X_t | X_1, ..., X_{t-1}) = P(X_t | X_{t-1})


A Markov chain is stationary if the transition probabilities do not depend on time:

P(X_t = x' | X_{t-1} = x) = T(x → x')

T(x → x') is called the transition matrix.


If a Markov chain is ergodic, it has a unique equilibrium distribution:

P_t(X_t = x) → P_inf(x)  as  t → infinity


28

[Figure: Markov chain X1 → X2 → X3 → X4]
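
A small sketch of these definitions: a made-up transition matrix T for a four-state chain, iterated until the distribution settles at its unique equilibrium.

```python
import numpy as np

# Transition matrix T[x, x'] = P(X_t = x' | X_{t-1} = x) for a 4-state chain
# (rows sum to 1; the numbers are made up).
T = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.1, 0.6, 0.3, 0.0],
              [0.0, 0.2, 0.5, 0.3],
              [0.0, 0.0, 0.4, 0.6]])

p = np.array([1.0, 0.0, 0.0, 0.0])     # start in state X1
for t in range(1000):
    p = p @ T                           # one step of the chain
print(p)   # the unique equilibrium distribution of this ergodic chain
```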

DBN


Most Markov chains used in practice satisfy detailed balance:

P_inf(x) T(x → x') = P_inf(x') T(x' → x)


e.g. Gibbs sampling, Metropolis-Hastings, slice sampling...


Such Markov chains are reversible:

P_inf(x1) T(x1 → x2) T(x2 → x3) T(x3 → x4) = T(x2 → x1) T(x3 → x2) T(x4 → x3) P_inf(x4)

29
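
A sketch checking detailed balance for the same made-up four-state chain from the previous sketch; because it only moves between neighbouring states, it satisfies detailed balance and is reversible.

```python
import numpy as np
from itertools import product

T = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.1, 0.6, 0.3, 0.0],
              [0.0, 0.2, 0.5, 0.3],
              [0.0, 0.0, 0.4, 0.6]])

# Equilibrium distribution: iterate the chain until it stops changing.
p_inf = np.full(4, 0.25)
for _ in range(10000):
    p_inf = p_inf @ T

# Detailed balance: p_inf(x) T(x -> x') == p_inf(x') T(x' -> x) for all pairs.
for x, x2 in product(range(4), repeat=2):
    assert np.isclose(p_inf[x] * T[x, x2], p_inf[x2] * T[x2, x])
print("detailed balance holds; the chain is reversible")
```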

DBN

In the infinite net with tied weights, the same weight matrix W generates every pair of layers:

P(h_k = 1 | v_{k+1}) = σ(W^T v_{k+1} + c)

P(v_k = 1 | h_k) = σ(W h_k + b)

30

DBN


Combining two RBMs to make a DBN

31

Train the first RBM on the data. Copy the binary state for each v and then train the second RBM on these states. Compose the two RBM models to make a single DBN model: it's a deep belief net!

Reference


G. Hinton, Deep Belief Nets, NIPS 2007 tutorial


https://class.coursera.org/neuralnets-2012-001/class/index


Machine learning lecture notes


http://en.wikipedia.org/wiki/Graphical_model

32