Masoud
Saeed
Shiraz University
Neural Networks Course
1391
Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions, one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as neural nets with many hidden layers.
Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success.
We do not have enough formalized prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN. A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other.
Many lower-level and intermediate-level concepts (abstractions) would be useful to construct a MAN detector.
Lower-level abstractions (the less abstract ones) are more directly tied to particular percepts.
We do not know exactly how to build robust MAN detectors, or even intermediate abstractions that would be appropriate.
The number of visual and semantic categories (such as MAN) that we would like an intelligent machine to capture is large.
The focus of deep architecture learning is to automatically discover such abstractions, from the lowest-level features to the highest-level concepts.
Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e.:
• without having to manually define all necessary abstractions
• without having to provide a huge set of relevant hand-labeled examples
If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much of human knowledge into machine-interpretable form.
The functions learned should have a structure composed of multiple levels, analogous to the multiple levels of abstraction that humans naturally envision when they describe an aspect of their world. The arguments rest both on:
• intuition
• theoretical results about the representational limitations of functions defined with an insufficient number of levels
A family of more recently proposed learning algorithms has been very successful at training deep architectures:
• Deep Belief Networks (DBNs) (Hinton et al., 2006)
• Stacked Autoassociators (Bengio et al., 2007; Ranzato et al., 2007)
DBNs are based on:
• Restricted Boltzmann Machines (RBMs)
• the Contrastive Divergence algorithm (Hinton, 2002)
[Hinton06] showed that RBMs can be stacked and trained in a greedy manner to form so-called Deep Belief Networks (DBNs). They model the joint distribution between the observed vector and the hidden layers as follows:
P(x, h^1, ..., h^ℓ) = ( Π_{k=0}^{ℓ-2} P(h^k | h^{k+1}) ) · P(h^{ℓ-1}, h^ℓ)

where x = h^0, P(h^{k-1} | h^k) is a conditional distribution for the visible units conditioned on the hidden units of the RBM at level k, and P(h^{ℓ-1}, h^ℓ) is the visible-hidden joint distribution in the top-level RBM.
A deep generative model can be obtained by stacking RBMs. Each layer of RBMs models more and more abstract features; each time an RBM is added to the stack, the variational lower bound on the log probability of the data can be improved (Hinton et al., 2006).
A DBN can be viewed as a composition of simple learning modules, each of which is a restricted type of Boltzmann machine that contains a layer of visible units representing the data and a layer of hidden units that learn to represent features capturing higher-order correlations in the data. The two layers are connected by a matrix of symmetrically weighted connections, W, and there are no connections within a layer.
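As a concrete illustration (a minimal NumPy sketch, not from the original slides; the layer sizes are made up), the parameters of one such module are just the weight matrix W between the two layers and one bias per unit, and the energy function below assigns lower energy to more probable joint configurations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: e.g. 28x28 binary images -> 500 hidden feature detectors.
n_visible, n_hidden = 784, 500

# One RBM: a single symmetric weight matrix between the visible and hidden
# layer plus a bias per unit; there are no within-layer connections.
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

def energy(v, h):
    """Energy of a joint binary configuration (v, h):
    E(v, h) = -b.v - c.h - v'Wh. Lower energy means higher probability."""
    return -(b @ v) - (c @ h) - (v @ W @ h)
```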
Deep Belief Networks (DBNs) are:
• deep neural networks, trained in a greedy layer-wise fashion;
• probabilistic generative models composed of multiple layers of stochastic, latent variables.
Each layer of the network tries to model the distribution of its input, using unsupervised training of a Restricted Boltzmann Machine (RBM).
The unsupervised greedy layer-wise training serves as initialization, replacing the traditional random initialization of multi-layer networks.
The latent variables typically have binary values and are often called hidden units or feature detectors.
The layers are learned one at a time: once the values of the latent variables in one layer have been inferred from the data, they are treated as the data for training the next layer.
The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.
Given a vector of activities v for the visible units, the hidden units are all conditionally independent, so it is easy to sample a hidden vector h from P(h | v, W). Likewise, it is easy to sample from P(v | h, W).
By starting with an observed data vector on the visible units and alternating several times between sampling from P(h | v, W) and P(v | h, W), it is easy to get a learning signal.
This learning signal is simply the difference between the pairwise correlations of the visible and hidden units at the beginning and the end of the sampling (see Boltzmann machine for details).
1. Learn an RBM and put it on top.
2. Filter the data through the current deep architecture.
3. Learn an RBM using the filtered data and put it on top.
4. Filter the data through the current deep architecture.
5. Repeat 3 and 4 until n RBMs have been stacked.
To filter the data through the deep architecture we just propagate expectations up through the RBMs, conditioning on the original data.
We sample from the deep architecture by sampling from the top RBM and then sampling downward through the remaining RBMs.
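The sketch below shows both operations, assuming a list `rbms` of (W, b, c) triples trained with the helpers above (again an illustration rather than the slides' own code): filtering propagates mean activations upward conditioned on the data, and sampling runs a few Gibbs alternations in the top RBM followed by a single downward pass.

```python
def filter_up(x, rbms):
    """Propagate expectations (mean activations) up the stack, conditioned on x."""
    h = x
    for W, _, c in rbms:
        h = sigmoid(c + h @ W)
    return h

def sample_dbn(rbms, rng, gibbs_steps=50):
    """Sample from the DBN: Gibbs sampling in the top RBM, then sample downward."""
    W_top, b_top, c_top = rbms[-1]
    v = (rng.random(W_top.shape[0]) < 0.5).astype(float)  # arbitrary starting state
    for _ in range(gibbs_steps):
        h, _ = sample_h_given_v(v, W_top, c_top, rng)
        v, _ = sample_v_given_h(h, W_top, b_top, rng)
    # The top RBM's visible layer is the hidden layer of the RBM below it.
    for W, b, _ in reversed(rbms[:-1]):
        v, _ = sample_v_given_h(v, W, b, rng)
    return v
```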
The principle of greedy layer-wise unsupervised training can be applied to DBNs with RBMs as the building blocks for each layer. The process is as follows:
1. Train the first layer as an RBM that models the raw input as its visible layer.
2. Use that first layer to obtain a representation of the input that will be used as data for the second layer (either samples h^1 ~ P(h^1 | h^0) or mean activations P(h^1 = 1 | h^0)).
3. Train the second layer as an RBM, taking the transformed data (samples or mean activations) as training examples (for the visible layer of that RBM).
4. Iterate (2 and 3) for the desired number of layers, each time propagating upward either samples or mean values.
5. Fine-tune all the parameters of this deep architecture (a code sketch of steps 1-4 follows this list).
(Here Q(h^1 | h^0) denotes the posterior of the first RBM, and P(h^1 | h^0) the probability of the same layer under the full DBN.)
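A compact sketch of steps 1-4, assuming the `cd1_update` and `sigmoid` helpers above and a 2-D NumPy array `data` of binary training vectors (the fine-tuning of step 5 is omitted; all names here are illustrative):

```python
def train_dbn_greedy(data, layer_sizes, rng, epochs=10, lr=0.1):
    """Greedy layer-wise training: fit one RBM with CD-1, transform the data
    with its mean activations, then fit the next RBM on the transformed data."""
    rbms, x = [], data
    for n_hid in layer_sizes:
        n_vis = x.shape[1]
        W = rng.normal(0.0, 0.01, size=(n_vis, n_hid))
        b, c = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            for v0 in x:                   # steps 1 and 3: train the current RBM
                cd1_update(v0, W, b, c, rng, lr)
        rbms.append((W, b, c))
        x = sigmoid(c + x @ W)             # steps 2 and 4: propagate mean values upward
    return rbms

# Example usage with made-up layer sizes:
# rbms = train_dbn_greedy(data, [500, 500, 2000], np.random.default_rng(0))
```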
After the greedy algorithm, the recognition and generative weights can be fine-tuned by means of a Wake-Sleep method or a back-propagation technique.
Wake-Sleep: the original data is propagated up, the top RBM iterates a few times, and then a sample is propagated down. The weights are updated so that the sample of the DBN matches the original data.
Back-Propagation: the network is unfolded to produce encoder and decoder networks. Stochastic activities are replaced by deterministic probabilities, and the weights are updated by back-propagation for optimal reconstruction.
A DBN (Hinton et al., 2006) with ℓ layers models the joint distribution between an observed vector x and the hidden layers h^1, ..., h^ℓ as follows:

P(x, h^1, ..., h^ℓ) = ( Π_{k=0}^{ℓ-2} P(h^k | h^{k+1}) ) · P(h^{ℓ-1}, h^ℓ)

where x = h^0, P(h^{k-1} | h^k) is the conditional distribution for the visible units given the hidden units of the RBM associated with level k of the DBN, and P(h^{ℓ-1}, h^ℓ) is the visible-hidden joint distribution in the top-level RBM.
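For concreteness (this expansion is just the factorization above written out, not an addition from the slides), with ℓ = 3 it reads

P(x, h^1, h^2, h^3) = P(x | h^1) · P(h^1 | h^2) · P(h^2, h^3),

so the lower layers contribute directed, top-down conditionals while the top two layers contribute the undirected RBM joint, matching the earlier description of the DBN's connectivity.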
• High-level feature extraction
• Non-linear dimensionality reduction: MNIST, Olivetti faces
• Digit recognition: MNIST example from Hinton's web page
• Generating and recognizing images
• Video sequences
• Motion-capture data
• Very fast retrieval of documents or images
Description (Olivetti faces):
• 400 faces in 24×24 bitmap images.
• Gray-scale images.
• Pixel intensities (0-255) are normalized to lie in the [0, 1] interval.
• 5 transformations increase the set size to 1600 images.
Description (MNIST):
• 60,000 hand-written digit images.
• 28×28 gray-scale pixels.
• Normalized to lie in [0, 1].
References:
• Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). "A fast learning algorithm for deep belief nets". Neural Computation, 18:1527-1554.
• Yoshua Bengio, "Learning Deep Architectures for AI", Foundations and Trends in Machine Learning, 2(1), pp. 1-127, 2009.
• http://www.scholarpedia.org/article/Deep_belief_networks#Deep_Belief_Nets_as_Compositions_of_Simple_Learning_Modules
• http://deeplearning.net/tutorial/DBN.html
• José Miguel Hernández-Lobato and Daniel Hernández-Lobato, "The New Generation of Neural Networks", Universidad Autónoma de Madrid, Computer Science Department, May 5, 2008.
• Sael Lee, Rongjing Xiang, Suleyman Cetintas, Youhan Fang, "Deep Belief Nets", Department of Computer Science, Purdue University.