
Masoud Saeed
Shiraz University
Neural Networks Course
1391


Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions, one may need deep architectures.

Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers.

Searching the parameter space of deep architectures is a difficult task.

Learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success.




We do not have enough formalized prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN.

A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other.

Many lower-level and intermediate-level concepts (abstractions) would be useful to construct a MAN-detector.

Lower-level abstractions are more directly tied to particular percepts, whereas higher-level ones are more abstract.



We do not know exactly how to build robust MAN detectors, or even intermediate abstractions that would be appropriate.

The number of visual and semantic categories (such as MAN) that we would like an intelligent machine to capture is large.

The focus of deep architecture learning is to automatically discover such abstractions, from the lowest-level features to the highest-level concepts.

Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e.:
- without having to manually define all necessary abstractions
- without having to provide a huge set of relevant hand-labeled examples


If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much of human knowledge into machine-interpretable form.

The functions learned should have a structure composed of multiple levels, analogous to the multiple levels of abstraction that humans naturally envision when they describe an aspect of their world. The arguments rest both on:
- intuition
- theoretical results about the representational limitations of functions defined with an insufficient number of levels


A family of more recently proposed learning algorithms that have been very successful at training deep architectures:
- Deep Belief Networks (DBNs) (Hinton et al., 2006)
- Stacked Autoassociators (Bengio et al., 2007; Ranzato et al., 2007)

DBNs are based on:
- Restricted Boltzmann Machines (RBMs)
- the Contrastive Divergence algorithm (Hinton, 2002)



[Hinton06] showed that RBMs can be stacked and trained in a greedy manner to form so-called Deep Belief Networks (DBNs).

They model the joint distribution between the observed vector and the hidden layers as follows:

$$P(x, h^1, \ldots, h^\ell) = \left( \prod_{k=0}^{\ell-2} P(h^k \mid h^{k+1}) \right) P(h^{\ell-1}, h^\ell)$$

where $x = h^0$, $P(h^{k-1} \mid h^k)$ is a conditional distribution for the visible units conditioned on the hidden units of the RBM at level $k$, and $P(h^{\ell-1}, h^\ell)$ is the visible-hidden joint distribution in the top-level RBM.
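As a concrete illustration (not stated on the original slides), for a DBN with $\ell = 3$ hidden layers the factorization reads:

$$P(x, h^1, h^2, h^3) = P(x \mid h^1)\, P(h^1 \mid h^2)\, P(h^2, h^3),$$

i.e. the top two layers $(h^2, h^3)$ form an undirected RBM, while the lower layers are generated by directed, top-down conditionals.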



A deep generative model can be obtained by stacking RBMs. Each layer of RBMs models more and more abstract features.

Each time an RBM is added to the stack, the variational lower bound on the log-likelihood of the data can be improved (Hinton et al., 2006).



A DBN can be viewed as a composition of simple learning modules, each of which is a restricted type of Boltzmann machine that contains a layer of visible units that represent the data and a layer of hidden units that learn to represent features that capture higher-order correlations in the data.

The two layers are connected by a matrix of symmetrically weighted connections W, and there are no connections within a layer.
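For reference, the standard energy function of such a binary RBM (not written out on these slides, but standard since Hinton, 2002) and the resulting factorized conditionals are:

$$E(v, h) = -b^\top v - c^\top h - v^\top W h, \qquad P(v, h) = \frac{e^{-E(v, h)}}{Z},$$

$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_i W_{ij} v_i\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big),$$

where $b$ and $c$ are the visible and hidden biases, $Z$ is the partition function, and $\sigma$ is the logistic sigmoid; the conditionals factorize because there are no within-layer connections.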


Deep Belief Networks (DBNs) are:
- deep neural networks, trained in a greedy layer-wise fashion
- probabilistic generative models that are composed of multiple layers of stochastic, latent variables

Each layer of the network tries to model the distribution of its input, using unsupervised training in a Restricted Boltzmann Machine (RBM).

The unsupervised greedy layer-wise training serves as initialization, replacing the traditional random initialization of multi-layer networks.

The latent variables typically have binary values and are often called hidden units or feature detectors.



Learned one layer at a time: once the values of the latent variables in one layer have been inferred from data, they are treated as the data for training the next layer.


The top two layers have undirected, symmetric connections between them and form an associative memory.

The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.


Given a vector of activities $v$ for the visible units, the hidden units are all conditionally independent, so it is easy to sample a vector $h$ from $P(h \mid v, W)$.

It is equally easy to sample from $P(v \mid h, W)$. By starting with an observed data vector on the visible units and alternating several times between sampling from $P(h \mid v, W)$ and $P(v \mid h, W)$, it is easy to get a learning signal.

This learning signal is simply the difference between the pairwise correlations of the visible and hidden units at the beginning and end of the sampling (see Boltzmann machine for details).
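A minimal NumPy sketch of one such learning step (CD-1), assuming binary visible and hidden units; the function name `cd1_update` and the array conventions are illustrative choices, not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) step for a binary RBM.

    v0   : (batch, n_visible) observed data vectors
    W    : (n_visible, n_hidden) symmetric connection weights
    b, c : visible and hidden biases
    """
    # Up pass: hidden probabilities and a sampled hidden state
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Down and up again: reconstruct the visibles, re-infer the hiddens
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    # Learning signal: pairwise correlations at the start minus at the end
    pos = v0.T @ ph0      # <v h> measured on the data
    neg = v1.T @ ph1      # <v h> after one Gibbs step
    W += lr * (pos - neg) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

Repeating this update over many mini-batches drives the weights toward values under which one-step reconstructions resemble the data.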



1. Learn an RBM and put it on the top.
2. Filter the data through the current deep architecture.
3. Learn an RBM using the filtered data and put it on the top.
4. Filter the data through the current deep architecture.
5. Repeat 3 and 4 until n RBMs have been stacked.


To filter the data through the deep architecture we just propagate expectations up in the RBMs, conditioning on the original data.


We sample from the deep architecture by sampling from the top RBM and then sampling downward through the remaining RBMs.
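A sketch of that procedure, assuming (as in the CD-1 snippet) a hypothetical list `layers` of `(W, b, c)` tuples ordered bottom-to-top:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_dbn(layers, n_gibbs=50, rng=np.random.default_rng(0)):
    """Ancestral sampling from a stack of RBMs.

    layers : list of (W, b, c) tuples, bottom-to-top; W is (n_visible, n_hidden).
    """
    # Run a short Gibbs chain in the top-level RBM (the associative memory)
    W, b, c = layers[-1]
    h = (rng.random(W.shape[1]) < 0.5).astype(float)
    for _ in range(n_gibbs):
        v = (rng.random(W.shape[0]) < sigmoid(W @ h + b)).astype(float)
        h = (rng.random(W.shape[1]) < sigmoid(v @ W + c)).astype(float)

    # Propagate the top-level sample down through the directed layers
    sample = v
    for W, b, c in reversed(layers[:-1]):
        sample = (rng.random(W.shape[0]) < sigmoid(W @ sample + b)).astype(float)
    return sample  # a sampled data vector at the visible layer
```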


The principle of greedy layer-wise unsupervised training can be applied to DBNs with RBMs as the building blocks for each layer. The process is as follows:

1. Train the first layer as an RBM that models the raw input as its visible layer.

2. Use that first layer to obtain a representation of the input that will be used as data for the second layer (either samples $h^1 \sim P(h^1 \mid h^0)$ or mean activations $P(h^1 = 1 \mid h^0)$).

3. Train the second layer as an RBM, taking the transformed data (samples or mean activations) as training examples (for the visible layer of that RBM).

4. Iterate steps 2 and 3 for the desired number of layers, each time propagating upward either samples or mean values (see the code sketch after this list).

5. Fine-tune all the parameters of this deep architecture.
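A compact NumPy sketch of steps 1-4, under the same binary-RBM assumptions as the earlier CD-1 snippet; the helper names `train_rbm` and `train_dbn`, the layer sizes, and the epoch counts are all illustrative, not prescribed by the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.1, rng=np.random.default_rng(0)):
    """Fit one binary RBM to `data` with CD-1 (full-batch, for brevity)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + c)                       # up pass
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
        pv1 = sigmoid(h0 @ W.T + b)                       # reconstruct visibles
        ph1 = sigmoid(pv1 @ W + c)                        # re-infer hidden units
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        b += lr * (data - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

def train_dbn(data, layer_sizes):
    """Greedy layer-wise training: steps 1-4 of the procedure above."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)   # steps 1 / 3: learn an RBM on current data
        layers.append((W, b, c))
        x = sigmoid(x @ W + c)             # steps 2 / 4: propagate mean activations up
    return layers                          # step 5 (fine-tuning) is not shown here

# Illustrative usage on random binary data:
# layers = train_dbn((np.random.default_rng(1).random((500, 784)) < 0.5).astype(float),
#                    [256, 64])
```

Using mean activations for the upward pass corresponds to the second option mentioned in step 2; propagating samples instead is equally valid.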



[Figure: the posterior of the first RBM over $h^1$, compared with the probability of the same layer under the full DBN.]



After the greedy algorithm, the recognition and generative weights can be fine-tuned by means of a Wake-Sleep method or a back-propagation technique.


Wake-Sleep: the original data is propagated up, the top RBM iterates a few times, and then a sample is propagated down. The weights are updated so that the sample of the DBN matches the original data.


Back-Propagation: the network is unfolded to produce encoder and decoder networks. Stochastic activities are replaced by deterministic probabilities, and the weights are updated by back-propagation for optimal reconstruction.
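A minimal sketch of the unfolding step under the same assumptions as the earlier snippets (a bottom-to-top list of `(W, b, c)` tuples); only the deterministic encoder/decoder pass and the reconstruction error are shown, and the weight updates themselves would come from back-propagating this error with any autodiff framework:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unroll_and_reconstruct(layers, x):
    """Deterministic pass through an RBM stack unfolded into an autoencoder.

    Stochastic units are replaced by their probabilities; the decoder is
    initialized with the transposed recognition weights.
    """
    # Encoder: propagate probabilities up through every layer
    h = x
    for W, b, c in layers:
        h = sigmoid(h @ W + c)

    # Decoder: mirror image with transposed weights, back down to the data
    v = h
    for W, b, c in reversed(layers):
        v = sigmoid(v @ W.T + b)

    # Cross-entropy reconstruction error that back-propagation would minimize
    eps = 1e-12
    return -np.mean(x * np.log(v + eps) + (1 - x) * np.log(1 - v + eps))
```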


A DBN (Hinton et al., 2006) with $\ell$ layers models the joint distribution between the observed vector $x$ and $\ell$ hidden layers $h^1, \ldots, h^\ell$ as follows:

$$P(x, h^1, \ldots, h^\ell) = \left( \prod_{k=0}^{\ell-2} P(h^k \mid h^{k+1}) \right) P(h^{\ell-1}, h^\ell)$$

where $x = h^0$, $P(h^{k-1} \mid h^k)$ is a conditional distribution for the visible units conditioned on the hidden units of the RBM associated with level $k$ of the DBN, and $P(h^{\ell-1}, h^\ell)$ is the visible-hidden joint distribution in the top-level RBM.










- High-level feature extraction
- Non-linear dimensionality reduction: MNIST, Olivetti faces
- Digit recognition: MNIST example from Hinton's web page
- Generating and recognizing images
- Video sequences
- Motion-capture data
- Very fast retrieval of documents or images



Description (Olivetti faces):
- 400 faces in 24 × 24 bitmap images.
- Gray-scale images.
- Pixel intensities (0-255) are normalized to lie in the [0, 1] interval.
- 5 transformations increase the set size to 1600 images.


Description (MNIST):
- 60,000 handwritten digit images.
- 28 × 28 gray-scale pixels.
- Normalized to lie in [0, 1].


Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). "A fast learning algorithm for deep belief nets". Neural Computation, 18:1527-1554.

Yoshua Bengio, "Learning Deep Architectures for AI", Foundations and Trends in Machine Learning, 2(1), pp. 1-127, 2009.

http://www.scholarpedia.org/article/Deep_belief_networks#Deep_Belief_Nets_as_Compositions_of_Simple_Learning_Modules

http://deeplearning.net/tutorial/DBN.html

José Miguel Hernández-Lobato and Daniel Hernández-Lobato, "The New Generation of Neural Networks", Universidad Autónoma de Madrid, Computer Science Department, May 5, 2008.

Sael Lee, Rongjing Xiang, Suleyman Cetintas, Youhan Fang, "Deep Belief Nets", Department of Computer Science, Purdue University.