
BOLTZMANN MACHINE

M. H. Shakoor

DEFINITION

• A Boltzmann machine is a type of stochastic recurrent neural network invented by Geoffrey Hinton and Terry Sejnowski.

• Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets.

• They are capable of learning internal representations, and are able to represent and (given sufficient time) solve difficult combinatoric problems.

• Boltzmann machines with unconstrained connectivity have not proven useful for practical problems in machine learning or inference.

• They remain theoretically interesting, however, due to the locality and Hebbian nature of their training algorithm, as well as their parallelism and the resemblance of their dynamics to simple physical processes.

• If the connectivity is constrained, the learning can be made efficient enough to be useful for practical problems.

• They are named after the Boltzmann distribution in statistical mechanics, which is used in their sampling function.

• This makes them useful tools for optimization problems, since the random fluctuations help them escape from local minima.

• Stochastic neural networks that are built by using stochastic transfer functions are often called Boltzmann machines. A Boltzmann machine is a type of unsupervised neural network.




BOLTZMANN MACHINES

Discrete Hopfield NN + Simulated Annealing = Boltzmann Machine

SIMULATED ANNEALING (SA)

Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete.


For certain problems, simulated annealing may be more efficient when the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.


The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one.



SIMULATED ANNEALING PROCESS

If we start running the network from a high temperature, and gradually decrease it until we reach thermal equilibrium at a low temperature, we are guaranteed to converge to a distribution where the energy level fluctuates around the global minimum. This process is called simulated annealing.
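This loop can be written down directly. Below is a minimal simulated-annealing sketch in Python; the energy and propose callables and the exponential cooling schedule are illustrative assumptions, not something specified on the slides.

import math
import random

def simulated_annealing(energy, propose, state,
                        t_start=10.0, t_end=0.01, alpha=0.95, steps_per_t=100):
    """Generic simulated annealing loop: always accept downhill moves,
    accept uphill moves with probability exp(-dE / T), and cool T slowly."""
    t = t_start
    e = energy(state)
    best, best_e = state, e
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = propose(state)           # random neighbour of the current state
            de = energy(candidate) - e
            if de <= 0 or random.random() < math.exp(-de / t):
                state, e = candidate, e + de     # accept the move
                if e < best_e:
                    best, best_e = state, e
        t *= alpha                               # cooling schedule: gradual temperature reduction
    return best, best_e

For a Boltzmann machine, energy would be the global network energy and propose a single random unit flip.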


If we want to train the network so that the chance it will converge to a global state is according to an external distribution that we have over these states, we need to set the weights so that the global states with the highest probabilities will get the lowest energies. This is done by the following training procedure.

BOLTZMANN MACHINES

• Recurrent Networks
• Input and Output
• Stochastic Neurons
• Boltzmann Machine
• Optimization
• Learning

WEIGHTS → ENERGIES → PROBABILITIES

• Each possible joint configuration of the visible and hidden units has an energy.

• The energy is determined by the weights and biases (as in a Hopfield net).

• The energy of a joint configuration of the visible and hidden units determines its probability (see the formulas after this list).

• The probability of a configuration over the visible units is found by summing the probabilities of all the joint configurations that contain it.
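The formulas were shown as images on the original slide; a standard reconstruction, assuming the usual notation (v a visible configuration, h a hidden configuration, E(v,h) the energy determined by the weights and biases), is:

$$
P(v,h) = \frac{e^{-E(v,h)}}{\sum_{u,g} e^{-E(u,g)}},
\qquad
P(v) = \sum_{h} P(v,h)
$$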

STRUCTURE

A Boltzmann machine, like a Hopfield network, is a network of units with an "energy" defined for the network. It also has binary units, but unlike Hopfield nets, Boltzmann machine units are stochastic.


The global energy, E, in a Boltzmann machine is identical in form to that of a Hopfield network:
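The formula itself was an image on the slide; the standard Hopfield-style form, with s_i the binary unit states, w_ij the symmetric connection weights, and b_i the unit biases, is:

$$
E = -\sum_{i<j} w_{ij}\, s_i\, s_j \;-\; \sum_{i} b_i\, s_i
$$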


STOCHASTIC UNITS

• Replace the binary threshold units by binary stochastic units that make biased random decisions.

• The temperature controls the amount of noise.

• Decreasing all the energy gaps between configurations is equivalent to raising the noise level.

UPDATE RULES

(Figure: the update rules of the unipolar neuron, the discrete Hopfield NN, and the Boltzmann machine, shown side by side.)

STOCHASTIC BINARY NEURONS

These have a state of 1 or 0 which is a stochastic function of the neuron's bias, b, and the input it receives from other neurons.
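A minimal sketch of such a unit in Python, assuming the standard logistic form with temperature T (the exact formula was an image on the slides):

import math
import random

def stochastic_binary_unit(bias, inputs, weights, T=1.0):
    """Return 1 with probability logistic((bias + weighted input) / T), else 0.
    As T -> 0 this approaches the deterministic Hopfield threshold unit."""
    z = bias + sum(w * s for w, s in zip(weights, inputs))
    p_on = 1.0 / (1.0 + math.exp(-z / T))
    return 1 if random.random() < p_on else 0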

(Figure: the unit's activation probability rises from 0 to 1 as a function of its total input, passing through 0.5.)


SIMULATED ANNEALING

(Figure: behaviour of the update rule at temperatures T = 0, 1, 2, 3, ...)

A cooling schedule is required: simulated cooling is a gradual reduction of the temperature.



TYPES OF STOCHASTIC NETWORKS

• Bayes (belief) networks: directed acyclic graph, feedforward networks
• Markov random fields (networks): undirected graph, symmetric networks


BOLTZMANN DISTRIBUTION

The Boltzmann distribution (also called the Gibbs distribution) is a certain distribution function or probability measure for the distribution of the states of a system.
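Written out (this standard form did not survive extraction), with E_i the energy of state i and T the temperature:

$$
P(\text{state } i) \;=\; \frac{e^{-E_i / T}}{\sum_{j} e^{-E_j / T}}
$$

In statistical mechanics the exponent carries Boltzmann's constant, -E_i/(k_B T); in the Boltzmann machine the constant is absorbed into the temperature.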

THE (THEORETICAL) BATCH LEARNING ALGORITHM

Positive phase
• Clamp a data vector on the visible units.
• Let the hidden units reach thermal equilibrium at a temperature of 1.
• Sample the pairwise correlations for all pairs of units.
• Repeat for all data vectors in the training set.

Negative phase
• Do not clamp any of the units.
• Let the whole network reach thermal equilibrium at a temperature of 1.
• Sample the pairwise correlations for all pairs of units.
• Repeat many times to get good estimates.

Weight updates
• Update each weight by an amount proportional to the difference in the sampled correlations between the two phases.
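A minimal sketch of this two-phase batch update in Python. The equilibrium samplers sample_clamped and sample_free are assumed helpers (each would run the stochastic unit updates until thermal equilibrium); only the correlation statistics and the weight update follow the procedure above.

import numpy as np

def boltzmann_weight_update(W, data_vectors, sample_clamped, sample_free,
                            lr=0.01, n_neg_samples=100):
    """One (theoretical) batch update: Delta w_ij is proportional to
    <s_i s_j>_data  -  <s_i s_j>_model."""
    n = W.shape[0]

    # Positive phase: clamp each data vector, sample states at equilibrium.
    pos = np.zeros((n, n))
    for v in data_vectors:
        s = sample_clamped(W, v)          # full state vector with v clamped (assumed helper)
        pos += np.outer(s, s)
    pos /= len(data_vectors)

    # Negative phase: nothing clamped, sample freely many times.
    neg = np.zeros((n, n))
    for _ in range(n_neg_samples):
        s = sample_free(W)                # free-running equilibrium sample (assumed helper)
        neg += np.outer(s, s)
    neg /= n_neg_samples

    # Weight update: proportional to the difference of the two correlations.
    W += lr * (pos - neg)
    np.fill_diagonal(W, 0.0)              # no self-connections
    return W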


LEARNING FOR BOLTZMANN MACHINES

Adjust the weights so that the internally produced activity distribution resembles the external one.

FOUR REASONS WHY LEARNING IS IMPRACTICAL IN BOLTZMANN MACHINES

• If there are many hidden layers, it can take a long time to reach thermal equilibrium when a data vector is clamped on the visible units.

• It takes even longer to reach thermal equilibrium in the "negative" phase, when the visible units are unclamped.

• The unconstrained energy surface needs to be highly multimodal to model the data.

• The learning signal is the difference of two sampled correlations, which is very noisy; many weight updates are required.

RESTRICTED BOLTZMANN MACHINES

• Restrict the connectivity to make learning easier.

• Only one layer of hidden units. (Deal with more layers later.)

• No connections between hidden units.

• In an RBM, the hidden units are conditionally independent given the visible states, so we can quickly get an unbiased sample from the posterior distribution when given a data vector (see the sketch after this list).

(Figure: a bipartite graph with a hidden layer and a visible layer, units labelled i and j, connections only between the layers.)
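A minimal sketch of this conditional independence in Python: sampling all hidden units in one parallel step given a visible vector. The logistic activation and the variable names (W, b_hid) are assumptions for illustration, not from the slides.

import numpy as np

def sample_hidden_given_visible(v, W, b_hid, rng=np.random.default_rng()):
    """Because hidden units are conditionally independent given the visible
    states, every hidden unit can be sampled in a single parallel step."""
    # p(h_j = 1 | v) = logistic(b_j + sum_i v_i * W_ij)
    p = 1.0 / (1.0 + np.exp(-(b_hid + v @ W)))
    return (rng.random(p.shape) < p).astype(np.float64), p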

RBM: RESTRICTED BOLTZMANN MACHINE

If the connectivity is constrained, the learning can be made efficient enough to be useful for practical problems.


DEEP BOLTZMANN MACHINES

Deep Boltzmann machines are interesting for several reasons. First, like deep belief networks, DBMs have the potential of learning internal representations that become increasingly complex, which is considered to be a promising way of solving object and speech recognition problems. Second, high-level representations can be built from a large supply of unlabeled sensory inputs, and very limited labeled data can then be used to only slightly fine-tune the model for a specific task at hand. Finally, unlike deep belief networks, the approximate inference procedure, in addition to an initial bottom-up pass, can incorporate top-down feedback, allowing deep Boltzmann machines to better propagate uncertainty about, and hence deal more robustly with, ambiguous inputs.



DIFFERENT TYPES OF BOLTZMANN MACHINE

• Higher-order Boltzmann machines
• Conditional Boltzmann machines
• Mean field Boltzmann machines
• Non-binary units

References

Ackley, D., Hinton, G., and Sejnowski, T. (1985). A Learning Algorithm for Boltzmann Machines. Cognitive Science, 9(1):147-169.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741.

Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800.

Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.

Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504-507.

Hinton, G. E. and Sejnowski, T. J. (1983). Optimal Perceptual Inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington DC, pp. 448-453.

Jordan, M. I. (1998). Learning in Graphical Models. MIT Press, Cambridge, MA.

Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc. 18th International Conf. on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA.

Peterson, C. and Anderson, J. R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1(5):995-1019.

Sejnowski, T. J. (1986). Higher-order Boltzmann machines. AIP Conference Proceedings, 151(1):398-403.

Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing, Volume 1: Foundations, pages 194-281. MIT Press, Cambridge, MA.

Welling, M., Rosen-Zvi, M., and Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. Advances in Neural Information Processing Systems 17, pages 1481-1488. MIT Press, Cambridge, MA.