Deep Learning



Supervised Learning

Works well if we have the right features.

Domains like computer vision, audio processing, and natural language processing require feature engineering.

Feature engineering is a tough job.

Manually finding the right features does not scale well.

What?

Learn better features:

features that are sparse,

and effective.

How?


Motivated by the neocortex, a part of the brain.

In all mammals, it is involved in "higher functions" such as sensory perception, generation of motor commands, spatial reasoning, conscious thought, and language.

Big Picture

pixels → edges → object parts (combinations of edges) → object models

Neural Network

$h_{W,b}(x) = f(W^\top x + b)$,

where $f(z) = \frac{1}{1 + e^{-z}}$ is called the activation function.

Multi-layer NN forward pass
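The forward-pass equations on this slide were lost in extraction; in the notation of the referenced UFLDL tutorial they are $z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}$ and $a^{(l+1)} = f(z^{(l+1)})$, with $a^{(1)} = x$. A minimal NumPy sketch of that recursion (the function and variable names are mine, not from the slides):

import numpy as np

def sigmoid(z):
    # logistic activation f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, weights, biases):
    # weights[l], biases[l] map layer l+1 activations onto layer l+2;
    # returns the activations a^(l) of every layer, with a^(1) = x
    activations = [x]
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b       # z^(l+1) = W^(l) a^(l) + b^(l)
        activations.append(sigmoid(z))    # a^(l+1) = f(z^(l+1))
    return activations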

Back propagation

Batch update rule for a given layer, accumulated over all training samples.

Update rule for weights and biases for a given layer and a given training sample.

Objective function.
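The formulas themselves were images and did not survive; the corresponding equations in the referenced UFLDL tutorial, which this slide appears to follow, are the squared-error objective with weight decay

$J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|h_{W,b}(x^{(i)}) - y^{(i)}\right\|^2 + \frac{\lambda}{2}\sum_{l}\sum_{i,j}\left(W_{ji}^{(l)}\right)^2$

and the gradient-descent updates

$W_{ij}^{(l)} := W_{ij}^{(l)} - \alpha\frac{\partial J(W,b)}{\partial W_{ij}^{(l)}}, \qquad b_i^{(l)} := b_i^{(l)} - \alpha\frac{\partial J(W,b)}{\partial b_i^{(l)}}$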

Auto-encoders and Sparsity

Back propagation for unsupervised learning.

Limit the number of hidden nodes: the network must learn an approximation to the identity function through a compressed representation.

Learning the identity function sounds trivial; it becomes interesting if we impose a sparsity constraint on the hidden units.

Let $\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a_j^{(2)}\left(x^{(i)}\right)$ be the average activation of hidden unit $j$ (averaged over the training set).

Auto-encoder and Sparsity

Enforce the constraint $\hat{\rho}_j = \rho$, where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$).

This can be done by adding one more term to the objective function, one that penalizes $\hat{\rho}_j$ for deviating from $\rho$:

$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \| \hat{\rho}_j) = \sum_{j=1}^{s_2}\left[\rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right]$

Now the objective function becomes

$J_{\text{sparse}}(W,b) = J(W,b) + \beta\sum_{j=1}^{s_2}\mathrm{KL}(\rho \| \hat{\rho}_j)$
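A small NumPy sketch of this penalty (the function name and the values of rho and beta here are illustrative, not from the slides):

import numpy as np

def sparsity_penalty(hidden_activations, rho=0.05, beta=3.0):
    # hidden_activations: (m, n_hidden) array of a^(2) over the training set
    rho_hat = hidden_activations.mean(axis=0)    # average activation rho_hat_j
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * kl.sum()                       # beta * sum_j KL(rho || rho_hat_j)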

What is learned by auto-encoder?

We try to find the image that activates a particular hidden node the most.

To achieve this for a particular $i$-th hidden node, we construct an image by setting the $j$-th pixel to

$x_j = \frac{W_{ij}^{(1)}}{\sqrt{\sum_j \left(W_{ij}^{(1)}\right)^2}}$
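In code this is a single normalization of the unit's weight row; a sketch assuming W1 holds the Layer 1 weights with one row per hidden unit:

import numpy as np

def max_activation_image(W1, i):
    # norm-constrained input that maximally activates hidden unit i:
    # x_j = W1[i, j] / sqrt(sum_j W1[i, j]^2)
    return W1[i] / np.sqrt(np.sum(W1[i] ** 2))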


Learning of auto-encoder

[Figure: autoencoder network. Layer 1: inputs x1-x6 plus bias +1; Layer 2: hidden units; Layer 3: outputs x1-x6.]

Autoencoder.


The network is trained to output the input (learn the identity function).

The solution is trivial unless we:

- constrain the number of units in Layer 2 (learn a compressed representation), or

- constrain Layer 2 to be sparse.
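A minimal NumPy sketch of this training loop (plain batch gradient descent on squared reconstruction error; all names are mine, and the sparsity penalty is omitted for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=1000, seed=0):
    # X: (m, n) data; the network is trained so that its output reproduces X
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(0.0, 0.1, (n, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n)); b2 = np.zeros(n)
    for _ in range(epochs):
        A = sigmoid(X @ W1 + b1)         # hidden activations (Layer 2)
        X_hat = sigmoid(A @ W2 + b2)     # reconstruction (Layer 3)
        # backpropagation for squared error with sigmoid units
        d3 = (X_hat - X) * X_hat * (1.0 - X_hat)
        d2 = (d3 @ W2.T) * A * (1.0 - A)
        W2 -= lr * (A.T @ d3) / m; b2 -= lr * d3.mean(axis=0)
        W1 -= lr * (X.T @ d2) / m; b1 -= lr * d2.mean(axis=0)
    return W1, b1, W2, b2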

[Figure: the same autoencoder with the Layer 2 hidden activations labeled a1-a3.]

[Figure: encoder half only. Layer 1: inputs x1-x6 plus bias +1; Layer 2: hidden activations a1-a3.]

New representation for input.
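Continuing the hypothetical sketch above, the new representation is just the hidden activations of the trained encoder:

W1, b1, W2, b2 = train_autoencoder(X, n_hidden=3)
A = sigmoid(X @ W1 + b1)    # [a1, a2, a3] for each input row of X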

[Figure: a second autoencoder stacked on the first. The hidden activations a1-a3 (plus bias +1) feed new hidden units b1-b3, which are trained to reconstruct a1-a3.]

Train parameters so that $\hat{a} \approx a$, subject to the $b_i$'s being sparse.
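This greedy layer-wise scheme repeats one step per layer: train an autoencoder on the current representation, keep only its encoder, and feed its hidden activations to the next layer. A sketch built on the hypothetical train_autoencoder above (sparsity constraint again omitted):

def greedy_layerwise(X, layer_sizes):
    # returns the per-layer encoders and the final (deepest) representation
    reps, encoders = X, []
    for n_hidden in layer_sizes:
        W1, b1, _, _ = train_autoencoder(reps, n_hidden)
        encoders.append((W1, b1))
        reps = sigmoid(reps @ W1 + b1)   # new representation for the next layer
    return encoders, reps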


[Figure: the stack so far, x1-x6 → a1-a3 → b1-b3; the b layer is kept.]

New representation for input.


[Figure: a third autoencoder stacked on top. The hidden activations b1-b3 (plus bias +1) feed new hidden units c1-c3, trained the same way.]

[Figure: the full stack, x1-x6 → a1-a3 → b1-b3 → c1-c3.]

New representation for input.

Use [c1, c2, c3] as the representation to feed to the learning algorithm.
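For example, with the hypothetical helpers above, the deepest representation could be fed to an off-the-shelf classifier (scikit-learn and the variable names here are my assumption, not from the slides):

from sklearn.linear_model import LogisticRegression

encoders, C = greedy_layerwise(X_train, layer_sizes=[3, 3, 3])
clf = LogisticRegression().fit(C, y_train)   # supervised learner on top of [c1, c2, c3]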

References


http://ufldl.stanford.edu/wiki