Deep Learning
Supervised Learning
• Works well if we have the right features.
• Domains like computer vision, audio processing, and natural language processing require feature engineering.
• Feature engineering is a tough job.
• Manually finding the right features does not scale well.
What?
• Learn better features
– That are sparse
– That are effective
How?
• Motivated by a small part of the brain: the neocortex.
• In all mammals, the neocortex is involved in "higher functions" such as sensory perception, generation of motor commands, spatial reasoning, conscious thought, and language.
Big Picture
pixels → edges → object parts (combinations of edges) → object models
Neural Network
Each unit computes $h_{W,b}(x) = f(W^{T}x + b)$, where $f$ is called the activation function.
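As a concrete illustration, a minimal numpy sketch of a single unit (the sigmoid is one common choice for $f$; all values here are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        # logistic activation f(z) = 1 / (1 + e^(-z))
        return 1.0 / (1.0 + np.exp(-z))

    # a single unit computes h_{W,b}(x) = f(W^T x + b)
    x = np.array([0.5, -1.0, 2.0])  # example input
    W = np.array([0.1, 0.4, -0.2])  # example weights
    b = 0.05                        # example bias
    h = sigmoid(W @ x + b)          # unit output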
Multi-layer NN forward pass
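In the notation of the UFLDL tutorial cited in the references, the forward pass is presumably the recurrence $z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}$, $a^{(l+1)} = f(z^{(l+1)})$, with $a^{(1)} = x$. A minimal numpy sketch (layer sizes and initialization are my own illustrative choices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_pass(x, weights, biases):
        # a^(1) = x; then z^(l+1) = W^(l) a^(l) + b^(l), a^(l+1) = f(z^(l+1))
        activations = [x]
        for W, b in zip(weights, biases):
            z = W @ activations[-1] + b
            activations.append(sigmoid(z))
        return activations

    # Example: a 6 -> 3 -> 6 network with small random parameters
    rng = np.random.default_rng(0)
    weights = [rng.normal(0, 0.1, (3, 6)), rng.normal(0, 0.1, (6, 3))]
    biases = [np.zeros(3), np.zeros(6)]
    acts = forward_pass(rng.normal(size=6), weights, biases)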
Back propagation
Batch update rule for a given layer, accumulated over all training samples
Update rule for the weights and biases of a given layer, for a given training sample
Objective function
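The formulas on this slide did not survive extraction; following the UFLDL tutorial cited in the references, they are plausibly the following (squared error with weight decay, $\alpha$ the learning rate, $m$ the number of training samples):

Objective function:
$J(W,b) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\big\|h_{W,b}(x^{(i)}) - y^{(i)}\big\|^2 + \frac{\lambda}{2}\sum_{l}\sum_{i}\sum_{j}\big(W^{(l)}_{ji}\big)^2$

Per-sample gradients for layer $l$ (with $\delta^{(l+1)}$ the backpropagated error):
$\nabla_{W^{(l)}} J(W,b;x,y) = \delta^{(l+1)} \big(a^{(l)}\big)^{T}, \qquad \nabla_{b^{(l)}} J(W,b;x,y) = \delta^{(l+1)}$

Batch update rule, accumulated over all training samples:
$W^{(l)} := W^{(l)} - \alpha\Big[\frac{1}{m}\Delta W^{(l)} + \lambda W^{(l)}\Big], \qquad b^{(l)} := b^{(l)} - \alpha\,\frac{1}{m}\Delta b^{(l)}$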
Auto-encoders and Sparsity
• Back propagation for unsupervised learning.
• Learn an approximation to the identity function.
• Learning the identity function is trivial unless we limit the number of hidden nodes; the interesting question is what we can achieve with fewer hidden units than inputs.
• Alternatively, we can impose a sparsity constraint on the hidden units.
• Let $\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a^{(2)}_j(x^{(i)})$ be the average activation of hidden unit $j$ (averaged over the training set).
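A minimal numpy sketch of this average (X holds one training sample per row; the names W1, b1 for the layer-1 parameters are mine, not the slides'):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def average_activation(X, W1, b1):
        # rho_hat_j = (1/m) * sum_i a_j^(2)(x^(i)), averaged over the m rows of X
        A = sigmoid(X @ W1.T + b1)   # hidden activations, shape (m, n_hidden)
        return A.mean(axis=0)        # rho_hat, shape (n_hidden,)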
Auto-encoder and Sparsity
Enforce the constraint $\hat{\rho}_j = \rho$, where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$).
This can be done by adding one more term to the objective function, a penalty that grows when $\hat{\rho}_j$ deviates from $\rho$:
$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{s_2} \Big[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \Big]$
Now the objective function becomes
$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$
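A minimal numpy sketch of the penalty term (the function name is mine):

    import numpy as np

    def kl_sparsity_penalty(rho, rho_hat):
        # sum_j KL(rho || rho_hat_j)
        #   = sum_j [ rho log(rho/rho_hat_j) + (1-rho) log((1-rho)/(1-rho_hat_j)) ]
        return np.sum(rho * np.log(rho / rho_hat)
                      + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

    # J_sparse(W, b) = J(W, b) + beta * kl_sparsity_penalty(rho, rho_hat)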
What is learned by auto-encoder?
• We try to find the image that most activates a particular hidden node.
• To achieve this for a particular $i$th hidden node, we construct the image by setting its $j$th pixel to
$x_j = \frac{W^{(1)}_{ij}}{\sqrt{\sum_{j'} \big(W^{(1)}_{ij'}\big)^2}}$
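A minimal numpy sketch (the function name is mine; W1 is the layer-1 weight matrix, one row per hidden unit):

    import numpy as np

    def max_activation_image(W1, i):
        # pixel j of the norm-constrained input that maximally activates hidden unit i:
        #   x_j = W1[i, j] / sqrt(sum_j' W1[i, j']^2)
        return W1[i] / np.sqrt(np.sum(W1[i] ** 2))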
Learning of auto-encoder
[Figure: a three-layer autoencoder. Layer 1 holds inputs x1..x6 plus a bias unit +1, Layer 2 holds the hidden units, and Layer 3 reproduces x1..x6.]
Autoencoder: the network is trained to output its input (learn the identity function). This has a trivial solution unless we:
– Constrain the number of units in Layer 2 (learn a compressed representation), or
– Constrain Layer 2 to be sparse.
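A minimal numpy sketch of one training step for such a network (untied weights, sigmoid units, squared error; all names and hyperparameters are illustrative assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def autoencoder_step(X, W1, b1, W2, b2, lr=0.1):
        # One batch gradient step training the network to output its input.
        A = sigmoid(X @ W1.T + b1)    # Layer 2: hidden code
        Y = sigmoid(A @ W2.T + b2)    # Layer 3: reconstruction of X
        d3 = (Y - X) * Y * (1 - Y)    # output delta (squared error, sigmoid)
        d2 = (d3 @ W2) * A * (1 - A)  # hidden delta, backpropagated
        m = X.shape[0]
        W2 -= lr * d3.T @ A / m       # batch updates, averaged over m samples
        b2 -= lr * d3.mean(axis=0)
        W1 -= lr * d2.T @ X / m
        b1 -= lr * d2.mean(axis=0)
        return ((Y - X) ** 2).sum() / (2 * m)  # reconstruction error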
[Figure: the same network with hidden units a1, a2, a3 in Layer 2; a later build keeps only the encoder half, x1..x6, +1 → a1, a2, a3.]
New representation for input: the activations [a1, a2, a3].
[Figure: a second autoencoder is trained on top of the first: a1..a3 plus a bias unit +1 feed new hidden units b1, b2, b3, which reconstruct a1..a3.]
Train the parameters so that the reconstruction $\hat{a} \approx a$, subject to the $b_i$'s being sparse.
[Figure: only the encoder stack is kept: x1..x6 → a1..a3 → b1..b3.]
New representation for input: the activations [b1, b2, b3].
[Figure: a third autoencoder layer is trained on the b-layer, adding hidden units c1, c2, c3.]
[Figure: the final encoder stack x1..x6 → a1..a3 → b1..b3 → c1..c3.]
New representation for input.
Use [c1, c2, c3] as the representation to feed to the learning algorithm.
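A minimal numpy sketch of this greedy layer-wise pipeline (train_sparse_autoencoder is a hypothetical helper standing in for the per-layer training above, not a real API):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def encode(X, layers):
        # feed X through each trained encoder layer in turn: x -> a -> b -> c
        for W, b in layers:
            X = sigmoid(X @ W.T + b)
        return X

    # Greedy layer-wise training (train_sparse_autoencoder is hypothetical: it
    # trains one sparse autoencoder on H and returns its encoder parameters):
    #
    #   layers, H = [], X_train
    #   for n_hidden in (n_a, n_b, n_c):
    #       W, b = train_sparse_autoencoder(H, n_hidden)
    #       layers.append((W, b))
    #       H = sigmoid(H @ W.T + b)       # new representation for the next layer
    #   features = encode(X_test, layers)  # [c1, c2, c3] per test sample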
References
• UFLDL Tutorial (Stanford): http://ufldl.stanford.edu/wiki