Artificial Neural Networks 2


Morten Nielsen
Department of Systems Biology, DTU

Outline

- Optimization procedures
  - Gradient descent (this you already know)
- Network training
  - Back propagation
  - Cross-validation
  - Over-fitting
  - Examples

Neural network. Error estimate

[Figure: a network with two inputs I1 and I2, weights w1 and w2, and a linear output o]

Neural networks

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a.

It follows that, if

    b = a - γ∇F(a)

for γ > 0 a small enough number, then F(b) < F(a).
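A minimal numerical sketch of this statement (the quadratic F, its gradient, and the step size gamma below are illustrative choices, not taken from the slides):

    import numpy as np

    def F(x):
        # example function: a quadratic bowl with its minimum at (1, -2)
        return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

    def grad_F(x):
        # analytical gradient of F
        return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])

    a = np.array([5.0, 5.0])       # starting point
    gamma = 0.1                    # small positive step size
    for _ in range(100):
        b = a - gamma * grad_F(a)  # step along the negative gradient
        assert F(b) <= F(a)        # the function value does not increase
        a = b
    print(a)                       # close to the minimum (1, -2)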

Gradient descent (example)

Gradient descent (example)

Gradient descent. Example

Weights are changed in the opposite direction of the gradient of the error (a sketch of one such update follows below).

[Figure: the same network with inputs I1 and I2, weights w1 and w2, and a linear output o]
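For this network with no hidden layer, the gradient of the squared error with respect to each weight can be written down directly. A sketch of one update step, assuming the error E = ½(o − t)²; the input, weight, target and learning-rate values are illustrative, not from the slide:

    import numpy as np

    I = np.array([0.5, 1.0])    # inputs I1, I2 (illustrative values)
    w = np.array([0.3, -0.2])   # weights w1, w2 (illustrative values)
    t = 1.0                     # target value
    eta = 0.1                   # learning rate

    o = np.dot(w, I)            # linear output of the network
    # E = 0.5 * (o - t)**2  =>  dE/dw_i = (o - t) * I_i
    w = w - eta * (o - t) * I   # move opposite to the gradient of the error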

What about the hidden layer?

Hidden to output layer

Hidden to output layer

Hidden to output layer

Input to hidden layer

Input to hidden layer

Input to hidden layer
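The derivations behind these slides are equation images that are not recoverable here. For one hidden layer, a single output O = g(o), hidden units H_j = g(h_j) with h_j = Σ_k v_jk I_k, and error E = ½(O − t)², the updates they lead to take the standard back-propagation form (a reconstruction of the usual textbook result; the slides' notation may differ):

    \Delta w_j    = -\varepsilon \frac{\partial E}{\partial w_j}
                  = -\varepsilon \,(O - t)\, g'(o)\, H_j

    \Delta v_{jk} = -\varepsilon \frac{\partial E}{\partial v_{jk}}
                  = -\varepsilon \,(O - t)\, g'(o)\, w_j\, g'(h_j)\, I_k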

Summary

I_k = X[0][k]
H_j = X[1][j]
O_i = X[2][i]
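A short sketch of a forward pass using this storage convention, where X[0], X[1] and X[2] hold the input, hidden and output layers; the sigmoid activation and the weight values below are assumptions for illustration:

    import numpy as np

    def g(x):
        # logistic (sigmoid) activation, assumed here
        return 1.0 / (1.0 + np.exp(-x))

    X = [np.array([1.0, 1.0]),    # X[0][k]: inputs I_k
         np.zeros(2),             # X[1][j]: hidden units H_j
         np.zeros(1)]             # X[2][i]: outputs O_i
    v = np.array([[ 0.5, -0.3],   # v[j][k]: weight from input k to hidden j
                  [ 0.2,  0.8]])
    w = np.array([[-0.4,  0.6]])  # w[i][j]: weight from hidden j to output i

    X[1] = g(v.dot(X[0]))         # H_j = g(sum_k v_jk I_k)
    X[2] = g(w.dot(X[1]))         # O_i = g(sum_j w_ij H_j)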

Can you do it yourself?

[Figure: a 2-2-1 network with inputs I1 = 1, I2 = 1; input-to-hidden weights v11 = 1, v12 = 1, v21 = -1, v22 = 1; hidden units h1/H1, h2/H2; hidden-to-output weights w1 = -1, w2 = 1; and output o/O]

What is the output (O) from the network?

What are the w_ij and v_jk values if the target value is 0 and the learning rate is 0.5?
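One way to check your answer is to run the numbers. A sketch, assuming a logistic activation g(x) = 1/(1 + e^(-x)) in both layers (the slides' exact activation function is not recoverable from this text, so the numeric results below depend on that assumption):

    import math

    def g(x):
        return 1.0 / (1.0 + math.exp(-x))    # assumed sigmoid activation

    I1, I2 = 1.0, 1.0
    v11, v12, v21, v22 = 1.0, 1.0, -1.0, 1.0
    w1, w2 = -1.0, 1.0

    # Forward pass: what is the output O?
    h1 = v11 * I1 + v12 * I2       # = 2.0
    h2 = v21 * I1 + v22 * I2       # = 0.0
    H1, H2 = g(h1), g(h2)          # ~0.88 and 0.5
    o = w1 * H1 + w2 * H2          # ~-0.38
    O = g(o)                       # ~0.41

    # One gradient-descent step toward target 0 with learning rate 0.5
    t, eta = 0.0, 0.5
    d_o = (O - t) * O * (1.0 - O)        # output error signal; g'(o) = O(1-O)
    d_h1 = d_o * w1 * H1 * (1.0 - H1)    # error propagated back through w1
    d_h2 = d_o * w2 * H2 * (1.0 - H2)    # error propagated back through w2
    w1, w2 = w1 - eta * d_o * H1, w2 - eta * d_o * H2
    v11, v12 = v11 - eta * d_h1 * I1, v12 - eta * d_h1 * I2
    v21, v22 = v21 - eta * d_h2 * I1, v22 - eta * d_h2 * I2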

Can you do it yourself (learning rate = 0.5)? Has the error decreased?

Before: I1 = 1, I2 = 1; v11 = 1, v12 = 1, v21 = -1, v22 = 1; w1 = -1, w2 = 1; h1 = , H1 = , h2 = , H2 = , o = , O =

After: I1 = 1, I2 = 1; v11 = , v12 = , v21 = , v22 = ; w1 = , w2 = ; h1 = , H1 = , h2 = , H2 = , o = , O =

Sequence encoding

- Change in weight is linearly dependent on the input value
- A "true" sparse (1/0) encoding is therefore highly inefficient
- Sparse input is most often encoded as +1/-1 or 0.9/0.05
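A sketch of sparse encoding of a 9-mer peptide with the 0.9/0.05 convention; the alphabet ordering and the helper name are my own, only the 0.9/0.05 values come from the slide:

    import numpy as np

    ALPHABET = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard amino acids

    def sparse_encode(peptide, on=0.9, off=0.05):
        # one block of 20 values per position; 0.9/0.05 rather than 1/0 so
        # that "off" inputs also contribute a (small) weight update
        x = np.full((len(peptide), 20), off)
        for pos, aa in enumerate(peptide):
            x[pos, ALPHABET.index(aa)] = on
        return x.ravel()

    print(sparse_encode("FMIDWILDA").shape)    # (180,) = 9 positions x 20 letters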

Training and error reduction



Training and error reduction



Training and error reduction



Size matters

- A network contains a very large set of parameters
- A network with 5 hidden neurons predicting binding for 9-meric peptides has more than 9x20x5 = 900 weights
- Over-fitting is a problem
- Stop training when test performance is optimal (see the sketch below)
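A minimal sketch of that stopping rule on toy data; the linear model and the synthetic data stand in for the real network and peptide data, only the idea of keeping the weights with the lowest test-set error comes from the slide:

    import numpy as np

    rng = np.random.default_rng(1)
    X_train, y_train = rng.normal(size=(50, 5)), rng.normal(size=50)
    X_test,  y_test  = rng.normal(size=(20, 5)), rng.normal(size=20)

    w = np.zeros(5)
    best_err, best_w = np.inf, w.copy()
    for epoch in range(200):
        grad = X_train.T.dot(X_train.dot(w) - y_train) / len(y_train)
        w -= 0.05 * grad                              # one training step
        err = np.mean((X_test.dot(w) - y_test) ** 2)  # test-set error
        if err < best_err:                            # still improving on the test set
            best_err, best_w = err, w.copy()
    # best_w holds the parameters at the point of optimal test performance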


Neural network training

[Figure: temperature plotted against years]

What is going on?

[Figure: temperature plotted against years]

Examples

Train on 500 A0201 and 60 A0101 binding data

Evaluate on 1266 A0201 peptides

NH=1: PCC = 0.77

NH=5: PCC = 0.72


Neural network training. Cross validation

Cross validation

- Train on 4/5 of the data
- Test on the remaining 1/5
- => Produces 5 different neural networks, each with a different prediction focus (a splitting sketch follows below)
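A sketch of how the five training/test splits can be produced; the function name and the shuffling are my own, the 4/5 versus 1/5 split is from the slide:

    import numpy as np

    def five_fold_splits(n_items, seed=0):
        # shuffle the indices and cut them into 5 folds; each fold in turn is
        # the test set, the remaining 4/5 the training set for one network
        idx = np.random.default_rng(seed).permutation(n_items)
        folds = np.array_split(idx, 5)
        for k in range(5):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(5) if j != k])
            yield train, test

    for train, test in five_fold_splits(560):
        print(len(train), len(test))   # 4/5 of the data to train, 1/5 to test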


Neural network training curve

Maximum test set performance
Most capable of generalizing

5 fold training

Which network to choose?

5 fold training

How many folds?

- Cross validation is always good, but how many folds?
- Few folds -> small training data sets
- Many folds -> small test data sets
- Example from Tuesday's exercise:
  - 560 peptides for training
  - 50 fold (10 peptides per test set, few data to stop training)
  - 2 fold (280 peptides per test set, few data to train)
  - 5 fold (110 peptides per test set, 450 per training set)

Problems with 5-fold cross validation

- Use the test set to stop training, and the test set performance to evaluate training
- Over-fitting?
  - If the test set is small: yes
  - If the test set is large: no
- Confirm using "true" 5-fold cross validation (a partitioning sketch follows below)
  - 1/5 for evaluation
  - 4/5 for 4-fold cross-validation
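A sketch of the partitioning behind "true" 5-fold cross validation; the names are my own, the 1/5 evaluation set plus inner 4-fold split is from the slide:

    import numpy as np

    def true_five_fold(n_items, seed=0):
        # hold out 1/5 for evaluation only; run a 4-fold cross-validation
        # (training and stopping) on the remaining 4/5
        idx = np.random.default_rng(seed).permutation(n_items)
        folds = np.array_split(idx, 5)
        for e in range(5):
            evaluation = folds[e]
            inner = [folds[j] for j in range(5) if j != e]
            for s in range(4):
                stop = inner[s]    # used only to stop training
                train = np.concatenate([inner[j] for j in range(4) if j != s])
                yield train, stop, evaluation   # 5 x 4 = 20 networks in total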

Conventional 5 fold cross validation

"True" 5 fold cross validation

When to be careful

- When data is scarce, the difference between the performance obtained using "conventional" versus "true" cross validation can be very large
- When data is abundant the difference is small, and "true" cross validation might even be higher than "conventional" cross validation due to the ensemble aspect of the "true" cross validation approach

Do hidden neurons matter?



The environment matters

NetMHCpan

Context matters


FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201
FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101
DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201
DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101

Example


Summary

- Gradient descent is used to determine the updates for the synapses in the neural network
- Some relatively simple math defines the gradients
  - Networks without hidden layers can be solved on the back of an envelope (SMM exercise)
  - Hidden layers are a bit more complex, but still OK
- Always train networks using a test set to stop training
- Be careful when reporting predictive performance
  - Use "true" cross-validation for small data sets
- And hidden neurons do matter (sometimes)

And some more stuff for the long cold winter nights

- Can it be made differently? Predicting accuracy
- Can it be made differently? Reliability


Identification of position specific receptor ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system

Master thesis by Frederik Otzen Bagger

Making sense of ANN weights