Artificial Neural Networks 2
Morten Nielsen
Department of Systems Biology, DTU
Outline
• Optimization procedures
– Gradient descent (this you already know)
• Network training
– back propagation
– cross validation
– overfitting
– examples
Neural network. Error estimate
[Figure: a network with inputs I1 and I2, weights w1 and w2, and a linear output neuron o]
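The error estimate itself did not survive extraction; a minimal sketch, assuming the usual sum-of-squares error with the target value t used later in the deck:

```latex
O = \sum_i w_i I_i, \qquad E = \frac{1}{2}\,(O - t)^2
```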
Neural networks
Gradient descent (from Wikipedia)
Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a - γ∇F(a) for γ > 0 a small enough number, then F(b) < F(a).
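A minimal sketch of this rule in Python; F, its gradient, the step size gamma, and the starting point are all illustrative choices, not from the slides:

```python
# Gradient descent on F(x) = x^2, whose gradient is F'(x) = 2x.
def grad_F(x):
    return 2.0 * x

gamma = 0.1   # the "small enough number" gamma > 0
a = 5.0       # starting point
for _ in range(100):
    a = a - gamma * grad_F(a)   # b = a - gamma * grad F(a), so F(b) < F(a)
print(a)      # converges towards the minimum at x = 0
```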
Gradient descent (example)
Gradient descent. Example
Weights are changed in the opposite direction of the gradient of the error.
[Figure: the same network, inputs I1 and I2, weights w1 and w2, linear output o]
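Written out (a sketch using the sum-of-squares error above; ε denotes the learning rate):

```latex
\Delta w_i \;=\; -\varepsilon\,\frac{\partial E}{\partial w_i}
           \;=\; -\varepsilon\,(O - t)\,I_i
```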
What about the hidden layer?
Hidden to output layer
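The gradient itself is not recoverable from the extracted slide; a sketch in the deck's notation, assuming a transfer function g, summed output input o = Σ_j w_j H_j, and O = g(o):

```latex
\Delta w_j \;=\; -\varepsilon\,\frac{\partial E}{\partial w_j}
           \;=\; -\varepsilon\,(O - t)\,g'(o)\,H_j
```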
Input to hidden layer
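Correspondingly for the input-to-hidden weights, chaining one layer further back (same assumptions; h_j = Σ_k v_jk I_k and H_j = g(h_j)):

```latex
\Delta v_{jk} \;=\; -\varepsilon\,\frac{\partial E}{\partial v_{jk}}
              \;=\; -\varepsilon\,(O - t)\,g'(o)\,w_j\,g'(h_j)\,I_k
```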
Summary
Or, in the program's array notation:
I_k = X[0][k], H_j = X[1][j], O_i = X[2][i]
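A sketch of one training step in that array convention, assuming a sigmoid transfer function and a single output neuron (the function and its argument names are illustrative):

```python
import numpy as np

def g(x):                        # sigmoid transfer function
    return 1.0 / (1.0 + np.exp(-x))

def train_step(I, v, w, target, eps):
    """One gradient-descent update for a two-layer network.
    I = X[0] (inputs), v = input->hidden weights v_jk,
    w = hidden->output weights w_j, single output neuron."""
    H = g(v @ I)                 # hidden activations, X[1][j]
    O = g(w @ H)                 # output, X[2]
    # backward pass for E = 0.5 * (O - t)^2; g'(x) = g(x)(1 - g(x))
    delta_o = (O - target) * O * (1.0 - O)
    delta_h = delta_o * w * H * (1.0 - H)   # chain rule to hidden layer
    w -= eps * delta_o * H                  # hidden -> output update
    v -= eps * np.outer(delta_h, I)         # input -> hidden update
    return O
```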
Can you do it yourself?
v11 = 1, v12 = 1, v21 = -1, v22 = 1
w1 = -1, w2 = 1
I1 = 1, I2 = 1
[Figure: network with hidden neurons h1/H1 and h2/H2 and output o/O]
What is the output (O) from the network?
What are the w_j and v_jk values if the target value is 0 and ε = 0.5?
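A sketch that works the exercise numerically; it assumes a sigmoid transfer function on both layers and the convention that v_jk connects input k to hidden neuron j (the slide's figure may wire the two hidden neurons the other way around, which swaps h1 and h2):

```python
import numpy as np

g = lambda x: 1.0 / (1.0 + np.exp(-x))   # sigmoid transfer function

I = np.array([1.0, 1.0])                 # I1 = I2 = 1
v = np.array([[ 1.0, 1.0],               # v11 = 1, v12 = 1
              [-1.0, 1.0]])              # v21 = -1, v22 = 1
w = np.array([-1.0, 1.0])                # w1 = -1, w2 = 1
t, eps = 0.0, 0.5                        # target value and learning rate

H = g(v @ I)                             # hidden layer: h = [2, 0]
O = g(w @ H)                             # output, ~0.41 under this convention
print("O =", O)

delta_o = (O - t) * O * (1.0 - O)        # backpropagated output error
delta_h = delta_o * w * H * (1.0 - H)
w -= eps * delta_o * H                   # updated w_j
v -= eps * np.outer(delta_h, I)          # updated v_jk
print("w =", w)
print("v =", v)
```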
Can you do it yourself (ε = 0.5). Has the error decreased?
v11 = 1, v12 = 1, v21 = -1, v22 = 1
w1 = -1, w2 = 1
I1 = 1, I2 = 1
h1 = ?, H1 = ?, h2 = ?, H2 = ?, o = ?, O = ?
v11 = ?, v12 = ?, v21 = ?, v22 = ?
w1 = ?, w2 = ?
h1 = ?, H1 = ?, h2 = ?, H2 = ?, o = ?, O = ?
I1 = 1, I2 = 1
Before / After (fill in the values before and after the weight update)
Sequence encoding
• Change in weight is linearly dependent on the input value (a zero input gives a zero weight update)
• “True” sparse encoding is therefore highly inefficient
• Sparse is most often encoded as
– +1/-1 or 0.9/0.05
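A sketch of what this means for sparse (one-hot) sequence encoding; names are illustrative. Because Δw is proportional to the input, encoding the "off" amino acids as exactly 0 means their weights never move, which is why the 0.9/0.05 variant is preferred:

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"          # the 20 amino acids

def sparse_encode(peptide, on=0.9, off=0.05):
    """Encode a peptide as 20 inputs per position. With on/off = 1/0,
    only one weight per position would receive a non-zero update."""
    return [on if aa == a else off
            for aa in peptide
            for a in ALPHABET]

# A 9-mer peptide gives 9 x 20 = 180 network inputs
print(len(sparse_encode("FMIDWILDA")))
```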
Training and error reduction
Size matters
• A network contains a very large set of parameters
– A network with 5 hidden neurons predicting binding for 9-mer peptides has more than 9x20x5 = 900 weights
• Overfitting is a problem
• Stop training when test performance is optimal
Neural network training
[Plot: Temperature vs. years]
What is going on?
[Plot: Temperature vs. years]
Examples
Train on 500 A0201 and 60 A0101 binding data points
Evaluate on 1266 A0201 peptides
NH = 1 (one hidden neuron): PCC = 0.77
NH = 5 (five hidden neurons): PCC = 0.72
Neural network training. Cross validation
Cross validation
Train on 4/5 of the data, test on the remaining 1/5
=> Produces 5 different neural networks, each with a different prediction focus
Neural network training curve
Maximum test set performance
Most capable of generalizing
5-fold training
Which network to choose?
5-fold training
How many folds?
• Cross validation is always good, but how many folds?
– Few folds => small training data sets
– Many folds => small test data sets
• Example from Tuesday's exercise
– 560 peptides for training
• 50-fold (10 peptides per test set, few data to stop training)
• 2-fold (280 peptides per test set, few data to train)
• 5-fold (110 peptides per test set, 450 per training set)
Problems with 5-fold cross validation
• Use the test set to stop training, and test set performance to evaluate training
– Overfitting?
• If the test set is small, yes
• If the test set is large, no
• Confirm using “true” 5-fold cross validation
– 1/5 for evaluation
– 4/5 for 4-fold cross-validation
Conventional 5-fold cross validation
“True” 5-fold cross validation
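A sketch of how the two splitting schemes differ, assuming plain index-based folds (no network code; function names are illustrative):

```python
import numpy as np

def conventional_cv(n, k=5, seed=1):
    """Each fold both stops training and evaluates performance,
    which can bias the reported performance optimistically."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i], folds[i]      # train, stop, evaluate

def true_cv(n, k=5, seed=1):
    """1/5 is held out purely for evaluation; the remaining 4/5
    are split again so a separate fold stops the training."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        rest = [folds[j] for j in range(k) if j != i]
        for m in range(k - 1):               # inner 4-fold cross-validation
            train = np.concatenate([rest[j] for j in range(k - 1) if j != m])
            yield train, rest[m], folds[i]   # train, stop, evaluate
```

Note that the evaluation fold in true_cv is never used to stop training, and the four inner networks form an ensemble over it, which is the "ensemble aspect" mentioned on the next slide.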
When to be careful
• When data is scarce, the difference in performance obtained using “conventional” versus “true” cross validation can be very large
• When data is abundant the difference is small, and “true” cross validation might even score higher than “conventional” cross validation due to the ensemble aspect of the “true” cross validation approach
Do hidden neurons matter?
• The environment matters
NetMHCpan
Context matters
• FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201
• FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101
• DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201
• DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101
Example
Summary
• Gradient descent is used to determine the updates for the synapses (weights) in the neural network
• Some relatively simple math defines the gradients
– Networks without hidden layers can be solved on the back of an envelope (SMM exercise)
– Hidden layers are a bit more complex, but still OK
• Always train networks using a test set to stop training
– Be careful when reporting predictive performance
• Use “true” cross-validation for small data sets
• And hidden neurons do matter (sometimes)
And some more stuff for the long cold winter nights
• Can it be made differently? Predicting accuracy
• Can it be made differently? Reliability
• Identification of position specific receptor ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system
Master thesis by Frederik Otzen Bagger
Making sense of ANN weights