DSCI 4520/5240 (DATA MINING)
Lecture 8: Neural Networks
Some slide material taken from: SAS Education
Objectives

Understand the structure of a Neural Network.
Understand how the objective function leads to overgeneralization.
Understand how overgeneralization is counteracted by taking into account the validation data.
Neural network models (multi-layer perceptrons)

Often regarded as a mysterious and powerful predictive modeling technique.
The most typical form of the model is, in fact, a natural extension of a regression model:
- A generalized linear model on a set of derived inputs.
- These derived inputs are themselves a generalized linear model on the original inputs.
- The usual link for the derived inputs' model is the inverse hyperbolic tangent, a shift and rescaling of the logit function.
Ability to approximate virtually any continuous association between the inputs and the target:
- You simply need to specify the correct number of derived inputs.
Neural Network Model

log( p / (1 - p) ) = w00 + w01 H1 + w02 H2 + w03 H3
tanh^-1(H1) = w10 + w11 x1 + w12 x2
tanh^-1(H2) = w20 + w21 x1 + w22 x2
tanh^-1(H3) = w30 + w31 x1 + w32 x2

[Figure: a scatter plot of the training data on axes x1 and x2, and the tanh(x) activation curve, which rises from -1 to 1 through the origin.]
Input layer, hidden layer, output layer

Multi-layer perceptron models were originally inspired by neurophysiology and the interconnections between neurons.
The basic model form arranges neurons in layers.
The input layer connects to a layer of neurons called a hidden layer, which, in turn, connects to a final layer called the target, or output, layer.
The structure of a multi-layer perceptron lends itself to a graphical representation called a network diagram.

[Diagram: inputs x1 and x2 feed hidden units H1, H2, and H3, which feed the output p.]
Neural Network Diagram

[Diagram: a network with an input layer, hidden layers (each node a hidden unit), and an output layer.]
NNs as a Universal Approximator

[Diagram: hidden units A, B, and C feed an output node computing 6 + A - 2B + 3C.]
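The idea behind the diagram can be sketched in a few lines of Python (the centers, steepness, and weights below are illustrative, not from the slides): a steeply scaled tanh unit behaves almost like a step function, and a weighted sum of such hidden-unit activations, like the output node's 6 + A - 2B + 3C, can be stacked to match a continuous curve arbitrarily well.

```python
import math

def hidden_unit(x, center, steepness=25.0):
    # A steeply scaled tanh behaves almost like a step at `center`,
    # rescaled from (-1, 1) to (0, 1)
    return 0.5 * (math.tanh(steepness * (x - center)) + 1.0)

def output(x):
    # The output node is just a weighted sum of hidden-unit activations;
    # three "steps" already approximate a staircase on [0, 1]
    A = hidden_unit(x, 0.25)
    B = hidden_unit(x, 0.50)
    C = hidden_unit(x, 0.75)
    return A + B + C
```

Adding more hidden units narrows the steps, which is why specifying enough derived inputs lets the network approximate virtually any continuous association.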
An example

INPUT: AGE, INCOME.

HIDDEN: each hidden unit applies a COMBINATION function, then an ACTIVATION function:
- Combination: ß1 + ß2 AGE + ß3 INC    Activation: tanh(ß1 + ß2 AGE + ß3 INC) = A
- Combination: ß4 + ß5 AGE + ß6 INC    Activation: tanh(ß4 + ß5 AGE + ß6 INC) = B
- Combination: ß7 + ß8 AGE + ß9 INC    Activation: tanh(ß7 + ß8 AGE + ß9 INC) = C

OUTPUT: ß10 + ß11 A + ß12 B + ß13 C = RESPONSE TO PROMOTION
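The combination/activation flow above can be traced directly in code. This is a sketch only: the slides give no numeric weight values, so any betas a caller supplies here are made up for illustration.

```python
import math

def forward(age, inc, b):
    """Forward pass for the AGE/INCOME example.

    `b` holds the coefficients beta_1 .. beta_13 in b[1] .. b[13]
    (b[0] is unused, to keep indices matching the slide).
    """
    # COMBINATION then ACTIVATION for each of the three hidden units
    A = math.tanh(b[1] + b[2] * age + b[3] * inc)
    B = math.tanh(b[4] + b[5] * age + b[6] * inc)
    C = math.tanh(b[7] + b[8] * age + b[9] * inc)
    # Output combination: the predicted response to promotion
    return b[10] + b[11] * A + b[12] * B + b[13] * C
```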
Objective Function

Predictions are compared to the actual values of the target via an objective function.
An easy-to-understand example of an objective function is the mean squared error (MSE), given by:

MSE(w) = (1/N) * Σ_{i=1..N} ( y_i - ŷ_i(w) )²

Where:
- N is the number of training cases.
- y_i is the target value of the i-th case.
- ŷ_i(w) is the predicted target value.
- w is the current estimate of the model parameters.
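The definition translates directly into code; a minimal sketch (the numbers in any example call are invented):

```python
def mean_squared_error(y, y_hat):
    """MSE: the average of the squared differences between the target
    values y_i and the predicted values over the N training cases."""
    n = len(y)
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n
```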
Neural Network Training

log( p / (1 - p) ) = w00 + w01 H1 + w02 H2 + w03 H3
tanh^-1(H1) = w10 + w11 x1 + w12 x2
tanh^-1(H2) = w20 + w21 x1 + w22 x2
tanh^-1(H3) = w30 + w31 x1 + w32 x2

[Figure: the training data on axes x1 and x2, and the objective function (w) plotted over training iterations 0 to 70.]
Neural Network Training

[Figure: the objective function surface plotted over Parameter 1 and Parameter 2.]
Convergence

Training concludes when small changes in the parameter values no longer decrease the value of the objective function.
The network is said to have reached a local minimum in the objective.
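This stopping rule can be sketched as a generic training loop. Everything here is a placeholder: `objective` and `step` stand in for whatever model and optimizer are in use, and `tol` is an assumed tolerance for "no longer decrease".

```python
def train_until_converged(objective, step, w, tol=1e-8, max_iter=10_000):
    """Iterate parameter updates until the objective stops improving.

    `objective(w)` returns the objective value at parameters w, and
    `step(w)` proposes updated parameters (e.g. one gradient-descent
    move). Training stops, at a local minimum, once an update improves
    the objective by less than `tol`.
    """
    current = objective(w)
    for _ in range(max_iter):
        w_new = step(w)
        new = objective(w_new)
        if current - new < tol:   # no meaningful decrease: converged
            return w
        w, current = w_new, new
    return w
```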
Neural Network Training Convergence

log( p / (1 - p) ) = w00 + w01 H1 + w02 H2 + w03 H3
tanh^-1(H1) = w10 + w11 x1 + w12 x2
tanh^-1(H2) = w20 + w21 x1 + w22 x2
tanh^-1(H3) = w30 + w31 x1 + w32 x2

[Figure: the training data on axes x1 and x2, and the objective function (w) leveling off over training iterations 0 to 70.]
Overgeneralization

A small value for the objective function, when calculated on training data, need not imply a small value for the function on validation data.
Typically, improvement in the objective function is observed on both the training and the validation data over the first few iterations of the training process.
At convergence, however, the model is likely to be highly overgeneralized, and the values of the objective function computed on training and validation data may be quite different.
Training Overgeneralization

log( p / (1 - p) ) = w00 + w01 H1 + w02 H2 + w03 H3
tanh^-1(H1) = w10 + w11 x1 + w12 x2
tanh^-1(H2) = w20 + w21 x1 + w22 x2
tanh^-1(H3) = w30 + w31 x1 + w32 x2

[Figure: the training data on axes x1 and x2, and the objective function (w) over training iterations 0 to 70, with separate curves for the training and the validation data.]
Final Model

To compensate for overgeneralization, the overall average profit, computed on validation data, is examined.
The final parameter estimates for the model are taken from the training iteration with the maximum validation profit.
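In code, this selection step amounts to an argmax over a recorded history; a sketch only, where `validation_profit` is a hypothetical per-iteration record and the numbers in the example are invented.

```python
def pick_final_iteration(validation_profit):
    """Return the training iteration whose recorded validation profit
    is highest; the final weights are taken from that iteration rather
    than from the last (possibly overgeneralized) one."""
    return max(range(len(validation_profit)),
               key=lambda i: validation_profit[i])
```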
Neural Network Final Model

log( p / (1 - p) ) = w00 + w01 H1 + w02 H2 + w03 H3
tanh^-1(H1) = w10 + w11 x1 + w12 x2
tanh^-1(H2) = w20 + w21 x1 + w22 x2
tanh^-1(H3) = w30 + w31 x1 + w32 x2

[Figure: the training data on axes x1 and x2, and profit plotted over training iterations 0 to 70.]